improvements

By anders pearson 08 Oct 2005

i’ve been working down my list of stuff that i broke when moving the site to cherrypy and i think i’ve pretty much got it all fixed. if you find something else broken, let me know.

the old engine took a static publishing approach. when you added a post or a comment, it figured out which pages were affected by the change and wrote out new static copies of those files on disk, which apache could then serve without any expensive processing. combined with a somewhat byzantine architecture of server-side includes, this was quite scalable. the site could take a pounding from massive amounts of traffic without really breaking a sweat because most of the time it was just serving up static content.

with cherrypy, everything is now served dynamically, meaning that every time someone visits the front page, a bunch of python code runs, pulls data out of the database, processes it, runs it through some templates, and sends the result out to the browser.
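to make that concrete, here's a rough sketch of what a dynamic page handler looks like in the cherrypy 2-era style; get_recent_posts and render_index are just made-up stand-ins for the real database and template code, not anything from the actual site:

    import cherrypy

    # stand-ins for the real database query and template rendering;
    # the actual site code does a lot more here.
    def get_recent_posts():
        return [{"title": "hello", "body": "world"}]

    def render_index(posts):
        items = ["<h2>%s</h2><p>%s</p>" % (p["title"], p["body"]) for p in posts]
        return "<html><body>%s</body></html>" % "".join(items)

    class Root:
        def index(self):
            # every single request hits the database and the templates
            posts = get_recent_posts()
            return render_index(posts)
        index.exposed = True

    cherrypy.root = Root()
    cherrypy.server.start()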

this obviously doesn’t scale as well and you may have noticed that page loads were a little slower than before (although, honestly, not as slow as i was expecting them to be). so, have i lost my mind? why would i purposely make the site slower?

my main reason is that serving pages dynamically let me drastically simplify the code. the logic for calculating which pages were affected by a given update made up a huge percentage of the old engine, which made adding new features or refactoring a daunting task. as if the sheer volume of code weren't enough, any time i made a change to the engine, essentially all the pages on disk needed to be regenerated. i had a little script for that, but with thousands of posts and comments in the database, running it would take a few hours. that was another obstacle in the way of improving the site, and the overall result was that i let things stagnate for quite a while. with everything generated dynamically, the code is short and clean, and any changes i make are instantly reflected with just a browser refresh.

performance with the new code was definitely not as good, but it was decent enough to satisfy me for a few days while i finished fixing everything. to get some real numbers, i ran a couple of quick benchmarks against the index page (one of the more database-intensive pages and, along with the feeds, one of the most heavily trafficked), making 100 requests with ten concurrent (ab2 -n 100 -c 10). the dynamic version served .69 requests per second when hit remotely (i.e., with typical network latency) and .9/sec when hit locally (no network latency, so a better picture of the actual server load). not great, but also not as bad as i expected. for comparison, apache serving the old static index gave me 6.8/sec (remote) and 28/sec (local), so it was about an order of magnitude slower. not awful, but bad enough that i would need to do something about it.

tonight, once i got everything i could think of fixed, i explored memcached and appreciated its simplicity. it only took a couple of minutes and a couple lines of code to set up memcached caching of the index page, the feeds, and the user index pages. the result is 6.0/sec (remote) and 85/sec (local), which makes me very happy. the remote requests are clearly limited by the network connection somewhere between my home machine and thraxil.org, so there's nothing i can do to make them any faster. since memcached keeps everything in RAM, it manages to outperform apache serving a file off disk for the local requests. i've got a couple more pages i want to add caching for, but i'm resisting the urge to go hog-wild and cache everything, since i know that would land me right back in an ugly mess of code for figuring out which caches need to be expired on a given update.
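for the curious, the caching really is only a couple of lines with the python memcache client. this is just a sketch of the general check-then-set pattern, with made-up key names and a placeholder render function rather than the site's actual code:

    import memcache

    mc = memcache.Client(["127.0.0.1:11211"])

    def cached_index():
        page = mc.get("index_page")
        if page is None:
            # cache miss: do the expensive db + template work once,
            # then stash the rendered html in memcached
            page = render_index_from_db()   # placeholder for the real rendering
            mc.set("index_page", page)
        return page

    # on an update (new post or comment), the affected key gets expired:
    # mc.delete("index_page")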

of course, i’m also mulling over the possibility of writing some code to cache based on a dependency graph and making that into a cherrypy filter. if i could do it right, it wouldn’t get in the way. but that’s low on my list of priorities right now.
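just to sketch the idea (none of this exists yet, and all the names are made up): each cached page would register the objects it depends on, and an update would expire exactly the keys that depend on the changed object, instead of hand-written invalidation logic scattered everywhere.

    import memcache

    mc = memcache.Client(["127.0.0.1:11211"])
    dependents = {}  # object id -> set of cache keys that depend on it

    def cache_page(key, html, depends_on):
        mc.set(key, html)
        for obj in depends_on:
            dependents.setdefault(obj, set()).add(key)

    def invalidate(obj):
        # expire every cached page that registered a dependency on obj
        for key in dependents.pop(obj, set()):
            mc.delete(key)

    # e.g. a new comment on post 42 would trigger invalidate("post:42"),
    # expiring whichever pages registered that post as a dependency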

depending on whether i feel more like painting or coding this weekend, i may crank out a few items from my ‘new features and enhancements’ list.

Tags: benchmarks optimization meta cherrypy performance caching memcached