by anders pearson
Fri 12 Jun 2009
A while back, I detailed how I went about deploying TurboGears apps with workingenv and supervisord. That's still the basic approach I use for the TG apps that I maintain and keep running. Lately though, I've been doing more of my work with Django, and the landscape of Python deployment tools has changed significantly enough that I thought it was about time I explained how (and why) I'm deploying things these days.
Motivations
I still have the same basic needs. I write a lot of small applications that get deployed on one server and I don't want to spend all my time doing the upgrade dance when a new version of a library comes out. If I've got a bunch of apps written with Django 0.97 that are running and I want to deploy another app on the same server that's written against Django 1.0, I don't want to have to upgrade all my old ones (since 1.0 introduced some backwards-incompatible changes). Similarly, I don't want the presence of my old legacy apps preventing me from being able to use the latest versions of libraries for my new apps. Generally, this means that the approach of installing libraries into the global site-packages just does not work for me.
I should also mention that while the approach detailed in my old post uses workingenv's "--requirements" flag to give it a list of URLs of packages to install, I actually moved away from that quite a while ago, opting instead to bundle all the needed eggs in a directory right alongside the application's code so the entire deployment process could be done without relying on the network. I just had too many problems with either PyPI or the TG website being unavailable at the wrong time and breaking my deployments. It also meant that I couldn't bootstrap a new environment if I was on a laptop without network access.
What's New?
My old post explained in some depth how I got around that problem for the TurboGears apps I was deploying by using workingenv to isolate each application's libraries in a way that I could just bundle the libraries needed for each along with the application and not have to worry about what else was on the system.
Things are slightly different now with Django for a few reasons.
First, mod_wsgi has matured and, for my purposes, seems to be the best way to deploy Django applications. That means I'm back to Apache and don't need the whole setup with lighttpd proxying back to individual application web server processes each being monitored and controlled by supervisord.
Second, the Django community as a whole seems not to have bought into setuptools and eggs as the preferred distribution method and has stuck with distutils instead. My old approach relied on having all the required libraries bundled as eggs. With TurboGears, that was rarely an issue since TG was completely tied to setuptools. But with Django, I found that more often than not, the library I wanted to use wasn't available as an egg, so I ended up having to download the source tarball and manually build an egg for each library. That got tedious after a while. Ideally, a new approach would allow me to bundle both eggs and source tarballs. Failing that, source tarballs currently seem to be the path of least resistance in a Django ecosystem.
Finally, workingenv has been all but deprecated in favor of virtualenv (and/or the combination of virtualenv and pip depending on what set of workingenv features one is replacing). Virtualenv has a number of advantages over workingenv that I won't bother enumerating here. Pip is also an important new player in town. Conveniently, pip happens to use source tarballs instead of eggs.
The Current Approach
I'm just going to dive into it.
In each project, I have a 'requirements' directory. That, in turn, has a 'src' directory in which I've dumped source tarballs for every library that the project requires. That includes Django itself, database drivers, everything.
Also in 'requirements' are 'libs.txt' and 'apps.txt' text files. Those are just lists of the contents of the 'src' directory, since pip isn't (yet) smart enough to figure out the order that it needs to install things when you just give it a directory full of tarballs. I separate them into 'libs' and 'apps' just for convenience: 'libs' are plain Python libraries and 'apps' are full-fledged Django apps.
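As a rough illustration of what those two files amount to, here is a minimal Python sketch that writes one of them. It assumes pip accepts a nested `-r libs.txt` line and plain paths to local tarballs inside a requirements file (current pip does; check your version). The function name `write_requirements` and any tarball names you feed it are hypothetical. The list has to be passed in install order, because the file is written exactly as given.

```python
import os


def write_requirements(path, src_dir, tarballs, include=None):
    """Write a pip requirements file that lists bundled tarballs.

    `tarballs` must already be in install order (dependencies first);
    lines are written exactly in the order given. `include` is an
    optional requirements file pulled in first with a `-r` line.
    """
    lines = []
    if include is not None:
        lines.append("-r %s" % include)
    for name in tarballs:
        full = os.path.join(src_dir, name)
        # Fail early if a listed tarball is missing from src/.
        if not os.path.exists(full):
            raise FileNotFoundError(full)
        lines.append(full)
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return lines
```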
At the top level of the project, there is a 'bootstrap.py' file with the following contents:
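#!/usr/bin/env python
import os
import subprocess
import shutil

# Locate the project from this script's own path so it can be run from
# anywhere (abspath avoids an empty dirname under `python bootstrap.py`).
pwd = os.path.dirname(os.path.abspath(__file__))
vedir = os.path.join(pwd, "ve")

# Always start from a fresh virtualenv.
if os.path.exists(vedir):
    shutil.rmtree(vedir)

# Install everything listed in requirements/apps.txt into 've'.
subprocess.call([os.path.join(pwd, "pip.py"), "install",
                 "-E", vedir,
                 "--enable-site-packages",
                 "--requirement", os.path.join(pwd, "requirements/apps.txt")])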
That just looks for a directory named 've', blows it away if it's there, then runs a command that basically works out to
$ pip.py install -E ve --enable-site-packages --requirement requirements/apps.txt
In other words, it tells pip to install all the packages specified in requirements/apps.txt (which in turn has a reference to libs.txt) into a new virtualenv directory called 've'. The only real reason that's done in Python is to easily make the script aware of its own location, so it can be run from any directory and will still figure out how to put the 've' directory in the right place. Bash makes that harder than it ought to be.
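Pulled out into a function, the self-locating logic looks roughly like this. It normalises through `os.path.abspath` because `os.path.dirname(__file__)` is an empty string when the script is started as `python bootstrap.py`, and a bare 'pip.py' passed to subprocess would then be looked up on $PATH rather than in the project. The name `project_paths` is hypothetical; in the script you would call it as `project_paths(__file__)`.

```python
import os


def project_paths(script_path):
    """Return (project_dir, ve_dir, apps_txt) for a script inside the project.

    os.path.abspath comes first, because os.path.dirname("bootstrap.py")
    is "" when the script is started as `python bootstrap.py`.
    """
    project_dir = os.path.dirname(os.path.abspath(script_path))
    ve_dir = os.path.join(project_dir, "ve")
    apps_txt = os.path.join(project_dir, "requirements", "apps.txt")
    return project_dir, ve_dir, apps_txt
```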
I also include a copy of pip.py in the top level project directory so pip doesn't even have to be globally installed on a system to bootstrap. (I think I could drop virtualenv.py in there too, but I already have that installed on every system I admin so I haven't bothered yet.)
All together now
I actually have all of this set up in a Paster template (along with a whole bunch of other customizations) so instead of the usual
$ django-admin.py startproject foo
I run
$ paster create --template=mydjangotemplate foo
and I get a stock Django project directory plus the requirements directory full of source tarballs and the bootstrap.py. The copy of 'manage.py' that my paster template drops in there has the '#!/usr/bin/env python' line replaced with '#!ve/bin/python'. So the next few steps for me to bring up a running dev server look like:
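A sketch of the kind of substitution involved, written as a hypothetical standalone helper (it is not Paste's API). One caveat worth knowing: a relative interpreter path such as '#!ve/bin/python' is resolved against the current working directory, so ./manage.py only finds the virtualenv when it is run from the project root.

```python
def rewrite_shebang(source, interpreter="ve/bin/python"):
    """Point a script's '#!' line at another interpreter.

    Only the first line is touched, and only if it is a shebang;
    everything else is returned unchanged.
    """
    lines = source.splitlines(True)  # keep line endings
    if lines and lines[0].startswith("#!"):
        lines[0] = "#!%s\n" % interpreter
    return "".join(lines)
```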
$ cd foo
$ chmod +x bootstrap.py manage.py # paster templates don't let you set permissions
$ ./bootstrap.py # installs all my requirements into 've'
$ createdb foo # I pretty much always use PostgreSQL and my custom settings.py is configured for it
$ ./manage.py runserver
I'd like to stress that that happens on a system that has Paste and virtualenv installed but does not need to have Django, psycopg2, or any of the other libraries that are used installed. They are included in the template and get installed into the virtualenv for that one project.
I check everything except the 've' directory into version control (usually git these days). That includes all the source tarballs. It's a bit wasteful of disk space and bandwidth but I haven't found that it's that much overhead in practice.
When I deploy to production (via our automated system at work), the project is checked out of version control, rsync'ed to the production server then
$ ssh productionserver /path/to/myapp/bootstrap.py
is run and everything gets installed into a virtualenv on the production server. Other applications on the same server are unaffected. My mod_wsgi configs point to '/path/to/myapp/ve/lib/python2.5/site-packages' and so on (they are autogenerated by my paste template as well).
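A minimal sketch of how such a config could be generated. `WSGIDaemonProcess` (with its `python-path` option), `WSGIProcessGroup`, and `WSGIScriptAlias` are real mod_wsgi directives, but the template below is an assumption, not my template's actual output: the `apache/<name>.wsgi` script location, the function name, and the default Python version are illustrative, and the lines would normally sit inside the app's `<VirtualHost>`. Check that your mod_wsgi version supports `python-path`.

```python
import os

# Hypothetical template; the directive names are real mod_wsgi directives,
# the file layout and naming scheme are assumptions.
WSGI_TEMPLATE = """\
WSGIDaemonProcess {name} python-path={site_packages}
WSGIProcessGroup {name}
WSGIScriptAlias / {project_dir}/apache/{name}.wsgi
"""


def mod_wsgi_config(name, project_dir, python_version="2.5"):
    """Return an Apache snippet that runs the app against its own virtualenv."""
    site_packages = os.path.join(project_dir, "ve", "lib",
                                 "python" + python_version, "site-packages")
    return WSGI_TEMPLATE.format(name=name, site_packages=site_packages,
                                project_dir=project_dir)
```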
I plan on going into more detail on the paste template magic I use in the future, but that should give you a bit of a sense for why I like it so much.
A couple fine points
A few caveats:
- I have this working on Ubuntu 9.04, but it requires a couple of tweaks since Python 2.6 is the default on that system. Figuring out how to do that is left as an exercise for the reader.
- I don't use many libraries that have C extensions (and thus require time-consuming compilation steps); pretty much just psycopg2 and PIL. psycopg2 is small enough that compiling it is very quick, so I just put the source tarball in requirements like usual. PIL is big, though, and I don't want to wait for it to compile every time I deploy an app to production. It's stable enough, and easy enough to install with Ubuntu's package manager, that it has become my one exception: I just install it globally and it hasn't been a problem. (This is why pip gets the '--enable-site-packages' flag, to pull that in.)
- pip doesn't yet do a very good job of figuring out dependencies. If you don't have the order just right in the requirements files, it will try to download dependencies off PyPI even if they're sitting right there. It's taken me a bit of trial and error to get it right.
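One way to take some of the trial and error out of the ordering is to keep a small hand-maintained map of which bundled package needs which, and sort it topologically before writing the requirements files. This is a sketch of that idea (a depth-first topological sort), not something pip does for you; `install_order` and the package names in the test are hypothetical.

```python
def install_order(deps):
    """Return package names so every package comes after its dependencies.

    `deps` maps a package name to the names it depends on.
    Raises ValueError if the dependencies form a cycle.
    """
    order = []
    state = {}  # name -> 1 while being visited, 2 once placed in `order`

    def visit(name):
        if state.get(name) == 2:
            return
        if state.get(name) == 1:
            raise ValueError("dependency cycle at %r" % name)
        state[name] = 1
        for dep in sorted(deps.get(name, ())):
            visit(dep)
        state[name] = 2
        order.append(name)

    # Sorting keeps the output stable from run to run.
    for name in sorted(deps):
        visit(name)
    return order
```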
Eg
If you want to see a real, working example of a project that uses this, check out the requirements directory and bootstrap.py from one of my projects.
by anders pearson
Wed 13 May 2009
I don't usually do book reviews here, but Packt Publishing was nice enough to send me a review copy of Scott Newman's Django 1.0 Template Development a while back and I haven't otherwise had much in the way of ideas for posts lately, so here goes.
Over the years, I've used probably a dozen different templating languages in a couple different languages, rolled my own, and contributed bugfixes to a few others. I remember years back when it seemed like everyone was writing Perl CGI programs that generated HTML with a million print statements and people looked at you funny if you suggested that a templating language might simplify things for them. Thankfully, as of 2009, the web development community seems to be on board with the basic idea of separating code from markup and all the advantages that that approach entails.
Of all those templating languages that I've played with over the years, Django's is currently my favorite. That's not to say it doesn't have its flaws. I don't think there's such a thing as a perfect templating language; they have to balance too many things and will never be everything to everyone. Django seems to have come closer to the sweet spot than anything else I've used though. It's powerful enough that you rarely need to work around it but simple enough that once you learn the basics, you rarely have to look anything up. The syntax is remarkably clear and compact, and somehow, it's still really, really fast.
Any book on Django templates faces an uphill battle to prove its worth. Django's template language has a very simple syntax and some of the best online documentation out there. I was certainly skeptical at first that there was really any room for a book on such a narrow and already well-covered topic.
Unsurprisingly, then, the book's section on the basics of the syntax is pretty short and to the point. More space is spent providing a reference section for the built-in tags and filters. I certainly can see the usefulness of that. Sometimes the searchable online docs are what you need, and sometimes it's handy to have paper. The book's explanations of these built-ins aren't substantially different from the online docs, but do tend to go into a little more detail on specifics.
The meat of the book is really in the sections covering everything in Django that's around the templates and interacts with them. This turns out to be practically everything in Django (except maybe the ORM). The overview of how URLs are mapped to views, which load data into contexts that can be modified by Context Processors, process the contexts with templates, and return responses which might be further modified by middleware is probably not going to be very exciting to a programmer with much Django experience but I could see it being useful for someone new to Django or to, eg, a designer who's only touching the templates but ought to understand how they fit into the whole system.
Template loading and inheritance are covered in pretty good depth. Those are topics that tend to get glossed over but are actually of key importance to understanding how to structure large Django projects to allow for a great deal of flexibility and power. I find Django's approach to template inheritance very intuitive (if not quite as powerful as ZPT's) so I didn't need too much help getting up to speed with it, but I could see how it might be unfamiliar territory if you've only worked with 'include' style template systems and the book's explanation of how to think about structuring your templates is probably just what's needed. The template loaders that are included with Django are covered in detail and it does a good job explaining why they each exist.
A great opportunity was missed though as there's no discussion of creating your own template loaders. I've implemented a template loader that would pull templates out of the database before and I can say from experience that documentation on writing your own template loaders is sorely lacking. It ultimately wasn't very hard to do, but I remember having to actually dive down into Django's source code to figure out how to do it (I recommend reading Django's source code though; it's remarkably clear compared to other frameworks I've had to do similar digging in).
The chapter on writing your own custom template tags and filters is probably the most important chapter in the book. Django's template syntax is purposefully simple so one of the most common complaints I see is that it ends up pushing a lot of "display" logic into the views. This is almost always easily overcome in a nicely reusable and modular fashion with a custom tag or filter. The problem is that Django's documentation on how to do that is hard to find and very intimidating. It's there and adequate as a reference, but it's very easy for someone to get pretty far with Django without knowing that custom tags and filters are an option at all, or just thinking that they're too complicated to bother with. The reality is that they're pretty straightforward to implement once you've seen a good explanation of how they work, which this book finally provides.
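To make the point concrete: a Django filter is just a Python function that takes a value and returns a new one. `initials` below is a hypothetical example. To use it you would put it in a `templatetags` module of an installed app, create `register = template.Library()`, decorate the function with `@register.filter`, `{% load %}` that module in the template, and write `{{ author.name|initials }}`. The registration is left out here so the function runs without Django installed.

```python
def initials(value):
    """Turn 'anders pearson' into 'A.P.'.

    Non-string values are returned unchanged, so the filter never
    breaks a template that passes it something unexpected.
    """
    if not isinstance(value, str):
        return value
    return "".join(word[0].upper() + "." for word in value.split())
```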
There are also chapters on pagination, customizing the admin look and feel, and internationalization. Pagination, I think, is covered pretty well in the online Django docs and isn't very complicated, so I feel like that chapter is mostly filler. Customizing the admin interface is pretty straightforward, and the chapter is superfluous if you've really understood the earlier chapters on template loading and inheritance, since it's just a direct application of them. But I can see it being a useful example since the admin app is so widely used and customizing it is something of a Django rite of passage. Maybe if the earlier chapters didn't sink in, that would help pull it together. Internationalization, admittedly, I've barely ever had to deal with, so I tend to skip over that.
I'd be happier if the book got more into django.forms, or really even discussed them at all. Designers who are working with Django templates are bound to run across forms and have to deal with them. Eg, there's discussion of customizing the Django admin interface, but in reality that almost always also includes tweaking the forms and not just the templates and CSS.
I think what I'm getting at is that while the book works well for a programmer looking to get deeper into everything in and around Django templates, I think it ends up being much less useful for someone more on the design end of the spectrum than I'd hoped. In other words, if I'm working on a Django project with a designer who's doing the HTML and CSS and doesn't know much programming beyond that, I had been hoping that I could hand them the Django Template Development book and it would serve their needs. Maybe that isn't what the author intended, but it's what I'd hoped for when I saw the title. But without a discussion of forms or aspects of the model that someone working on a template would probably encounter (eg, if they saw "{% if author.page_set.count %}" in a template, they ought to know something about how that relates to foreign keys, the _set convention, and querysets, at least to recognize the common pattern in use), I'm not sure it's going to provide them with anything more than a printout of Django's online template syntax and built-ins docs would.
by anders pearson
Thu 18 Dec 2008
Another year, another book of my drawings and paintings.
Here it is: Myopica 2008
As usual, it's public domain, sold at cost (but lulu.com did raise their prices this year) with a free PDF download available if you don't want paper. I also have a gallery set up if you find that more convenient to browse than the PDF.
by anders pearson
Mon 15 Sep 2008
I came across a link to the slides for Xavier Leroy's course on Functional programming languages this weekend and have been slowly making my way through them.
It's just slides from the lectures so it can be a bit of a struggle to follow them without, you know, the actual lectures. On the other hand, having to struggle a bit and figure out what's going on from just the slides has been kind of good for forcing me to really pay attention and not gloss over the parts that aren't immediately obvious.
Judging by the slides, the lectures are fantastic. They're incredibly ambitious in scope. The first lecture starts by introducing the Lambda Calculus (well, that part is fast enough that I think it's assumed that the reader is already basically familiar with it), shows reduction strategies, implements a lambda calculus interpreter in a couple of lines of Caml, makes it efficient, adds lexical scoping and closures, and keeps it all tied back to the Lambda Calculus. Further lectures go on to describe abstract machines, compile the enriched Lambda Calculus for the abstract machine, do tail call elimination and lazy evaluation, prove the correctness of the abstract machine and compiler, get into exception returning vs. state passing vs. continuation passing implementations, monads and monad transformers (including the concurrency monad transformer), and then get into some even deeper optimizations. It basically connects the theoretical, mathematical underpinnings of programming all the way to the tricks that are used in compiler optimization. Highly recommended if that sounds interesting to you.
Anyway, in the course of making my way through that material, I decided that I needed to refresh my understanding of the Lambda Calculus and combinators since I haven't really done much of that since undergrad. I've been writing a lot of JavaScript lately for work and, while I still don't really like the syntax that much, JavaScript does have proper closures so it's suitable for lambda calculus experimentation.
So I decided to implement the Y Combinator in JavaScript as an exercise. I remember it being covered in class long ago, and, while I could see why it was interesting, I don't think it ever quite clicked for me exactly how it worked. There's a good page explaining the how and why of the Y Combinator here. The example code is all in Scheme, so I just went through the page and translated each example to JavaScript, making sure I understood what was going on at each step. I kind of like this approach to learning CS. When you just read a text with code samples, it's all too easy to accept that the samples do what the text says they do and not spend the extra effort to understand exactly what's going on. Translating them into a different programming language forces you to really grasp every detail.
So, this is my JavaScript Y Combinator. There are many like it, but this one is mine:
function Y (X) {
    return (function(procedure) {
        return X(function(arg) {
            return procedure(procedure)(arg);
        });
    })(function(procedure) {
        return X(function(arg) {
            return procedure(procedure)(arg);
        });
    });
}
It's not as pretty as the Scheme version:
(define Y
  (lambda (X)
    ((lambda (procedure)
       (X (lambda (arg) ((procedure procedure) arg))))
     (lambda (procedure)
       (X (lambda (arg) ((procedure procedure) arg)))))))
And neither is as pretty as the original untyped Lambda Calculus:
Y = λf·(λx·f (x x)) (λx·f (x x))
But it works.
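For comparison, the same combinator translated once more, into Python. Like the JavaScript and Scheme versions, it is strictly the applicative-order variant (sometimes called the Z combinator): the extra `lambda arg: ...` wrapper delays `procedure(procedure)` until an argument arrives, which is what keeps a strict language from recursing forever. The plain λ-calculus form only works under lazy (normal-order) evaluation. The `fact` step function is just an example.

```python
def Y(X):
    # Same shape as the JavaScript version: the inner `lambda arg:` delays
    # procedure(procedure) so strict evaluation doesn't loop forever.
    return (lambda procedure: X(lambda arg: procedure(procedure)(arg)))(
        lambda procedure: X(lambda arg: procedure(procedure)(arg)))


# A non-recursive step function: it is handed the function to recurse into.
fact = Y(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
```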
by anders pearson
Sat 13 Sep 2008
Having built a few Django sites now, I'm ninety-nine percent happy with it as a framework. The remaining one percent is a couple issues I have with how it does things, but they're relatively minor. Overall, I'm impressed at how well Django gets all the details right.
These are the things that are still on my wishlist for Django. No doubt some of them are things that Django already does and I just haven't come across them in the documentation yet (and hopefully someone will add a comment to point me in the right direction). Other things are probably impossible given other design constraints. The rest are probably things that just bother me and no one else cares.
In no particular order:
Transactions per HTTP Request by default
Django's transactions work on an autocommit model by default, so each time a model object is saved, it happens in a separate transaction. For web applications, I've always found that it makes far more sense (and eliminates some painfully sneaky bugs) if there's a single database transaction per HTTP request. It's fairly trivial to enable that model in Django by including the transaction middleware. I just don't see why it isn't included by default.
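For reference, a minimal sketch of the settings change this refers to, assuming a Django 1.0-style settings.py whose other middleware matches that era's startproject defaults. `TransactionMiddleware` commits when the view returns a response normally and rolls back when it raises. Its position in the list matters, because middleware listed after it also runs inside the transaction. (Later Django releases replaced this middleware with the `ATOMIC_REQUESTS` database setting.)

```python
# settings.py fragment (Django 1.0 era).
MIDDLEWARE_CLASSES = (
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    # One transaction per request: commit on a normal response,
    # roll back if the view raises.
    'django.middleware.transaction.TransactionMiddleware',
)
```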
Better Support for ON DELETE CASCADE with PostgreSQL
Creating a foreign key relationship with Django's ORM effectively does an ON DELETE CASCADE, but it doesn't actually set up the database that way; the deletion is handled in the Python layer instead. If you only ever use the Django ORM to access the database, everything works smoothly and you'd never even notice. However, by not doing the cascade in the database, it makes it more difficult to maintain data integrity if you ever want to modify the database directly with some other tool or language. My understanding is that the reason for this has to do with Django's supporting certain RDBMSs that don't implement cascades. But, dammit, I use PostgreSQL for all my sites and it does support it just fine. SQLObject works with the same broken RDBMSs yet it manages to set up a proper ON DELETE CASCADE when you use it with PostgreSQL.
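Until then, one workaround is to recreate the constraint yourself after syncdb. A minimal sketch that builds the PostgreSQL statements; the table, column, and constraint names are hypothetical (look up the real constraint name Django generated with `\d tablename` in psql), and the identifiers are interpolated unquoted, so only use it with trusted names.

```python
def cascade_fk_sql(table, column, ref_table, constraint, ref_column="id"):
    """Build PostgreSQL statements that recreate a foreign key
    so that deletes cascade in the database itself."""
    return [
        "ALTER TABLE %s DROP CONSTRAINT %s;" % (table, constraint),
        "ALTER TABLE %s ADD CONSTRAINT %s FOREIGN KEY (%s) "
        "REFERENCES %s (%s) ON DELETE CASCADE;"
        % (table, constraint, column, ref_table, ref_column),
    ]
```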
Not Swallowing Template Errors
Django's templates are a thing of beauty. They strike a really nice balance of simplicity and power. But if your template calls a method that raises an exception, it swallows the exception and just returns an empty string. You don't see anything in the browser and there are no tracebacks in the console. On a production site, this is absolutely what you want to happen. But when I'm developing, this behavior masks errors that I would like to see, so I can fix them. From what I've read, there are backwards compatibility issues with having errors make it up to the browser, even in development mode. But it seems to me that it should be possible to get a traceback in the console when running 'manage.py runserver'.
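Until something like that exists, one stopgap is to wrap the methods your templates call so that any exception prints its traceback to stderr (which is the runserver console) before propagating. This is a sketch, not a Django feature: `log_exceptions` is a hypothetical helper, and whether the exception then shows up in the page still depends on how the template engine handles it.

```python
import functools
import traceback


def log_exceptions(func):
    """Print the traceback of any exception to stderr, then re-raise it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            traceback.print_exc()  # goes to sys.stderr
            raise
    return wrapper
```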
Easier HTTP Method Dispatch
I'm a huge fan of the REST architectural style, so it bugs me a bit to have code like this in all my views:
if request.method == "POST":
    pass  # do some POST stuff
elif request.method == "GET":
    pass  # do some GET stuff
elif request.method == "DELETE":
    pass  # do some DELETE stuff
# ...
I've seen a few different projects out there to make REST method dispatch smoother, but I'd like to see it in the core. More hooks for supporting ETags and conditional requests would also make me happy.
Also see my note on Cherrypy style dispatching below.
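A minimal sketch of the kind of dispatch I mean: a callable class that routes each request to a method named after its HTTP verb. `RestView` and `Entry` are hypothetical names, and the 405 response is a plain tuple so the sketch runs without Django; in Django you would return `HttpResponseNotAllowed(self.allowed_methods())` instead.

```python
class RestView(object):
    """Dispatch a request to a method named after its HTTP verb.

    GET -> self.get(request, ...), DELETE -> self.delete(request, ...), etc.
    Subclasses define only the verbs they support.
    """
    http_methods = ("get", "post", "put", "delete", "head", "options")

    def __call__(self, request, *args, **kwargs):
        verb = request.method.lower()
        # Only look up known verbs, never arbitrary attribute names.
        handler = getattr(self, verb, None) if verb in self.http_methods else None
        if handler is None:
            return self.method_not_allowed(request)
        return handler(request, *args, **kwargs)

    def allowed_methods(self):
        return [m.upper() for m in self.http_methods if hasattr(self, m)]

    def method_not_allowed(self, request):
        # Stand-in for HttpResponseNotAllowed(self.allowed_methods()).
        return (405, self.allowed_methods())


class Entry(RestView):
    """Example resource (hypothetical)."""
    def get(self, request, entry_id):
        return "show %s" % entry_id

    def delete(self, request, entry_id):
        return "delete %s" % entry_id
```

In a Django urls.py an instance can stand in for a view function, e.g. `(r'^entry/(\d+)/$', Entry())`, since Django only needs a callable. One instance serves every request, so per-request state shouldn't be stored on `self`.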
Flexible Directory Templating
Django's 'startproject' and 'startapp' commands are basically doing the same thing as Paste's 'create'. Paste has this wonderful ability to use templates though. TurboGears and Pylons both just use Paste to do their equivalent code generation tasks. So, when I was doing TurboGears at work, I was able to create, by subclassing the TG template, one master application template that included the settings that we pretty much always used, the scripts and configs that we use for our automated deployment system, etc. Then, to start a new TG app, I could just do:
$ tg-admin quickstart --template=ourcustomtemplate
Super convenient.
I've actually gone as far as creating my own paste template for starting django projects. So instead of doing 'django-admin.py startproject', I do 'paster create --template=mycustomdjangotemplate' and I get a django project with my custom setup and config. The problem though is that if Django changes the code that startproject generates, I'll have to manually update my Paste template to match it. If Django just used Paste for that functionality, I could subclass the default Django template and pick up those changes for free.
Eggs / setuptools support
The Django community for the most part doesn't seem too interested in supporting setuptools distribution or distributing eggs for their apps. Setuptools is notoriously difficult to get your head around, but once you do, you can use it to rig up some really slick distribution and deployment mechanisms. I have a script that will, in one step, set up a virtualenv and install into it all the eggs that I've placed in a directory. Our automated deployment uses that to reliably and repeatably install an application and all its dependencies (the precise versions that I've selected and dropped in the eggs directory) onto our production or staging servers. I never have the problem of something that works in development breaking in production because prod had a slightly different (and incompatible) version of some library installed. You can rig this sort of thing up with distutils packages too, but eggs make it much easier. Like with Paste templates, I already use this approach with Django, but it involves the extra step for me of generating eggs for Django and the third-party Django apps I use. The plugin mechanism that setuptools provides is also a thing of beauty and I can imagine quite a few ways that it could simplify the building and deploying of reusable Django apps.
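A rough sketch of that one-step script's shape, written to build the command lines rather than run them so they can be inspected first. It assumes `virtualenv` is on the PATH, bundled eggs live in an `eggs/` directory next to the code, and the new environment provides `bin/easy_install` (older virtualenv/setuptools releases did; recent setuptools no longer ships that script). `--no-deps` stops easy_install from resolving dependencies itself, which only works if every dependency's egg is already in the directory. The function names are hypothetical.

```python
import glob
import os
import subprocess


def bootstrap_commands(project_dir):
    """Build (but don't run) commands that create 've' and install every bundled egg."""
    ve = os.path.join(project_dir, "ve")
    eggs = sorted(glob.glob(os.path.join(project_dir, "eggs", "*.egg")))
    cmds = [["virtualenv", ve]]
    for egg in eggs:
        # --no-deps: install exactly the eggs in the directory, nothing fetched.
        cmds.append([os.path.join(ve, "bin", "easy_install"), "--no-deps", egg])
    return cmds


def run(cmds):
    """Execute the commands in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.check_call(cmd)
```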
Image Thumbnailing Support in Core
sorl.thumbnail rocks. It's always the first Django app I add to a project (every project I work on seems to involve image uploading at some point). It is my opinion that sorl.thumbnail, or something that provides equivalent functionality should make it into the core since it's such a common need.
Cherrypy Style URL Dispatch
Django's regex dispatch in urls.py is very powerful and flexible. It's basically the same as Routes, but the plain regex syntax seems much more sensible to someone like me (with years of Perl experience) than Routes' custom rules syntax. Still, it's overkill for a lot of applications. I'm still a fan of Cherrypy's approach of mapping the URLs to a tree of controller objects. So '/foo/bar/baz/?p1=v1' turns into a call of the 'SomeRootObject.foo.bar.baz()' method with p1=v1 arguments. It may look really restrictive, but in my experience with TurboGears, you almost never need to do anything more complicated than that (and Cherrypy does have mechanisms to let you do more complicated things for those few times when you need to). On my todo list is to write a cherrypy style dispatcher for Django (not to replace the default urls.py approach, but to augment it) or just figure out how to shoehorn cherrypy's dispatcher directly into a django project. I know that probably no one else will care, but it will make me happy.
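A bare-bones sketch of that dispatch style, to show how little is involved: it walks attributes of a root object along the URL path and calls the final one with the query string as keyword arguments, falling back to an `index` method when the path ends on an object. The `expose` decorator and `exposed` attribute imitate CherryPy's convention but are defined locally; nothing is imported from CherryPy or Django, and hooking it into a Django project (for instance through a catch-all URL pattern) is left out. Written for Python 3 (`urllib.parse`).

```python
from urllib.parse import urlsplit, parse_qsl


def expose(func):
    """Mark a method as reachable from a URL (mirrors CherryPy's idea)."""
    func.exposed = True
    return func


def _exposed(obj):
    return callable(obj) and getattr(obj, "exposed", False)


def dispatch(root, url):
    """'/foo/bar/baz/?p1=v1' -> root.foo.bar.baz(p1='v1').

    Raises LookupError for unknown, private, or unexposed paths.
    """
    parts = urlsplit(url)
    node = root
    for segment in (s for s in parts.path.split("/") if s):
        if segment.startswith("_"):  # never expose internals
            raise LookupError(url)
        node = getattr(node, segment, None)
        if node is None:
            raise LookupError(url)
    if not _exposed(node):
        # e.g. '/foo/' landing on an object: try its index() method
        node = getattr(node, "index", None)
    if not _exposed(node):
        raise LookupError(url)
    return node(**dict(parse_qsl(parts.query)))
```

Repeated query parameters collapse to their last value here, and the request object isn't passed through; a real version would need to handle both.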
A Pony
Oh, wait, Django already has one. Never mind.
by anders pearson
Fri 05 Sep 2008
One of my other little Django side-projects is ready to show now too: myopica.org
I've been posting my artwork on flickr but I also use that for all my other random photography and whatnot. All the sharing functionality and the community of flickr are great, but it's not really a very good portfolio and you have to kind of hunt around to find my paintings and drawings.
So I set up myopica.org as a proper portfolio site for my artwork. It's extremely basic and stripped down so nothing really distracts from the artwork.
by anders pearson
Fri 05 Sep 2008
Django 1.0 being released reminded me that perhaps I should mention here that thraxil.org is now running on Django.
Whenever I move most of my development to a new technology, rewriting the backend for this site in it is usually the final vetting stage for me. Thraxil.org isn't the biggest, most complicated piece of software on the net, but it's different enough from most blogs and I'm particular enough about certain things that I feel like if I can build this site with a language or framework and not run into any serious issues, it's good enough for me to do my day to day development with it. So over the years it's gone through a few different Perl incarnations with different databases behind it, switched to Python CGIs that published static pages, then CherryPy + SQLObject + SimpleTAL, and finally TurboGears. I started on a Plone version at one point when I was into that, but never quite got the momentum to push that one far enough to actually use in production.
I've been keeping a close eye on Django since it came out a couple years back. At that time, Django wasn't very polished, did a lot of things I didn't like (many of those things were fixed with the "magic removal" branch), and I was already using Cherrypy and SQLObject pretty comfortably so it didn't appeal to me very much. TurboGears was announced soon after and, being based on CherryPy and SQLObject, it was a no-brainer for me to move to it. But Django's kept chugging along the whole time, adding features, refactoring code, polishing things up, picking up a larger and larger community, and generally eliminating my objections one by one. Earlier this year, I took some time to play with it again and I was impressed at how far along it had come and started building small test apps that got larger and more non-trivial over time. I built a few Django sites for myself, built and launched The FU for a friend and basically decided it was ready to become my preferred framework (well, I'll probably still use Bourbon for writing microapps). So it was time to re-write thraxil.org.
I'd also been growing increasingly frustrated with the unreliable OpenVZ based virtual server that the site has been hosted on for the last couple years so the rewrite also included a move to a nice SliceHost Xen server.
Of course, no rewrite is ever just a rewrite. There are always design changes to make and new features that manage to creep their way into the scope of the rewrite. In this case, it was mostly things I wanted to simplify. I've mostly been using delicious and flickr for getting my bookmarks and images up on the web, so those sections of the site have been superfluous and needed to go (technically they're still around, just made much less prominent in the display and site architecture). I've also become more aggressive in my hatred of comment spam (and the stupid, random, drive-by comments from strangers) so comments here are now moderated. Ie, when a comment is posted, I get an email and I have to either approve the comment, or it doesn't show up on the site.
The switch to Django actually went even smoother than I was expecting, even after building a few Django sites beforehand. It felt a little like cheating. Since Django's ORM and SQLObject have such similar requirements on the database schema (integer primary keys on every table, etc.) I could actually just keep the database as-is and not have to worry about redesigning the schema or migrating data. Django's "inspectdb" functionality gave me a basic models.py in seconds. It took five minutes more to tweak models.py a bit and enable the Django admin application. So in literally minutes, I had a Django site running with all my data and a completely usable admin interface. From there, it was mostly just going through the list of different page types on the site, creating a view for each, mapping it to the right url, and making a template for it. Altogether, after two evenings staying a bit late in my office, I had pretty much the whole public side of the site ported over and ready for production. That includes all my experimentation with the visual design of the site (probably more of my time was spent fighting with CSS than with any of the backend code). There are a few odds and ends here and there that I still need to put back in (search, per-tag atom feeds, etc.), but it's mostly done.
I also have to admit that I'm impressed with the performance. The old site was TurboGears + SimpleTAL + PostgreSQL running proxied behind lighttpd on an OpenVZ server. I made heavy use of memcached so the vast majority of requests were just getting served out of cache and TG barely ever had to hit the database. That was good enough for it to survive a post hitting #1 on reddit without the server really breaking a sweat. The new setup is Django + PostgreSQL being served by Apache with mod_wsgi (in daemon mode) on a Xen server. I decided to benchmark before I bothered setting up memcached and found that the new setup, without caching, is already faster than the old one was with caching enabled. It's not a totally fair benchmark since I don't really know anything about the underlying hardware that each virtual server was running on and the new, simpler design is probably doing a little less work than the old one was. Still, unless I start to see performance issues, I'm just not even going to bother setting up caching.
I haven't even begun to make use of the myriad of pluggable Django applications that are out there ready to be integrated into the site. I have a few ideas, but for now I'm digging the new, stripped down version of things.
Overall, though, I'm quite pleased with Django so far. I imagine I'll have more to say about it in the near future.
by anders pearson
Thu 13 Mar 2008
| comments: 8
Every once in a while, I run across a bug or a tricky problem where googling for a solution doesn't turn up much. When I come up with a solution, I like to write it up and put it online so the next person to come across it hopefully will have an easier time figuring it out. This is one of those posts.
One of the internal applications I wrote at work does a lot of work via external programs. It's basically gluing together a whole bunch of shell scripts and putting a nice interface on them.
Running an external program from Python isn't very hard in the simple case. There's actually a wealth of options available. The entry level is to use os.system() and give it a command string. That gives you the return code but doesn't give you the output of the command.
For what I'm doing, I need to have access to the return code, STDOUT, and STDERR. Requirements like that lead to the os.popen* functions. Basically, something like:
import os
# Python 2 only: the os.popen* functions were deprecated in 2.6 and removed in Python 3.
(c_stdin, c_stdout, c_stderr) = os.popen3(cmd)
out = c_stdout.read()
err = c_stderr.read()
c_stdin.close()
c_stdout.close()
c_stderr.close()
There are still problems with that. The environment that the child command runs in (variables, cwd, tty, etc.) is the same environment that the parent is running in. So, eg, to set environment variables for the child, you have to put them into os.environ in the parent, or to set the cwd for the child command, you have to have the parent do an os.chdir(). That can be troublesome in some situations. Eg, if the parent is a CherryPy process, doing an os.chdir() makes it hopelessly lost and it will crash. So you have to fork() a child process, set up the environment there, run the code above, and then pass the results back to the parent process.
This has been enough of a pain point for Python programmers that Python 2.4 added the subprocess module. The popen3() code above can be replaced with:
from subprocess import Popen, PIPE
# cmd is a list of arguments here (or pass shell=True to use a command string)
p = Popen(cmd, stdout=PIPE, stderr=PIPE)
(out, err) = p.communicate()
Aside from being a little shorter, subprocess.Popen() also takes additional arguments like cwd and env that let you manipulate the environment of the child process (it does the fork() for you). It basically gives you one very nice interface for doing anything and everything related to spawning external commands. Life is generally better with subprocess around.
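As a small illustration of cwd and env, this Python 3 sketch starts a hypothetical child program in a temporary directory with one extra environment variable, and has it write that variable to a file in its working directory. No pipes are involved; the point is only that the parent's own cwd and os.environ are left alone.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child program: writes an environment variable into a file
# named greeting.txt in whatever its current directory is.
child_code = (
    "import os\n"
    "with open('greeting.txt', 'w') as f:\n"
    "    f.write(os.environ['GREETING'])\n"
)

workdir = tempfile.mkdtemp()
# Copy the parent's environment and add one variable for the child only.
child_env = dict(os.environ, GREETING="hello from the child")

p = subprocess.Popen([sys.executable, "-c", child_code], cwd=workdir, env=child_env)
returncode = p.wait()  # block until the child exits and get its exit status

with open(os.path.join(workdir, "greeting.txt")) as f:
    greeting = f.read()

print(returncode, greeting)  # 0 hello from the child
# The parent's os.getcwd() and os.environ were never modified.
```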
Unfortunately, there is a subtle, devious bug in the Popen() example above. I know this because I encountered it and spent many hours trying to figure out what was going on.
Where I encountered it was when the command being run was doing an svn checkout. The checkout would run for a while and then the svn command would hang at some point. It wouldn't use CPU, and there would be no error messages. The process would still show up in ps or top. It would just stop, and the parent process would sit and wait for it to finish. Complete deadlock. When I ran the exact same svn command on the command line, it completed with no problems. Doing an svn checkout of a different repository would work fine. If I killed the hung svn process, the parent would complete and STDOUT would show most of the expected output from the svn checkout. With this particular repository, it would always hang at exactly the same spot; it was completely repeatable.
How could an svn checkout of a particular repository hang, but only when run via subprocess?
After much frustrating debugging, searching, and experimentation, I narrowed it down to the output of the svn command on STDOUT. If I added a -q (quiet) flag, it would complete without hanging. I eventually noticed that the output that had been collected in STDOUT after killing the svn process was right around 65,000 characters. Since 2^16 is 65,536, that seemed like a coincidence worth investigating. I wrote a test script that just wrote 2^16 characters to STDOUT and ran it via subprocess. It hung. I modified it to print 2^16 - 1 characters to STDOUT. No hanging. The troublesome svn repository happened to have a lot of files in it, so the checkout produced a lot of verbose output.
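A test along those lines is easy to reconstruct. The sketch below is Python 3 and child_hangs is a hypothetical helper name. Note that it provokes the hang with Popen.wait() before reading the pipes, the pattern the Python documentation explicitly warns can deadlock with stdout=PIPE once the child's output exceeds the OS pipe buffer, and it uses a timeout so the demonstration itself can't hang forever.

```python
import subprocess
import sys


def child_hangs(n_bytes, timeout=2.0):
    """Start a child that writes n_bytes to a stdout PIPE and report whether
    it is still blocked after `timeout` seconds (hypothetical test helper)."""
    code = f"import sys; sys.stdout.write('x' * {n_bytes}); sys.stdout.flush()"
    p = subprocess.Popen([sys.executable, "-c", code],
                         stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        # Waiting for the child to exit *before* reading its pipe: once the
        # OS pipe buffer is full, the child blocks on write and never exits.
        p.wait(timeout=timeout)
        hung = False
    except subprocess.TimeoutExpired:
        hung = True
        p.kill()  # break the deadlock for the demo
    p.communicate()  # drain both pipes and reap the child
    return hung


small = child_hangs(1000)     # well under any pipe buffer
large = child_hangs(2 ** 20)  # 1 MiB, far more than a typical pipe buffer
print(small, large)
```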
A closer inspection of the subprocess docs revealed a warning on Popen.communicate(): "Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited." I'd probably read that before and assumed that it was a warning about possibly consuming a lot of memory and being inefficient if you try to pass a lot of data around. So I ignored it. The STDOUT chatter of shell scripts that I was collecting for logging purposes did not strike me as "large" (65K is positively tiny these days), and it isn't an application where I'm particularly concerned about memory consumption or efficiency.
Apparently, that warning actually means "Note: if there's any chance that the data read will be more than a couple of pages, this will deadlock your code." What was happening was that the pipe the child writes its output into has a fixed-size buffer (on Linux, 65,536 bytes by default, which matches the 2^16 threshold). When it filled up, the OS simply stopped letting the child process write to it. The child would then sit and patiently wait to be able to write the rest of its output.
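The fixed-size buffer is easy to observe directly. This POSIX sketch (pipe_capacity is a hypothetical helper) makes the write end of a fresh os.pipe() non-blocking and counts how many bytes it accepts before the OS would make a writer wait. On Linux the default is typically 65,536 bytes, but the exact figure depends on the platform and its configuration.

```python
import os


def pipe_capacity(chunk_size=1024):
    """Count how many bytes a fresh OS pipe accepts before a writer would block."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # raise BlockingIOError instead of blocking
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            total += os.write(write_fd, chunk)
    except BlockingIOError:
        pass  # buffer is full; a normal (blocking) writer would now wait here
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total


capacity = pipe_capacity()
print(capacity)  # typically 65536 on Linux
```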
Luckily, the solution is fairly simple. Instead of setting stdout and stderr to PIPE, give them proper file (or Unix pipe) objects that will accept a reasonable amount of data. (A further hint for anyone who found this page because they encountered the same problem and are looking for a fix: Popen() needs real file objects with fileno() support, so StringIO-type fake file objects won't work; tempfile.TemporaryFile is your friend.)
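A minimal Python 3 sketch of the TemporaryFile approach follows; run_command is a hypothetical name, not the original application's code. The child writes into real temporary files, so it never blocks on a full pipe; once it exits, the parent rewinds the files and reads them back.

```python
import subprocess
import sys
import tempfile


def run_command(cmd):
    """Run cmd (a list of arguments) and return (returncode, stdout, stderr).

    stdout and stderr go to real temporary files rather than PIPE, so the
    child can write as much as it likes without blocking on a full pipe.
    """
    with tempfile.TemporaryFile() as out_f, tempfile.TemporaryFile() as err_f:
        p = subprocess.Popen(cmd, stdout=out_f, stderr=err_f)
        returncode = p.wait()  # safe here: output goes to files, not a pipe
        # Rewind both files before reading back what the child wrote.
        out_f.seek(0)
        err_f.seek(0)
        return returncode, out_f.read(), err_f.read()


code = "import sys; sys.stdout.write('x' * 2 ** 20); sys.stderr.write('done')"
rc, out, err = run_command([sys.executable, "-c", code])
print(rc, len(out), err)  # 0 1048576 b'done'
```

The files are opened in binary mode (TemporaryFile's default), so the results come back as bytes.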
This strikes me as kind of a poor design. Subprocess is wonderful in most ways and a real improvement over the old alternatives. But with this kind of issue, the programmer using it will probably not encounter any problems in development and think everything is fine, but some day will find their production app mysteriously deadlocked and have almost no clues as to what's causing it. That seems like something that deserves a big flashing red warning in the docs every time PIPE is mentioned.
by jere
Fri 04 Jan 2008
| comments: 1
It was fantastic, as always! Waes hail!
by anders pearson
Mon 03 Dec 2007
| comments: 2
It's been a busy month. The NaGraNoWriMo was a success. I ruined my health, alienated my family, and none of my friends remember what I look like, but I made it all the way to 50 pages after all.
The result, "Error And Annihilation", is up on Flickr here. It's also available to order in dead tree format.
One book just wouldn't be enough for me this year though. So I also now have available Myopica, which is the sequel to last year's Nearsighted and Obsessive-Compulsive and contains pretty much everything I've done in 2007 (except what's in Error And Annihilation). There's also a corresponding (though not sorted) flickr set for convenient browsing.
As usual, those are both available as free PDF downloads and the books are for sale at cost (ie, I make no money on any of this). Happily, Lulu.com also now has an option for Public Domain licensing, so that's what they are.
Again, thanks to Marc Raymond for laying them out and making everything look more professional than it is.