Django Deployment with virtualenv and pip

A while back, I detailed how I went about deploying TurboGears apps with workingenv and supervisord. That’s still the basic approach that I’m using for the TG apps that I still maintain and keep running. Lately though, I’ve been doing more of my work with Django and the landscape of Python deployment tools has changed significantly enough that I thought it was about time I explained how (and why) I’m deploying things these days.

Motivations

I still have the same basic needs. I write a lot of small applications that get deployed on one server and I don’t want to spend all my time doing the upgrade dance when a new version of a library comes out. If I’ve got a bunch of apps written with Django 0.97 that are running and I want to deploy another app on the same server that’s written against Django 1.0, I don’t want to have to upgrade all my old ones (since there were some non backwards compatible changes introduced). Similarly, I don’t want the presence of my old legacy apps preventing me from being able to use the latest versions of libraries for my new apps. Generally, this means that the approach of installing libraries into the global site-packages just does not work for me.

I should also mention that while the approach detailed in my old post uses workingenv’s “–requirements” flag to give it a list of URLs of packages to install, I actually moved away from that quite a while ago instead opting to bundle all the needed eggs in a directory right alongside the application’s code so the entire deployment process could be done without relying on the network. I just had too many problems with either PyPI or the TG website being unavailable at the wrong time and breaking my deployments. It also meant that I couldn’t bootstrap a new environment if I was on a laptop without network access.

What’s New?

My old post explained in some depth how I got around that problem for the TurboGears apps I was deploying by using workingenv to isolate each application’s libraries in a way that I could just bundle the libraries needed for each along with the application and not have to worry about what else was on the system.

Things are slightly different now with Django for a couple reasons.

First, mod_wsgi has matured and, for my purposes, seems to be the best way to deploy Django applications. That means I’m back to Apache and don’t need the whole setup with lighttpd proxying back to individual application web server processes each being monitored and controlled by supervisord.

Second, the Django community as a whole seems to have not bought into setuptools and eggs as the preferred distribution method and have stuck with distutils instead. My old approach relied on having all the required libraries bundled as eggs. With TurboGears, that was rarely an issue since TG was completely tied to setuptools. But with Django, I find that more often than not, the library I want to use isn’t available as an egg, so I ended up having to download the source tarball and manually build an egg for each library. That got tedious after a while. Ideally, a new approach would allow me bundle both eggs and source tarballs. Failing that, source tarballs currently seem to be the path of least resistance in a Django ecosystem.

Finally, workingenv has been all but deprecated in favor of virtualenv (and/or the combination of virtualenv and pip depending on what set of workingenv features one is replacing). Virtualenv has a number of advantages over workingenv that I won’t bother enumerating here. Pip is also an important new player in town. Conveniently, pip happens to use source tarballs instead of eggs.

The Current Approach

I’m just going to dive into it.

In each project, I have a ‘requirements’ directory. That, in turn, has a ‘src’ directory in which I’ve dumped source tarballs for every library that the project requires. That includes Django itself, database drivers, everything.

Also in ‘requirements’ are ‘libs.txt’ and ‘apps.txt’ text files. Those are just lists of the contents of the ‘src’ directory since pip currently isn’t (yet) smart enough to figure out the order that it needs to install things when you just give it a directory full of tarballs. I separate them into ‘libs’ and ‘apps’ just for convenience. ‘libs’ are plain python libraries and ‘apps’ are full-fledged django apps.

At the top level of the project, there is a ‘bootstrap.py’ file with the following contents:

:::python
#!/usr/bin/env python
import os
import sys
import subprocess
import shutil

pwd = os.path.dirname(__file__)
vedir = os.path.join(pwd,"ve")

if os.path.exists(vedir):
    shutil.rmtree(vedir)

subprocess.call([os.path.join(pwd,"pip.py"),"install",
                 "-E",os.path.join(pwd,"ve"),
                 "--enable-site-packages",
                 "--requirement",os.path.join(pwd,"requirements/apps.txt")])

That just looks for a directory named ‘ve’, blows it away if it’s there, then runs a command that basically works out to

$ pip.py install -E ve --enable-site-packages --requirement requirements/apps.txt

In other words, telling pip to install all the packages specified in requirements/apps.txt (which in turn has a reference to libs.txt) into a new virtualenv directory called ‘ve’. The only real reason that’s done in Python is to easily make it self-aware of it’s location so the script can be run from any directory and it will figure out how to put the ‘ve’ directory in the right place. Bash makes that harder than it ought to be.

I also include a copy of pip.py in the top level project directory so pip doesn’t even have to be globally installed on a system to bootstrap. (I think I can also drop virtualenv.py in there too but I already have that installed on every system I admin so I haven’t bothered yet).

All together now

I actually have all of this set up in a Paster template (along with a whole bunch of other customizations) so instead of the usual

$ django-admin.py startproject foo

I run

$ paster create --template=mydjangotemplate foo

and I get a stock Django project directory plus the requirements directory full of source tarballs and the bootstrap.py. The copy of ‘manage.py’ that my paster template drops in there has the ‘#!/usr/bin/env python’ line replaced with ‘#!ve/bin/python’. So the next couple steps for me to bring up a running dev server look like:

$ cd foo
$ chmod +x bootstrap.py manage.py # paster templates don't let you set permissions
$ ./bootstrap.py                  # installs all my requirements into 've'
$ createdb foo                    # i pretty much always use postgresql and my custom settings.py are configured for it
$ ./manage.py runserver

I’d like to stress that that happens on a system that has Paste and virtualenv installed but does not need to have Django, psycopg2, or any of the other libraries that are used installed. They are included in the template and get installed into the virtualenv for that one project.

I check everything except the ‘ve’ directory into version control (usually git these days). That includes all the source tarballs. It’s a bit wasteful of disk space and bandwidth but I haven’t found that it’s that much overhead in practice.

When I deploy to production (via our automated system at work), the project is checked out of version control, rsync’ed to the production server then

$ ssh productionserver /path/to/myapp/bootstrap.py

is run and everything gets installed into a virtualenv on the production server. Other applications on the same server are unaffected. My mod_wsgi configs point to ‘/path/to/myapp/ve/lib/python2.5/site-packages’ and so on (they are autogenerated by my paste template as well).

I plan on going into more detail on the paste template magic I use in the future, but that should give you a bit of a sense for why I like it so much.

A couple fine points

A few caveats:

I have this working on Ubuntu 9.04 but it requires a couple tweaks since Python 2.6 is the default on that system. Figuring out how to do that is left as an exercise for the reader.
I don’t use that many libraries that have C extensions (and thus require time-consuming compilation steps). Pretty much just psycopg2 and PIL. psycopg2 is small enough that compiling it is very quick so I just put the source tarball in requirements like usual. PIL is big though and I don’t want to have to wait for it to compile every time I deploy an app to production. That one’s stable enough and is easily installed with Ubuntu’s package manager that it’s become my one exception and I just install it globally and it hasn’t been a problem. (this is why pip gets the ‘–enable-site-packages’ flag to pull that in)
pip doesn’t yet do a very good job of figuring out dependencies. If you don’t have the order just right in the requirements files, it will try to download dependencies off PyPI even if they’re sitting right there. It’s taken me a bit of trial and error to get it right.

Eg

If you want to see a real, working example of a project that uses this, check out the requirements directory and bootstrap.py from one of my projects.