anders pearson (bio)

Subprocess Hanging: PIPE is your enemy

Every once in a while, I run across a bug or a tricky problem where googling for a solution doesn't turn up much. When I come up with a solution, I like to write it up and put it online so the next person to come across it hopefully will have an easier time figuring it out. This is one of those posts.

One of the internal applications I wrote at work does a lot of work via external programs. It's basically gluing together a whole bunch of shell scripts and putting a nice interface on them.

Running an external program from Python isn't very hard in the simple case. There's actually a wealth of options available. The entry level is os.system(), which takes a command string and runs it through the shell. That gives you the return code but doesn't give you the output of the command.
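As a quick illustration (a sketch assuming Python 3 on a POSIX system; `exit_code_of` is a hypothetical helper name), os.system() hands back only an encoded wait status, while whatever the command prints goes straight to the parent's terminal:

```python
import os


def exit_code_of(command):
    """Run `command` through the shell via os.system() and return its exit code.

    Anything the command prints goes straight to our own stdout/stderr;
    os.system() offers no way to capture it.
    """
    status = os.system(command)
    # On POSIX, os.system() returns a wait status, so decode the exit code.
    if os.name == "posix":
        return os.WEXITSTATUS(status)
    return status
```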

For what I'm doing, I need to have access to the return code, STDOUT, and STDERR. Requirements like that lead to the os.popen* functions. Basically, something like:

    import os
    (c_stdin,c_stdout,c_stderr) = os.popen3(cmd,'r')
    out = c_stdout.read()
    err = c_stderr.read()
    c_stdin.close()
    c_stdout.close()
    c_stderr.close()

There are still problems with that. The environment that the child command runs in (variables, cwd, tty, etc.) is the same environment that the parent is running in. So, e.g., to set environment variables for the child, you have to put them into os.environ in the parent, or to set the cwd for the child command, you have to have the parent do an os.chdir(). That can be troublesome in some situations. For example, if the parent is a CherryPy process, doing an os.chdir() leaves it hopelessly lost and it will crash. So you have to fork() a child process, set up the environment there, run the above code, and then pass the results back to the parent process.
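For concreteness, here is a minimal sketch of that fork() workaround, assuming Python 3 on a POSIX system. `run_in_child` and its arguments are hypothetical names, and the command runs through os.popen() (which only captures stdout) to keep it short:

```python
import os


def run_in_child(cmd, cwd, extra_env):
    """Run shell command `cmd` in a forked child with its own cwd and env.

    Only the child calls os.chdir() and modifies os.environ, so the parent
    (e.g. a long-running server) never sees those changes. The command's
    stdout is sent back to the parent over an os.pipe().
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: safe to change directory and environment here.
        os.close(read_fd)
        status = 1
        try:
            os.chdir(cwd)
            os.environ.update(extra_env)
            with os.popen(cmd) as child_out:
                output = child_out.read()
            with os.fdopen(write_fd, "w") as pipe_out:
                pipe_out.write(output)
            status = 0
        finally:
            # Never return into the parent's code path from the child;
            # in this sketch any error just leaves the output empty.
            os._exit(status)
    # Parent process: read everything the child sends, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd) as pipe_in:
        result = pipe_in.read()
    os.waitpid(pid, 0)
    return result
```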

This has been enough of a pain point for Python programmers that Python 2.4 added the subprocess module. The code above can be replaced with:

    from subprocess import Popen, PIPE
    p = Popen(cmd,stdout=PIPE,stderr=PIPE)
    (out,err) = p.communicate()

Aside from being a little shorter, subprocess.Popen() also takes additional arguments like cwd and env that let you manipulate the environment of the child process (it does the fork() for you). It basically gives you one very nice interface for doing anything and everything related to spawning external commands. Life is generally better with subprocess around.
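A minimal Python 3 sketch of those two arguments (the helper name `run_elsewhere` is hypothetical). To keep the focus on `cwd` and `env`, the child simply inherits the parent's stdout and stderr here:

```python
import os
import subprocess


def run_elsewhere(cmd, cwd, extra_env):
    """Run `cmd` (an argument list) with its own cwd and extra env vars.

    Popen does the fork() itself, so the parent's cwd and os.environ are
    never touched. Returns the child's exit code.
    """
    env = dict(os.environ)  # start from a copy of the parent's environment
    env.update(extra_env)   # add or override variables for the child only
    p = subprocess.Popen(cmd, cwd=cwd, env=env)
    return p.wait()
```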

Unfortunately, there is a subtle, devious bug in that code. I know this after encountering it and spending many hours trying to figure out what was going on.

Where I encountered it was when the command being run was doing an svn checkout. The checkout would run for a while and then the svn command would hang at some point. It wouldn't use any CPU and there would be no error messages. The process would still show up in ps or top. It would just stop, and the parent process would sit and wait for it to finish. Complete deadlock. Run on the command line, the exact same svn command would complete with no problems. Doing an svn checkout of a different repository would work fine. Kill the hung svn process and the parent would complete, and STDOUT would show most of the expected output from the svn checkout. With the particular repository, it would always hang at exactly the same spot; completely repeatable.

How could an svn checkout of a particular repository hang, but only when run via subprocess?

After much frustrating debugging, searching, and experimentation, I narrowed it down to the output of the svn command on STDOUT. If I added a -q (quiet) flag, it would complete without hanging. I eventually noticed that the output that had been collected in STDOUT after killing the svn process was right around 65,000 characters. Since 2^16 is 65536, that seemed like a coincidence worth investigating. I wrote a test script that just wrote 2^16 characters to STDOUT and ran it via subprocess. It hung. I modified it to print 2^16 - 1 characters to STDOUT. No hanging. The troublesome svn repository happened to have a lot of files in it, so the checkout produced a lot of verbose output.

A closer inspection of the subprocess.Popen docs revealed a warning "Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited." I'd probably read that before and assumed that it was a warning about possibly consuming a lot of memory and being inefficient if you try to pass a lot of data around. So I ignored it. The STDOUT chatter of shell scripts that I was collecting for logging purposes did not strike me as "large" (65K is positively tiny these days) and it isn't an application where I'm particularly concerned about memory consumption or efficiency.

Apparently, that warning actually means "Note: if there's any chance that the data read will be more than a couple pages, this will deadlock your code." What was happening was that the pipe connecting the two processes has a fixed-size buffer in the operating system. When it filled up, the child process was simply blocked from writing any more to it. The child would then sit and patiently wait to be able to write the rest of its output.
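A minimal sketch of that failure mode in Python 3.3+ (which added the `timeout` argument to `wait()`); `stalls_with_pipe` and the one-line child program are hypothetical. It starts a child with `stdout=PIPE` and then waits for it without ever reading the pipe, so once the output exceeds the operating system's pipe buffer (64 KiB on a typical Linux system; other platforms differ), the child blocks on its write and never exits:

```python
import subprocess
import sys

# Hypothetical child program: write n characters to stdout, then exit.
CHILD = "import sys; sys.stdout.write('x' * int(sys.argv[1]))"


def stalls_with_pipe(n, timeout=5.0):
    """Return True if a child writing n chars into an unread PIPE is still
    blocked after `timeout` seconds, i.e. the pipe buffer filled up."""
    p = subprocess.Popen([sys.executable, "-c", CHILD, str(n)],
                         stdout=subprocess.PIPE)
    try:
        # Deliberately wait for the child WITHOUT reading p.stdout.
        p.wait(timeout=timeout)
        return False
    except subprocess.TimeoutExpired:
        return True
    finally:
        if p.poll() is None:
            p.kill()  # clean up the stuck child
        p.wait()
        p.stdout.close()
```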

Luckily, the solution is fairly simple. Instead of setting stdout and stderr to PIPE, give them proper file (or Unix pipe) objects that will accept a reasonable amount of data. (A further hint for anyone who found this page because they encountered the same problem and are looking for a fix: Popen() needs real file objects with fileno() support, so StringIO-type fake file objects won't work; tempfile.TemporaryFile is your friend.)
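A minimal Python 3 sketch of that fix, using `tempfile.TemporaryFile` for both streams (the helper name `run_capture` is hypothetical):

```python
import subprocess
import tempfile


def run_capture(cmd):
    """Run `cmd`, collecting stdout and stderr in temporary files.

    TemporaryFile objects are real files with a fileno(), so the child
    can write as much as it likes without blocking on a full pipe.
    Returns (returncode, stdout_bytes, stderr_bytes).
    """
    with tempfile.TemporaryFile() as out, tempfile.TemporaryFile() as err:
        returncode = subprocess.call(cmd, stdout=out, stderr=err)
        # The child shared these file offsets and left them at the end;
        # rewind before reading the results back.
        out.seek(0)
        err.seek(0)
        return returncode, out.read(), err.read()
```

Because the child's output lands on disk rather than in a fixed-size buffer, the parent can safely wait for it to finish before reading anything.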

This strikes me as kind of a poor design. Subprocess is wonderful in most ways and a real improvement over the old alternatives. But with this kind of issue, the programmer using it will probably not encounter any problems in development and think everything is fine but some day will find their production app mysteriously deadlocked and have almost no clues as to what's causing it. That seems like something that deserves a big flashing red warning in the docs every time PIPE is mentioned.

2008-03-13 20:35:31

comments: 3

books

It's been a busy month. The NaGraNoWriMo was a success. I ruined my health, alienated my family, and none of my friends remember what I look like, but I made it all the way to 50 pages after all.

The result, "Error And Annihilation", is up on Flickr here. It's also available to order in dead tree format.

One book just wouldn't be enough for me this year though. So I also now have available Myopica, which is the sequel to last year's Nearsighted and Obsessive-Compulsive and contains pretty much everything I've done in 2007 (except what's in Error And Annihilation). There's also a corresponding (though not sorted) flickr set for convenient browsing.

As usual, those are both available as free PDF downloads and the books are for sale at cost (i.e., I make no money on any of this). Happily, Lulu.com also now has an option for Public Domain licensing so that's what they are.

Again, thanks to Marc Raymond for laying them out and making everything look more professional than it is.

2007-12-03 21:02:57

comments: 2

Error And Annihilation

My NaGraNoWriMo shall now commence.

I've been convinced to scale back my ambitions to just 30 pages, or about three to four hours per day instead of my original goal of 50 pages.

I've got a fresh large Moleskine sketchbook with 30 pages pre-numbered, a pack of nonrepro blue leads for my mechanical pencil, a small collection of Sakura Micron pens, and I'm ready to go.

My working title for the book is "Error And Annihilation". I'll try to post an update now and then, but no guarantees.

2007-11-01 00:33:12

comments: 2

A Simple Programming Puzzle Seen Through Three Different Lenses

The other day, I stumbled across Mark Nelson's blog post describing a fairly simple NPR word puzzle: "Take the names of two U.S. States, mix them all together, then rearrange the letters to form the names of two other U.S. States. What states are these?"

Mark Nelson is a programmer, so his first instinct of course was to write a small program to solve it. Mine too. I immediately stopped reading his post, opened a new emacs buffer and spent five minutes coming up with this:

#!/usr/bin/env python
states = ["alabama","alaska","arizona","arkansas","california","colorado",
          "connecticut","delaware","florida","georgia","hawaii","idaho",
          "illinois","indiana","iowa","kansas","kentucky","louisiana",
          "maine","maryland","massachusetts","michigan","minnesota",
          "mississippi","missouri","montana","nebraska","nevada",
          "newhampshire","newjersey","newmexico","newyork","northcarolina",
          "northdakota","ohio","oklahoma","oregon","pennsylvania","rhodeisland",
          "southcarolina","southdakota","tennessee","texas","utah","vermont",
          "virginia","washington","westvirginia","wisconsin","wyoming",]
seen = dict()
for state1 in states:
    for state2 in states:
        if state1 == state2:
            continue
        letters = list(state1 + state2)
        letters.sort()
        key = "".join(letters)
        if seen.has_key(key):
            (old1,old2) = seen[key]
            if old1 == state2 and old2 == state1:
                continue
            else:
                print "found it: %s + %s, %s + %s" % (state1,state2,old1,old2)
                exit(0)
        else:
            seen[key] = (state1,state2)

Sure enough, it gave me the answer (and like Mark Nelson, seeing it made me realize that it probably would have been even more productive to just think about it for a minute instead of writing the code).

To me, that Python code is about as simple and direct as it gets. There's nothing clever going on there. Just loop over the Cartesian product of states x states, skip a couple of obvious cases and then normalize the names of the states by sorting the letters and put them into a dictionary. As soon as you hit a combination that's already in the dictionary, you have your answer.

That approach to me would qualify as a "naive" or "brute force" approach. I figured there might be a more optimized approach, but with only 50 states it wasn't really worth optimizing any more. Running it took all of 18 milliseconds.

When I went back and read the rest of the blog post, my jaw dropped. His first pass, in C++, involved a quadruply nested loop and the STL, and was taking upwards of an hour to run. Looking at his code, he really took the notion of a "brute force" approach to a level I hadn't even considered. His second and third passes improved the runtime and even simplified the code, but they still run on the order of a few seconds. That's for a compiled, low level language. My Python code written without any particular thoughts toward efficiency running in an interpreter (so the time of starting up the python process, parsing the code, and interpreting it directly, line by line without compiler optimizations are all included in the running time) was beating it by a couple of orders of magnitude.

The key, of course, is the algorithm. Python, like Ruby, Perl, and other relatively high level languages, supports a dictionary (or hashtable, if you prefer) data type at the syntax level. As a result, anyone programming in one of those languages quickly learns how to make the most of them and becomes familiar with a number of idioms, including the one I used of testing for uniqueness by keeping a 'seen' dictionary, inserting keys one by one and looking for a collision. It's dead simple, commonly used in scripting languages, and extremely efficient, since inserting into a hashtable is O(1) on average and tends to be one of the most finely tuned parts of a scripting language's interpreter/VM.
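The idiom generalizes into a small helper. This is a Python 3 sketch, not code from the post: `first_collision` and `letters` are hypothetical names, and it swaps the nested loops for `itertools.combinations`, which never yields a state paired with itself or the same pair in reverse order, so the two skip checks in the original loop disappear:

```python
from itertools import combinations


def first_collision(items, key):
    """Return the first two items whose key(item) values are equal, or None.

    The classic 'seen' dictionary: insert each key once and stop at the
    first key that is already present (average O(1) per lookup/insert).
    """
    seen = {}
    for item in items:
        k = key(item)
        if k in seen:
            return seen[k], item
        seen[k] = item
    return None


def letters(pair):
    """Normalize a pair of state names to their sorted letters."""
    return "".join(sorted(pair[0] + pair[1]))
```

With the `states` list from the first program, it would be called as `first_collision(combinations(states, 2), letters)`.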

There's nothing Python specific about the algorithm. In fact, in the comments on the post, Vince Huston submits a C++ solution that's basically identical to my approach (and probably outperforms everyone else handily). If I were forced to solve the same problem in C++, I would probably have come up with something similar. I would not be at all surprised if Vince Huston has had some experience coding in scripting languages and Mark Nelson hasn't.

The point isn't that Mark Nelson is a bad programmer. Far from it. Looking around at the rest of his site, at articles like his explanation of the Byzantine Generals Problem (which was how I ended up on his site in the first place), and at the books he's written, I'd guess that overall, he has more breadth and depth to his understanding of algorithms and programming than I do.

My point is really just to repeat the tired, broken record cry of advocates of higher level languages that 1) using a faster, more appropriate algorithm will usually go much further in optimization than low level optimizations (using a compiled language, static types, fretting over clock cycles in an inner loop, etc) and 2) learning different programming paradigms, languages, and idioms will improve your programming even if you end up going back to a lower level language. In this case, some familiarity with dictionary idioms common in scripting languages helps immensely in producing a fast, simple solution.

Another comment on his post goes even further. Tom Moertel submits a solution implemented in Haskell. From a performance standpoint, it's pretty much the same approach as mine, using a dictionary (in this case, Haskell's Data.Map library) to do the comparisons on normalized strings. What makes it a really nice solution, though, is that he approaches it by starting with a "clusterBy" higher order function that takes a signature function and a list and returns a list of groups clustered by the signature (my explanation is bad; his examples make it much clearer). Essentially, instead of directly solving the problem, he creates a general purpose tool which can then trivially be applied to the specific problem at hand. clusterBy is the kind of function that I could see being useful to other problems. So not only does he end up with the solution to this problem, he also ends up with a useful tool to make future programming tasks simpler. Haskell's smooth support of higher order functions makes this a natural approach and it seems to be the way that proficient Haskell programmers end up taking on most problems.

Python had the good taste to steal a few items from Haskell's bag of tricks though, so I was inspired to try a Python version. Here's a rough equivalent of clusterBy:

def cluster_by(f,lst):
    transformed = [f(x) for x in lst]
    d = dict()
    for t,i in zip(transformed,lst):
        d.setdefault(t,[]).append(i)
    return d.values()

Not as pretty as the Haskell, but works essentially the same way.

Then, to solve the original problem, we need a normalize function:

def normalize(t):
    letters = list(t[0] + t[1])
    letters.sort()
    return "".join(letters)

It takes the tuple of two state names, sorts all the letters in them and returns that as a string. The final piece is just to apply cluster_by with the normalize function to the Cartesian product of states x states and keep only the clusters with more than two entries (every pair appears in both orders, so a cluster of two is just one pair and its reverse). Since I was going Haskell-style, I decided to use list comprehensions instead of for loops as well:

clustered = cluster_by(normalize,[(x,y) for x in states for y in states])
print [x for x in clustered if len(x) > 2]

It runs in 25 milliseconds on my machine, so slightly more overhead than the procedural approach, but close enough to not matter and, IMO, it's cleaner and would lend itself more to further reuse of the cluster_by or normalize functions.

So first knowing an idiom from a scripting language can produce better C++ code (see Vince Huston's solution), then taking a lesson from Haskell I'm able to come up with better Python code.

I'm waiting for a Prolog programmer to come along and smoke everyone next.

2007-10-30 14:43:50

comments: 27

NaGraNoWriMo

At some point a while back, my little meditational abstract doodles started occasionally getting boxes around them. When there were several of those on a page, I noticed that they sort of resembled the panel layout of a comic. I kind of liked that so I experimented more in that direction, playing with deliberate panel layouts and borrowing from the vocabulary of comics and sequential art while keeping my subject matter abstract. Now I've got a fairly large collection of these abstract comics and it's a style that I think is still interesting and rich for experimentation.

Lately I've found myself in several conversations with friends about National Novel Writing Month (NaNoWriMo). The executive summary if you're too lazy to click the link and aren't familiar with it: you give yourself the month of November (30 days) to write a 50,000 word novel. It doesn't have to be good or make much sense or anything, it just has to be started and finished within the month. I've always thought it was a cool idea. A bunch of my friends have attempted and at least one has even successfully completed a NaNoWriMo.

As a writer, I'm not really interested in writing fiction so it isn't something I'd be up for myself. The idea continues to intrigue me though so I've decided that this year I want to try to do an abstract graphic novel NaNoWriMo style. A "NaGraNoWriMo" I guess (calling it a "NaAbsGraNoWriMo" would just be taking it too far.)

Obviously, I can't use word count as a metric for my graphic novel. Laying out, drawing and inking a full page of panels I've found takes me about three to four hours with the level of detail that my style involves. The notion of "a picture is worth a thousand words" I think actually works out pretty well for my purposes. 50,000 words = 50 pages = 1.67 pages per day = about 6 hours per day of me drawing for 30 days. I think that's a similar magnitude of challenge to what other NaNoWriMo writers are attempting.

That's the goal, anyway. I've got a lot of other things going on in November so it's not going to be easy to find all that time. I'll try to post the pages to flickr as I go, but scanning and posting takes time that I could be drawing so that will be a lower priority than finishing.

If you don't see me outside my apartment or hear from me next month, now you'll know why.

[Yes, I know that someone else has used the term "NaGraNoWriMo" for something similar. Their approach was to write a "script" for a 175 page graphic novel. I honestly don't know quite what a script for a graphic novel is but I'm pretty sure that it wouldn't work for an abstract one. I like my idea better.]

2007-10-25 00:55:38

comments: 2

periodic personal update

I'm in the office now on a Saturday enjoying the free air conditioning and functioning ceiling, which, at the moment, my apartment is lacking. I don't have anything else pressing to do at the moment so I figured I would take this opportunity to post an all too rare update on the miscellaneous stuff going on in my life for those of you that I'm not in frequent contact with.

For the most part, it's been pretty quiet here. I work a lot, paint and draw obsessively (I think there should be enough material for a Volume 2 soon, in the meantime, it usually ends up on flickr), and occasionally go out for beers, movies, or concerts with my friends. I've even been running semi-regularly.

I currently have two plants which, by some miracle, I haven't managed to kill yet with my neglect.

In July, a couple of coworkers and I went skydiving. I cannot possibly recommend it highly enough if you've never done it. I liked it for a very different reason than I expected. When you jump out of the plane, there's about a minute of free fall which is pretty exhilarating. You don't know which way is up for a few seconds, then you're assaulted by 120 mph wind and deafening noise. Your brain sort of shuts off because it can't handle the intensity of the sensations which it has absolutely nothing in memory to compare to. I actually kind of forgot how to breathe during free fall and had to consciously remind myself. That was fun in its way, but I'm not an adrenaline junkie type so it wasn't the big draw for me. When your chute opens, you're suddenly thrown into a different universe. In a second or two, you go from free fall to absolute stillness and dead silence. You're far enough up that you have no sensation of movement at all and you can see for hundreds of miles to the horizon in all directions. You realize that there's basically no other solid matter anywhere within thousands of feet of you in any direction. It's an amazing sense of serenity and calm more powerful than anything I've ever experienced with yoga or meditation.

One thing I don't recommend is getting an ear infection. I was pretty much knocked out of commission last week with strep throat that turned into an ear infection. The strep was quite pleasant in comparison. The ear infection was several days of continuous, unrelenting, maddening pain. Pain killers seemed to have no effect. If you've never had one, it's basically like jabbing a pencil into your ear, sharp end first, then giving it a little tap with a ball peen hammer about once a second. For days. Antibiotics eventually were brought in and made short work of the infection, but it's not an experience I would ever like to repeat.

On the technical front, I've been continuing my climb up the Erlang learning curve. The recently published book by Joe Armstrong has been a big help. I'm particularly smitten with Mnesia at the moment and I also see the OTP framework being very useful to me in the future. I've prototyped some simple microapps with Erlang and Mnesia and am scheming on how I can sneak some of it into work.

Speaking of work, we've been up to some cool stuff. I spent a couple of weeks converting our whole (modest) server infrastructure to virtual servers running on Xen. So far it's gone really smoothly and it should make things even more reliable and flexible in the future. I'm really impressed with Xen. We evaluated VMware as well and it has some advantages, but for our situation, we found Xen to be more than adequate performance-wise and much simpler to deal with (I'm vastly more comfortable with arcane Unix commands than point-and-click stuff. YMMV).

We also recently got a nice grant from NIMH for a very worthwhile HIV prevention project. Sadly, I don't think most of the grant money is going directly into my salary. We've got more cool projects in development too.

I've also been psyched that we recently hired a new programmer to help out with our general world domination plans. She's been kicking ass and taking names, rapidly getting up to speed with the Python and TurboGears stuff we use, and she's an industrial fan to boot. I expect the orbital laser to be operational very soon.

With that addition, we're up to four desktop Linux users in the office now (all Ubuntu) with some others experimenting, which is a nice change from two years ago when it was just me.

Finally, it appears that the Japanese convenience store in my neighborhood is again carrying Choco Monaka Jumbo ice cream bars after months of being out of stock. I was beginning to panic.

2007-08-25 17:01:07

comments: 6

forced hiatus

For once, I have a good excuse for not posting anything here lately. Usually I don't post for months at a time and it's just out of laziness and a lack of anything interesting to write about. I'm not saying that wasn't also the case this time, but this time I can at least pretend it was something else preventing me from posting.

Many of you noticed that thraxil.org was offline for the last month or so.

The root cause was that registerfly.com, the registrar that I'd registered my domains through, lost their ICANN accreditation for generally being sleazy. They transferred the domains in their control to a couple of other registrars. Last time I renewed thraxil.org, I paid for a couple of years at once to minimize the chances that it would expire on me when I wasn't paying attention. Somehow, that got lost when they transferred it and the domain expired in June, a year before I was expecting it to.

Normally, a domain expiring isn't that big a deal. ICANN gives you a 40 day grace period where the original owner is allowed to renew their domain without penalty and domain squatters can't get it. This generally works pretty well to eliminate the really predatory domain squatting. However, you can only renew the domain through the original registrar. This obviously was a problem for me. I couldn't renew it through registerfly since they were no longer accredited. Registerfly couldn't give me the transfer authorization codes that I needed to transfer the domain to a different registrar because the domain was expired. As far as I can tell, there just isn't a clear procedure in place for how to deal with the situation of an expired domain on a defunct registrar.

I won't even bother going further into what I had to do to get everything back up. Obviously it involved a lot of bureaucratic nonsense that made me want to embrace violence as a problem-solving technique. There was also a round of me having to prove my identity via several easily forged tokens (screenshots of my account page? emailing a scan of my photo ID?) that didn't exactly make me feel secure with the general process.

Anyway, we're back online now so I have no more excuses.

2007-07-23 16:03:28

comments: 6

book

I've been drawing and painting a lot in the last couple years. Obsessively, perhaps.

A few months ago, I was looking at the sheer volume of work I'd produced and realized that it was actually enough to fill a book. That idea kind of stuck with me and I became fascinated with the concept of having my own book. As an unknown, there was no way any publishing companies out there would bother with me, and I didn't really relish the thought of having to deal with a publisher anyway. So I did some research and found that technology has recently made a print-on-demand model actually pretty feasible.

Lulu.com looked like the best of what's out there so I ordered a couple of random books from their site to check the quality and I was pretty impressed. The prices are a little high, but considering that not too long ago it really wouldn't have been financially possible to publish a book without doing a print run of hundreds of copies, it's not too bad.

I don't really know jack about anything print related so I roped in some friends to help me photograph the paintings that were too big to scan, lay everything out and put it into a pdf with the right settings, and design a professional looking cover. They came through for me and put together a complete package that makes it look like I'm actually a real artist or something. Neat.

Anyway, it's now available. You can order a copy directly through lulu.com's online store.

The 158-page, color-printed, 9x7-inch paperback version is $28.23. That's actually just the base manufacturing cost that lulu charges. I make no profit whatsoever on the book. This is purely a labor of love for me, not a money making venture. There's also a free PDF download available if you either want to preview it or if you have access to your own book printing equipment and want to print it yourself.

The license on the book according to the site is the Creative Commons Attribution License 2.5 but that's only because Lulu.com's publishing interface didn't include the option for the Public Domain which is what I prefer. In other words, do whatever the hell you want with the book. I don't care. I'm just happy that it exists. If you like it, you can buy me a beer some day.

Again, thanks to Catherine, Marc, and Jessica for helping me out on this, one of my more involved crazy schemes.

2007-01-11 21:10:28

comments: 5

triple plugged

For the third year in a row, MFR has been nominated for a PLUG Award. This year, they created a new "Best Music Blog" category separate from the "Best Music Website" category. That means that our little operation consisting of a couple college friends with a Movable Type install and a bit of free time isn't competing directly against sites like Pitchfork or MySpace who have, like, budgets and employees and such.

We're always happy just to get nominated. This year maybe we might even win. At the very least, it'll be a pleasant change to lose to someone cool and deserving like brooklynvegan, MOKB, or stereogum instead of the mega-corporate MySpace like we did last year.

2006-11-15 15:16:12

comments: 1

TurboGears Deployment with supervisord and workingenv.py

As my web development involves more of the microapps stuff, and the applications I'm building become clusters of larger and larger numbers of small services, the deployment challenges become more important to solve. This post is an explanation of my current best practices at the time of writing.

Most of what I build these days is built on the TurboGears Python web framework and runs on Gentoo Linux. I decompose applications into as many small services as possible. As a result, our production server at work currently has dozens of TG applications running on it. We serve Perl, Java, and PHP apps too and they each have their own deployment challenges, but for now I'm going to focus on the Python stuff.

I started with TurboGears when it was at version 0.8 or so and have basically kept up with the current version (it's at 1.0b as I'm writing this). The TG APIs haven't quite stood still across those versions, so the apps written for 0.8 won't run directly with 0.9 or 1.0. They still work fine though and I'd rather not spend my entire life continuously porting all my applications to the current version of the TG API. So one of the most important aspects of my deployment strategy is a way to have multiple, mutually incompatible versions of TG installed side by side on the same machine without interfering with each other.

It took me a while, but after being bitten too many times by upgrades to one library breaking other applications on the same machine, I eventually came to agree with Ian Bicking's argument against site-packages (and I've been bitten by this in almost every language/environment I've ever worked in; it's not just a Python issue).

Luckily, Ian actually did something about the problem and wrote workingenv.py, a handy script that sets up an isolated Python environment. Workingenv.py combined with setuptools has been a maintenance dream come true. I now set up a working-env for every single application I write and easy_install TG and any other libraries that are needed for that application into it. I now have complete control over what versions of what libraries are in use by each application (with workingenv.py's "--requirements" flag) and they can all peacefully coexist on the same machine. The cost is a bit of disk space and an extra shell command here and there to set up and activate the environments, but it saves so many upgrade-related headaches that it's an easy win.

I used to deploy all my TG apps with Apache and mod_python using the mpcp bridge. Actually, many are still deployed that way, but I'm moving away from it. Embedding python in an apache process works really well for some things, but it also made isolating environments with workingenv.py significantly more complicated (I do have a way of doing it, but it's a messy hack that I'd rather not include in a post on "best practices"). Mod_python deployment also has memory usage issues the more apps you pile into it, and having them all tied to the same Apache process means that to restart or reconfigure one, the apache daemon has to be restarted so all the other apps tied to it get restarted. As some of them become more critical to our operations, the couple seconds of downtime each time apache gets restarted becomes more of a problem.

So I've now switched to the approach that the TG community seems to have basically agreed is best, which is running TG apps standalone and proxying to them with apache or lighttpd (we use apache at work and I use lighttpd for this site). The main drawback to this approach is that it means there are a lot more long-running server processes that have to be kept running. So that's more start/stop scripts, monitoring, and logs for the sysadmin side of me to have to deal with. It also means more chance that if one of the apps manages to crash, someone has to notice and restart it.

Supervisord is the best solution to this problem that I've found. It is a daemon that starts your services as child processes and watches them, restarting them automatically if they die. It handles things like dependencies and monitoring, and it is smart enough to back off on restarts if a process goes crazy. Titus Brown's introductory article is a good place to start learning more about it.
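For anyone who hasn't used a process supervisor before, the core restart-with-backoff idea is small enough to sketch. The Python below is a toy, not how supervisord is actually implemented; the supervise() function and its max_starts, base_delay, and max_delay parameters are made up for illustration. It starts a command, restarts it every time it exits, and doubles the wait between restarts so a program that crashes immediately doesn't spin.

```python
import subprocess
import sys
import time


def supervise(cmd, max_starts=5, base_delay=1.0, max_delay=60.0):
    """Start cmd, restart it each time it exits, and back off between restarts.

    A toy illustration only: it gives up after max_starts launches and
    returns the exit codes it saw, in order.
    """
    exit_codes = []
    delay = base_delay
    for attempt in range(max_starts):
        if attempt > 0:
            # Wait before each restart, doubling the wait (up to max_delay)
            # so a program that crashes immediately doesn't spin the CPU.
            time.sleep(delay)
            delay = min(delay * 2, max_delay)
        child = subprocess.Popen(cmd)
        exit_codes.append(child.wait())
    return exit_codes


if __name__ == "__main__":
    # A child that always exits with status 3: started three times, then abandoned.
    always_crashes = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(supervise(always_crashes, max_starts=3, base_delay=0.1))  # [3, 3, 3]
```

A real supervisor does a lot more than this (resetting the backoff once a process has stayed up for a while, logging, handling its own signals), which is a good argument for using supervisord rather than rolling your own.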

Here's a more concrete example of how all of these pieces fit together.

The first step in setting up a new application is to create a working-env for it. I like to include a requirements.txt that lists the exact versions of every library I want to use for the application. The examples linked from Ian Bicking's post are a good starting point.

    $ python ~/bin/workingenv.py working-env --requirements requirements.txt

Then, to do development on the application, you activate the environment and start the server:

    $ source working-env/bin/activate
    $ ./start-foo.py

For production, we need a single script that supervisord can execute to start the server in production mode. It looks something like:

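    #!/bin/bash
    # this file is: start.sh
    # usage: start.sh /path/to/project/ configname   (loads configname.cfg)
    cd "$1" || exit 1
    source working-env/bin/activate
    # run the server in the background so its pid can be recorded for pidproxy
    ./start-foo.py "$2.cfg" &
    echo $! > /tmp/foo.pid
    # stay alive until the server exits, passing along its exit status
    wait $!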

and the corresponding config section in supervisord.conf is:

    <program foo>
      command pidproxy /tmp/foo.pid /path/to/foo/start.sh /path/to/foo/ prod
      auto-start true
      auto-restart true
      logfile /var/log/foo-supervisor.log
      user pusher
    </program>

Those require a little bit of explanation. start.sh takes two arguments, the path to the project's directory ($1) and the base name of a config file ($2; "prod" means prod.cfg), so that it's flexible enough for different deployment situations. Since start.sh runs the Python process as a child of bash, if we just did:

    #!/bin/bash
    cd $1
    source working-env/bin/activate
    ./start-foo.py

Supervisord wouldn't quite work right: its signals would go to the bash process instead of the Python process, so stopping and restarting the service wouldn't reliably stop the server. So instead, start.sh starts the Python process in the background, writes its pid out to a file, and then waits for it. The pidproxy program (which comes with supervisord for just such a purpose) is then used to run start.sh and send signals directly to the Python process.
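The signal-forwarding trick is easier to see in code. Below is a minimal Python sketch of the same idea, not supervisord's actual pidproxy source; read_pid(), forward_signal(), run_proxy(), and the choice to relay only SIGTERM, SIGINT, and SIGHUP are assumptions made for illustration (POSIX only). The proxy runs start.sh as its child, but whenever it receives one of those signals it reads the pid that start.sh wrote to the pidfile and sends the signal there instead.

```python
import os
import signal
import subprocess

# Signals relayed to the real server instead of to the wrapper script.
FORWARDED = (signal.SIGTERM, signal.SIGINT, signal.SIGHUP)


def read_pid(pidfile):
    """Return the integer pid written to pidfile (e.g. /tmp/foo.pid)."""
    with open(pidfile) as f:
        return int(f.read().strip())


def forward_signal(pidfile, sig):
    """Send sig to the process named in pidfile; return False if that fails."""
    try:
        os.kill(read_pid(pidfile), sig)
    except (OSError, ValueError):
        # No pidfile yet, garbage in it, or the process is already gone.
        return False
    return True


def run_proxy(pidfile, cmd):
    """Run cmd (e.g. start.sh plus its arguments), relaying signals to the pidfile's pid.

    Returns cmd's exit status. start.sh only exits after the server it is
    waiting on exits, so forwarding SIGTERM eventually ends this call too.
    """
    def handler(sig, frame):
        forward_signal(pidfile, sig)

    # Install the handlers before starting the child so no signal is missed.
    for sig in FORWARDED:
        signal.signal(sig, handler)
    child = subprocess.Popen(cmd)
    return child.wait()
```

Called as run_proxy("/tmp/foo.pid", ["/path/to/foo/start.sh", "/path/to/foo/", "prod"]), it would mirror the pidproxy command line in the supervisord.conf section above.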

It took some work to figure all this out, but once you've done it for one or two apps, it becomes pretty quick and easy to set up any application for deployment like this. Probably my next step will be to create a paster template for tg-admin so these scripts get automatically put into quickstarted projects instead of having to copy them in manually.

I should also mention that there are a few more components of our deployment strategy that I didn't really talk about. The first is version control. If you're doing development and not using version control of some sort, you are insane and deserve whatever horrible fate befalls you. Second, we have a single application that handles deployment to production. You press one button and it checks out your app to a staging environment, runs the unit tests, then rsyncs it to the production server and runs any additional steps that are needed there (in the above case, it would do a 'supervisorctl restart foo' on the production server to get the new code running). It also logs every step, tags releases in svn, and allows for easily rolling back production to a previous release in case something goes wrong. Our pusher/deployment system is very particular to our environment, so I won't explain it in too much detail here, but I highly recommend spending the time to set up something similar for your situation. At the very least, you'll want a shell script that does the deployment process in a single step.
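A single-step deploy script can be as simple as running a fixed list of commands and stopping at the first failure. Here's a minimal Python sketch of that shape using subprocess.call. The run_steps() helper and everything in DEPLOY_FOO (the repository URL, paths, host name, and nosetests as the test runner) are hypothetical placeholders, not our actual pusher setup.

```python
import logging
import subprocess

log = logging.getLogger("deploy")


def run_steps(steps):
    """Run each (description, argv) step in order, stopping at the first failure.

    Returns True if every step exited with status 0, False otherwise.
    """
    for description, argv in steps:
        log.info("running: %s", description)
        status = subprocess.call(argv)
        if status != 0:
            log.error("failed with exit status %d: %s", status, description)
            return False
    return True


# Hypothetical steps for an app called foo. The repository URL, paths, host
# name, and test runner are placeholders, not a real pusher configuration.
DEPLOY_FOO = [
    ("export to staging",
     ["svn", "export", "--force", "https://svn.example.com/foo/trunk", "/srv/staging/foo"]),
    ("run unit tests", ["nosetests", "/srv/staging/foo"]),
    ("copy to production",
     ["rsync", "-a", "--delete", "/srv/staging/foo/", "prod.example.com:/path/to/foo/"]),
    ("restart on production", ["ssh", "prod.example.com", "supervisorctl", "restart", "foo"]),
]
```

Calling logging.basicConfig(level=logging.INFO) and then run_steps(DEPLOY_FOO) would run the steps in order; release tagging and rollback are left out of this sketch.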

2006-09-13 12:18:27

comments: 2

280

colored pencil and pen on paper. larger sizes on flickr.

2006-09-24 13:06:34

medium: colored pencil and pen on paper

comments: 0

272

colored pencil on paper. fullsize on flickr.

2006-09-05 21:31:57

medium: colored pencil on paper

comments: 0

255

brush pen drawing of bone under electron microscope

2006-07-29 19:54:04

medium: brushpen
size: 5.5x6 inches

comments: 1

244

done on airplanes (mostly during our 5 hours sitting on the tarmac in Paris waiting for them to replace a fuel pump).

larger version available on flickr as usual.

2006-07-19 09:12:56

medium: pencil on paper

comments: 0

Summit of Hutumah

Another abstract roughly inspired by histology photos.

Done alla prima in about 6 hours. Pigments used: Titanium White, Sap Green, Light Green.

Oil on canvas paper. 14x15 inches.

2006-07-16 16:22:55

medium: oil on canvas paper
size: 14x15 inches

comments: 1

Phantom Capillaries

Lani got me this great book of electron microscope photos of various animal tissue samples. This painting was inspired by one of the capillaries that surround alveoli in rat lungs.

see the version on flickr for more detail.

2006-07-10 22:09:07

size: 9x12 inches
medium: oil on panel

comments: 0

the-fu.com

"game plans for dreamers". A new monthly magazine site whose backend I built for a friend.

2008-05-01 14:24:32

comments: 0

2008 PLUG: Independent Music Awards

MFR nominated for the fourth year in a row. Vote early, vote often, kids.

2007-11-15 22:14:15

comments: 0

Featured Artist: Anders Pearson

I was the featured artist on moleskinerie.com the other day.

2007-11-15 21:46:25

comments: 0

PARK(ing) Day

My coworkers and I had lunch today in the "park" on 113th and Broadway.

2007-09-21 13:24:08

comments: 0

My Kid Could Paint That

Yes! That's pretty much my opinion of much of modern art too.

2007-09-07 01:01:32

comments: 2

A Voice for Taiwan's Freedom, History

Saw these guys play last week. They put on a good show. Erhu + Black Metal is a surprisingly good combination.

2007-08-27 16:48:46

comments: 2

Hipster Olympics

Too funny. I've watched this closely several times to see if any of my friends are in it but I haven't spotted any of them.

2007-08-25 17:08:25

comments: 2

Bates Magazine: Music For Robots

My alumni magazine did a writeup on MFR.

2007-07-23 16:05:45

comments: 0

Tapeheads

"King of the music blogs is Music for Robots, an MP3 blog curated by a bunch of college friends. If Pitchfork came to prominence as the site that helped break Arcade Fire, then Music for Robots can claim the credit for the irresistible rise of Tapes 'n Tapes, a four-piece art-indie outfit from Minneapolis who were the breakout act at this year's SXSW."

2006-08-30 11:57:26

comments: 0

No Education Patents!

"This site presents a reaction of the education community to software patents that threaten our industry."

2006-08-23 18:21:17

comments: 0

reply to: books

Kudos to you for doing all of that work and making it available as free PDF downloads! Wouldn’t it be really nice if everyone was as generous as you are?

reply to: Summit of Hutumah

I was just coming up for air from my basement studio. I am working on a large painting based on the vocal folds and was googling for some good “voice” histology. I quickly came across “Summit of Hutumah” and was led to your other works. It is perfect for me at this very moment. I have tagged your site and will look often for inspiration! Perfect! Thanks!
Penny Oliver

reply to: periodic personal update

Great Pics :-)

Well done.

reply to: Subprocess Hanging: PIPE is your enemy

Using your solution is a really great way to solve this problem. I just hope the next person who needs and finds this info leaves a thank-you for your discovery if they're heading in the same direction.

reply to: Subprocess Hanging: PIPE is your enemy

I like the fact that you added the page because you could not find it in Google, that was nice of you to think about the next guy!

reply to: A Simple Programming Puzzle Seen Through Three Different Lenses

yuck, the indentation got clobbered, and the underscores turned into font changes. in the groupby, use \(a,x)(b,y) instead of what you see there. in the original I had underscores for x and y but they melted. x and y work fine, they just get ignored. also you may have to fix the indentation to get the thing to parse.

reply to: A Simple Programming Puzzle Seen Through Three Different Lenses

Here’s my Haskell version that I wrote on seeing the problem. I didn’t use any dicts or hashes, just sorting.

import Data.List
import Data.Ord

states = ["alabama","alaska","arizona","arkansas","california","colorado",
    "connecticut","delaware","florida","georgia","hawaii","idaho",
    "illinois","indiana","iowa","kansas","kentucky","louisiana",
    "maine","maryland","massachusetts","michigan","minnesota",
    "mississippi","missouri","montana","nebraska","nevada",
    "newhampshire","newjersey","newmexico","newyork","northcarolina",
    "northdakota","ohio","oklahoma","oregon","pennsylvania","rhodeisland",
    "southcarolina","southdakota","tennessee","texas","utah","vermont",
    "virginia","washington","westvirginia","wisconsin","wyoming"]

lettersets = sortBy (comparing fst)
    [(sort (x++y),(x,y)) | x <- states, y <- states, x < y]

main = print [r | r <- groupBy (\(a,_) (b,_) -> a==b) lettersets, length r > 1]

reply to: Error And Annihilation

Always fun to be creative, looking forward to seeing what you hammer out after all that time!!

reply to: books

I have been trying to read at least one book a month since 2001, and I have loved most of the books I've read over the years. I am looking forward to reading the two in this post over the next couple of months.

bio

i’m a programmer, artist, writer, and more.

i wrote the code that powers this site.

i work for CCNMTL as a programmer.

i write for music for robots, which is a very cool music blog.

i’m a WaSP member.

i paint, and draw quite a lot. the good stuff i post here. everything else ends up on flickr.

i have a proper homepage which has more info about me if this doesn’t do it for you.