Every once in a while, I run across a bug or a tricky problem where googling for a solution doesn't turn up much. When I come up with a solution, I like to write it up and put it online so the next person to come across it hopefully will have an easier time figuring it out. This is one of those posts.
One of the internal applications I wrote at work does a lot of work via external programs. It's basically glueing together a whole bunch of shell scripts and putting a nice interface on them.
Running an external program from Python isn't very hard in the simple case, and there's a wealth of options available. The entry level is os.system(), which takes a command string and hands it to the shell. That gives you the command's exit status but not its output.
For what I'm doing, I need to have access to the return code, STDOUT,
and STDERR. Requirements like that lead to the os.popen*
functions. Basically, something like:
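import os

# Python 2 only: the os.popen* family was removed in Python 3.
# cmd is the command string to run.
(c_stdin, c_stdout, c_stderr) = os.popen3(cmd)
c_stdin.close()  # the child gets no input
out = c_stdout.read()
err = c_stderr.read()
c_stdout.close()
c_stderr.close()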
There are still problems with that. The environment that the child
command runs in (variables, cwd, tty, etc.) is the same environment
that the parent is running in. So to set environment variables for the
child, you have to put them into os.environ in the parent, and to set
the cwd for the child command, you have to have the parent do an
os.chdir(). That can be troublesome in some situations. For example,
if the parent is a CherryPy process, doing an os.chdir() makes it
hopelessly lost and it will crash. So you have to fork() a child
process, set up the environment there, run the above code, and then
pass the results back to the parent process.
This has been enough of a pain point for Python programmers that Python 2.4 added the subprocess module. The code above can be replaced with:
from subprocess import Popen, PIPE
p = Popen(cmd, stdout=PIPE, stderr=PIPE)
(out, err) = p.communicate()
Aside from being a little shorter, subprocess.Popen() also takes
additional arguments like cwd and env that let you manipulate the
environment of the child process (it does the fork() for you). It
basically gives you one very nice interface for doing anything and
everything related to spawning external commands. Life is generally
better with subprocess around.
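As a rough illustration of the cwd and env arguments, here is a minimal sketch written for Python 3. The helper name run_in, the DEMO_GREETING variable, and the tiny child program are all made up for this example; the point is only that the child sees its own working directory and environment while the parent's stay as they were.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical child program: report its own cwd and one environment variable.
CHILD = "import os; print(os.getcwd()); print(os.environ['DEMO_GREETING'])"


def run_in(directory, extra_env):
    """Run the child with its own cwd and env; the parent's are left untouched."""
    env = dict(os.environ)  # copy, so the parent's os.environ is not modified
    env.update(extra_env)
    p = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        cwd=directory,
        env=env,
    )
    out, err = p.communicate()  # the output here is tiny
    return p.returncode, out.decode().splitlines(), err.decode()


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    print(run_in(scratch, {"DEMO_GREETING": "hello"}))
    print(os.getcwd())  # the parent's cwd has not changed
```

No os.chdir() or os.environ changes happen in the parent, which is what matters for a long-running process such as a web server.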
Unfortunately, there is a subtle, devious bug in that code. I know this after encountering it and spending many hours trying to figure out what was going on.
I encountered it when the command being run was doing an svn
checkout. The checkout would run for a while and then the svn command
would hang at some point. It wouldn't use CPU, and there would be no
error messages. The process would still show up in ps or top. It would
just stop, and the parent process would sit and wait for it to
finish. Complete deadlock. The exact same svn command, run on the
command line, had no problems. Doing an svn checkout of a different
repository would work fine. Kill the hung svn process and the parent
would complete and STDOUT would show most of the expected output from
the svn checkout. With the particular repository, it would always hang
at exactly the same spot; completely repeatable.
How could an svn checkout of a particular repository hang, but only
when run via subprocess?
After much frustrating debugging, searching, and experimentation, I
narrowed it down to the output of the svn command on STDOUT. If I
added a -q (quiet) flag, it would complete without hanging. I
eventually noticed that the output that had been collected in STDOUT
after killing the svn process was right around 65,000
characters. Since 2^16 is 65536, that seemed like a coincidence worth
investigating. I wrote a test script that just wrote 2^16 characters
to STDOUT and ran it via subprocess. It hung. I modified it to print
2^16 - 1 characters to STDOUT. No hanging. The troublesome svn
repository happened to have a lot of files in it, so a lot of verbose
output on the checkout.
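The experiment above can be approximated with a sketch like the following. It is not the original test script, and it makes several assumptions: Python 3.3+ for wait(timeout=...), a made-up helper name, and a child that writes 1 MiB rather than exactly 2^16 characters so that it overflows the pipe on most platforms. It also differs from the post by having the parent call wait(), which never reads the pipe, instead of communicate(), so the stuck child is detected with a timeout rather than by hanging forever.

```python
import subprocess
import sys

# Hypothetical child: write 1 MiB to stdout, far more than a pipe usually holds.
CHILD = "import sys; sys.stdout.write('x' * (1 << 20))"


def child_blocks_on_full_pipe(timeout=1.0):
    """Return (blocked, bytes_read) for a child writing into an unread pipe."""
    p = subprocess.Popen([sys.executable, "-c", CHILD], stdout=subprocess.PIPE)
    try:
        # Nothing reads p.stdout here, so the child stalls once the pipe is full.
        p.wait(timeout=timeout)
        blocked = False
    except subprocess.TimeoutExpired:
        blocked = True
    # Draining the pipe lets the child finish writing and exit.
    out, _ = p.communicate()
    return blocked, len(out)


if __name__ == "__main__":
    print(child_blocks_on_full_pipe())
```

The child only exits once something reads from the pipe, which matches the symptom of a process that sits in ps using no CPU.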
A closer inspection of the subprocess docs for Popen.communicate() revealed a warning: "Note:
The data read is buffered in memory, so do not use this method if the
data size is large or unlimited." I'd probably read that before and
assumed that it was a warning about possibly consuming a lot of memory
and being inefficient if you try to pass a lot of data around. So I
ignored it. The STDOUT chatter of shell scripts that I was collecting
for logging purposes did not strike me as "large" (65K is positively
tiny these days) and it isn't an application where I'm particularly
concerned about memory consumption or efficiency.
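Apparently, that warning actually means "Note: if there's any chance that the data read will be more than a couple pages, this will deadlock your code." What was happening was that the buffer behind the pipe was a fixed size (a pipe on Linux commonly holds 64 KiB, which matches the 65536 figure above). When it filled up and nothing was draining it, the operating system simply stopped letting the child process write to it. The child would then sit and patiently wait to be able to write the rest of its output, while the parent sat and waited for the child to finish.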
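To look at that fixed-size buffer directly, here is a rough sketch for a Unix-like system. It assumes Python 3.5+ for os.set_blocking, and the function name is made up for this example. It fills a pipe that nobody reads and counts how many bytes fit before a write would block.

```python
import os


def pipe_capacity(chunk=1024):
    """Count how many bytes an unread pipe accepts before a write would block."""
    r, w = os.pipe()
    os.set_blocking(w, False)  # raise BlockingIOError instead of blocking
    total = 0
    try:
        while True:
            total += os.write(w, b"x" * chunk)
    except BlockingIOError:
        pass  # the pipe is full; a blocking writer would now be stuck
    finally:
        os.close(r)
        os.close(w)
    return total


if __name__ == "__main__":
    print(pipe_capacity())
```

The number is platform-dependent; on many Linux systems it comes out at 65536. A child process writing to a pipe is an ordinary blocking writer, so at this point it simply waits.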
Luckily the solution is fairly simple. Instead of setting stdout and
stderr to PIPE, give them proper file (or unix pipe) objects that will
accept a reasonable amount of data. (A further hint for anyone who
found this page because they encountered the same problem and are
looking for a fix: Popen() needs real file objects with fileno()
support, so StringIO-type fake file objects won't work;
tempfile.TemporaryFile is your friend.)
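A minimal sketch of that fix, written for Python 3, might look like this. The helper name run_capture and the demo child program are made up for the example; the approach itself is only what is described above, temporary files in place of PIPE.

```python
import subprocess
import sys
import tempfile


def run_capture(cmd, cwd=None, env=None):
    """Run cmd with stdout/stderr sent to temp files; return (rc, out, err)."""
    with tempfile.TemporaryFile() as out_f, tempfile.TemporaryFile() as err_f:
        p = subprocess.Popen(cmd, stdout=out_f, stderr=err_f, cwd=cwd, env=env)
        returncode = p.wait()  # safe: the child writes to files, not to a pipe
        out_f.seek(0)
        err_f.seek(0)
        return returncode, out_f.read(), err_f.read()


if __name__ == "__main__":
    # Hypothetical noisy child: 1 MiB on stdout, a short message on stderr.
    child = (
        "import sys; sys.stdout.write('x' * (1 << 20)); "
        "sys.stderr.write('oops'); sys.exit(3)"
    )
    rc, out, err = run_capture([sys.executable, "-c", child])
    print(rc, len(out), err)
```

The output is returned as bytes, and the temporary files are deleted when the with block exits. The trade-off is that the output goes through the filesystem instead of staying in memory.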
This strikes me as kind of a poor design. Subprocess is wonderful in
most ways and a real improvement over the old alternatives. But with
this kind of issue, the programmer using it will probably not
encounter any problems in development and think everything is fine but
some day will find their production app mysteriously deadlocked and
have almost no clues as to what's causing it. That seems like
something that deserves a big flashing red warning in the docs
every time PIPE is mentioned.
reply to: Subprocess Hanging: PIPE is your enemy
consider pexpect
http://news.ycombinator.com/item?id=146816
bayareaguy
reply to: Subprocess Hanging: PIPE is your enemy
Using your solution is a really great way to solve this problem, I just hope the next person who needs and finds this info leaves a thank you on your discovery should they be heading in the same direction.
Leon
reply to: Subprocess Hanging: PIPE is your enemy
I like the fact that you added the page because you could not find it in Google, that was nice of you to think about the next guy!
Greg