By anders pearson 03 Nov 2015
Docker has some basic process management functionality built in. You can set restart policies and the Docker daemon will do its best to keep containers running and restart them if the host is rebooted.
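For reference, those restart policies are set per container when it is created; a quick sketch (the container and image names here are just examples):

```shell
# Ask the daemon to restart the container on failure, up to 5 attempts:
docker run -d --name=redis_server --restart=on-failure:5 redis

# Or restart it unconditionally, including after the daemon/host comes back up:
docker run -d --name=redis_server --restart=always redis
```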
This is handy and can work well if you live in an all-Docker world. Many of us need to work with Docker-based services alongside regular non-Docker services on the same host, at least for the near future. Our non-Docker services are probably managed with Systemd, Upstart, or something similar, and we’d like to be able to use those process managers with our Docker services so dependencies can be properly resolved, etc.
I haven’t used Systemd enough to have an opinion on it (according to the internet, it’s either the greatest thing since sliced bread or the arrival of the antichrist, depending on who you ask). Most of the machines I work on are still running Ubuntu 14.04, where Upstart is the path of least resistance and the tool that I know best.
Getting Docker and Upstart to play nicely together is not quite as simple as it appears at first.
Docker’s documentation contains a sample upstart config:
```
description "Redis container"
author "Me"
start on filesystem and started docker
stop on runlevel [!2345]
respawn
script
  /usr/bin/docker start -a redis_server
end script
```
That works, but it assumes that the container named redis_server already exists, i.e., that someone has already run the docker run --name=redis_server ... command (or a docker create), manually or via some mechanism outside upstart, specifying all the parameters. If you need to change one of those parameters, you would need to stop the upstart job, do a docker stop redis_server, delete the container with docker rm redis_server, run docker create --name=redis_server ... to make the new container with the updated parameters, then start the upstart job again.
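Spelled out as shell commands, that dance looks something like this (the upstart job name redis-server-job is made up for illustration; the flags are from the example above):

```shell
# Stop the upstart job so it doesn't try to respawn the container
stop redis-server-job

# Stop and delete the old container
docker stop redis_server
docker rm redis_server

# Re-create it with the updated parameters
docker create --name=redis_server -v /docker/host/dir:/data redis

# Start the upstart job again
start redis-server-job
```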
That’s a lot of steps and would be no fun to automate via your configuration management or as part of a deployment. What I expect to be able to do with upstart is deploy the necessary dependencies and configuration to the host, drop an upstart config file in
/etc/init/myservice.conf and do
stop myservice, etc. I expect to be able to drop in a new config file and just restart the service to have it go into effect. Letting Docker get involved seems to introduce a bunch of additional steps to that process that just get in the way.
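In other words, assuming a job named myservice, the workflow I want is just this (one upstart gotcha worth knowing: restart re-uses the job configuration that was loaded when the job started, so a stop/start is needed to pick up changes to the .conf file):

```shell
# Push out an updated config and put it into effect
cp myservice.conf /etc/init/myservice.conf
stop myservice || true   # || true in case it wasn't already running
start myservice
```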
Really, to get Docker and Upstart to work together properly, it’s easier to just let upstart do everything and configure Docker to not try to do any process management.
First, make sure you aren’t running your containers with --restart=always. The default, --restart=no, is what we want.
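If you want to double-check what policy an existing container was created with, docker inspect can pull it out (the container name is from the earlier example):

```shell
# Prints "no" when no restart policy is set
docker inspect -f '{{ .HostConfig.RestartPolicy.Name }}' redis_server
```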
Next, instead of building the container and then using
docker start and
docker stop even via upstart, we instead want to just use
docker run so parameters can be specified right there in the upstart config (I’m going to leave out the description/author stuff from here on out):
```
start on filesystem and started docker
stop on runlevel [!2345]
respawn
script
  /usr/bin/docker run \
    -v /docker/host/dir:/data \
    redis
end script
```
This will work nicely. You can stop and start via upstart like a regular system service.
Of course, we would probably like other services to be able to link to it and for that it will need to be named:
```
start on filesystem and started docker
stop on runlevel [!2345]
respawn
script
  /usr/bin/docker run \
    --name=redis_server \
    -v /docker/host/dir:/data \
    redis
end script
```
That will work.
Then we run into the issue that anyone who’s used Docker and tried to run named containers has undoubtedly come across. If we stop that and try to start it again, it will fail and the logs will be full of complaints about:
Error response from daemon: Conflict. The name "redis_server" is already in use by container 9ccc57bfbc3c. You have to delete (or rename) that container to be able to reuse that name.
Then you have to do the whole dance of removing the container and restarting stuff. So we put a
--rm in there:
```
start on filesystem and started docker
stop on runlevel [!2345]
respawn
script
  /usr/bin/docker run \
    --rm \
    --name=redis_server \
    -v /docker/host/dir:/data \
    redis
end script
```
This is much better and will mostly work.
Sometimes, though, the container will get killed without a proper SIGTERM signal getting through, and Docker won’t clean up the container. E.g., it gets OOM-killed, or the server is abruptly power-cycled (it seems like sometimes even a normal stop just doesn’t work right). The old container is left behind, and the next time the job tries to start, you run into the same old conflict and have to clean it up manually.
There are numerous Stack Overflow questions and similar out there with suggestions for
pre-stop stanzas, etc. to deal with this problem. However, in my experimentation, they all failed to reliably work across some of those trickier situations like OOM-killer rampages and power cycling.
My current solution is simple and has worked well for me. I just add a couple more lines to the script section like so:
```
script
  /usr/bin/docker stop redis_server || true
  /usr/bin/docker rm redis_server || true
  /usr/bin/docker run \
    --rm \
    --name=redis_server \
    -v /docker/host/dir:/data \
    redis
end script
```
In the spirit of the World’s Funniest Joke,
before trying to revive the container, we first make sure it’s dead. The
|| true on each of those lines just ensures that it will keep going even if it didn’t actually have to stop or remove anything. (I suppose that the
--rm is now superfluous, but it doesn’t hurt).
So this is how I run things now. I tend to have two levels of Docker containers: these lower-level named services that get linked to other containers, and higher-level “application” containers (e.g., Django apps) that don’t need to be named, but probably link in one of these named containers. This setup is easy to manage (my CM setup can push out new configs and restart services with wild abandon) and robust.
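As a sketch, one of those higher-level application containers might look like this (the description, image, and port are made up; it links in the redis_server container from above and starts after the redis upstart job, which I’m assuming here is named redis-server):

```
description "Django app container"

start on started redis-server
stop on runlevel [!2345]
respawn

script
  /usr/bin/docker run \
    --rm \
    --link redis_server:redis \
    -p 8000:8000 \
    my-django-app
end script
```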