migrating applications between servers with (virtually) zero downtime

By anders pearson 27 Mar 2006

The problem: You need to move a web application to a different server. E.g., an application accessible as app.example.com is currently running on a machine with the IP address 1.2.3.4 and it needs to be moved to 1.2.3.5. This will involve a DNS change. DNS updates, unfortunately, aren’t very predictable. Unless you administer your own DNS servers, you probably don’t have much control over exactly when a change goes out. Even if you can control DNS, it can take up to a day for the new DNS entry to propagate out to users (because it may be cached at several points in between).

If you admin your own DNS, you can lower the TTL (time-to-live) and do the update exactly when you want and that will probably be good enough. Most of us don’t have that level of control over our DNS though. At work, we don’t have direct control over DNS; the network admins take our requests and push everything out in a nightly batch, with a relatively high TTL. For thraxil.org and other sites that I run on my own, I rely on free DNS services which I also have no real control over. Pretty much anyone who isn’t an ISP is probably in a similar situation.

At work we’ve come up with a way to do the move with basically zero downtime and complete control over the exact time that the transition happens. It doesn’t require any admin access above and beyond control over the two web servers involved. IMO, it’s just about as simple as messing with DNS TTL stuff.

We’re probably not the first to have come up with this technique. It’s fairly simple and obvious if you’re familiar with apache configuration. I haven’t seen it mentioned anywhere before though and we had to figure it out ourselves, so clearly it’s not as well-known as it ought to be.

The technique basically revolves around apache’s mod_rewrite module and its ability to proxy requests. Make sure it’s installed and enabled on your servers. Technically, you only need it set up on the old server (the one that will be doing the proxying), but it’s useful in general so you’ll probably want it on both. With some setups, you’ll also need to ensure that mod_proxy is installed for mod_rewrite to be able to do proxying, so make sure you understand how your apache server has been compiled and configured.
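As a rough sketch of what "installed and enabled" can look like on an Apache 2.x build that loads modules dynamically, the relevant lines are something like the following. The module paths are assumptions and vary by distribution; if the modules are compiled in statically (check with `httpd -l`), no `LoadModule` lines are needed at all.

```apacheconf
# Paths are assumptions; adjust them to your own build and layout.
LoadModule rewrite_module modules/mod_rewrite.so
LoadModule proxy_module modules/mod_proxy.so
# On Apache 2.x, proxying to an http:// target also needs mod_proxy_http.
LoadModule proxy_http_module modules/mod_proxy_http.so
```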

The procedure starts with getting the application running on both servers. On the new one, it will have to be running on some other hostname temporarily so you can test and make sure it’s all working. Then, at a point in time that’s convenient for you (late night, early morning, etc.) you replace the configuration on the old server with a proxying directive that proxies all requests coming in to that server to the new one. Now, you can do the DNS change (or request the change from the DNS admins). Once the proxy is running, any request to either the old IP address or the new one will end up going to the new server. Once the DNS change has gone through and has had time to propagate out to everyone (checking the logs on the old server to make sure there are no more requests to it is a good way to be fairly sure), you can safely turn off the proxy and decommission the old server.

Here’s a more detailed example. Say we have an application running with a hostname of app.example.com, which is an alias for the IP address 1.2.3.4. We want to move it onto 1.2.3.5 with as little downtime as possible.

First, we register app-new.example.com to point to 1.2.3.5 (or just set it up in /etc/hosts since it doesn’t really have to be public). We get the application installed and running, probably against a test database, on 1.2.3.5 and make sure that everything works like it should.

On 1.2.3.4, apache’s configuration probably has a virtual host section like:

:::apacheconf
<VirtualHost *>
   ServerName app.example.com
   # ... etc. 
</VirtualHost>

We duplicate the same virtual host section to the apache config on the new server, 1.2.3.5. It’s a good idea to then test it by overriding app.example.com in your /etc/hosts (or your OS’s equivalent) to point to the new server.
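Since the new server has to answer to the temporary hostname as well as the real one, one way to sketch its virtual host is with a `ServerAlias`. This is an illustration under that assumption, not necessarily the exact config we used:

```apacheconf
# On the new server (1.2.3.5).
<VirtualHost *>
    ServerName app.example.com
    # Temporary name used for testing before the DNS change.
    ServerAlias app-new.example.com
    # ... the rest is the same as on the old server
</VirtualHost>
```

The alias can be dropped once the move is finished and nothing refers to app-new.example.com anymore.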

Then, on 1.2.3.4 we change it to:

:::apacheconf
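<VirtualHost *>
  # keep the ServerName so this block still matches requests for the app
  ServerName app.example.com
  RewriteEngine On
  # in virtual host context the path starts with "/"; strip it to avoid "//"
  RewriteRule ^/(.*)$ http://app-new.example.com/$1 [L,P]
</VirtualHost>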

If there’s data to migrate, we then take apache down on 1.2.3.4, migrate the data to the new server, and bring apache back up. If data needs to be migrated, there isn’t really a way to avoid some downtime, or you’ll run the risk of losing some of it during the move. Having a script or two pre-written to handle the data migration is usually a good idea to ensure that it will go quickly and smoothly. There are other tricks to minimize this downtime, but for most of us a few seconds or a few minutes of downtime isn’t the end of the world (and is highly preferable to lost or inconsistent data).

At this point, requests to app.example.com will be proxied over to 1.2.3.5 without users noticing anything except perhaps a little extra latency.
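If you’d rather not go through mod_rewrite, roughly the same proxying can be sketched with mod_proxy’s own directives. This is an alternative to the rewrite rule, not what we actually ran, and it assumes mod_proxy and mod_proxy_http are both loaded:

```apacheconf
# On the old server (1.2.3.4): alternative sketch using mod_proxy directly.
<VirtualHost *>
    ServerName app.example.com
    # Reverse proxy only; never act as an open forward proxy.
    ProxyRequests Off
    # Pass the original Host header (app.example.com) through to the new server.
    ProxyPreserveHost On
    ProxyPass / http://app-new.example.com/
    # Fix up Location headers in redirects that point at the temporary name.
    ProxyPassReverse / http://app-new.example.com/
</VirtualHost>
```

With `ProxyPreserveHost On`, the new server sees requests for app.example.com, so the application shouldn’t need to know about the temporary hostname at all.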

Then the request is put in to the DNS admins to change the app.example.com alias to resolve to 1.2.3.5 instead of 1.2.3.4.

Eventually, that will go through and users will start hitting 1.2.3.5 directly when they go to app.example.com. For a while, there will still be a trickle of hits to 1.2.3.4, but those should fade away once the DNS change propagates.
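To watch that trickle, it can help to give the proxying virtual host on the old server its own access log. A minimal sketch, where the log path is an assumption and should be changed to fit your layout:

```apacheconf
# Inside the proxying <VirtualHost> on the old server (1.2.3.4).
# Defined here in case your main config doesn't already define "combined".
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
# Hypothetical path; every request still arriving at the old IP lands here.
CustomLog logs/app-proxy-access.log combined
```

Once that file has stopped growing for a while, it’s probably safe to assume the DNS change has reached everyone.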

That’s pretty much what we do. It’s worked well for us for quite a few server moves. Obviously, since this involves messing with the configuration on production servers, you shouldn’t attempt it without fully understanding what’s involved. You should also test it out on a test setup before trying it with real applications. Plus, you should always have a fallback plan in case any step of it doesn’t work like you expect it to for some reason.

Tags: networking dns apache sysadmin