A Capistrano task for a rolling Mongrel restart and deploy
At TST Media we have our Rails app hosted at Engine Yard. Currently we use Nginx, HAProxy, and Mongrel, and have 4 slices, each with 4 mongrels. When an HTTP request first comes in to our system it hits the load balancer, which chooses a slice to send it to. The nginx on the given slice picks the request up and sends it on to HAProxy. HAProxy chooses a mongrel to send the request to based on availability.
When we roll out bug fixes, which we do once every other day or so, the mongrels all restart at once and all the users browsing our sites experience 20-30 seconds of... basically downtime. The browser spins and waits until the mongrels are ready to go. Depending on when a request comes in, the user may instead see a 502 Bad Gateway or a 503 Service Unavailable response, both of which started showing up once we started using HAProxy. Clearly this is unacceptable. Soon we hope to switch to Nginx with Phusion Passenger, which may not have this problem. Until then we have started doing rolling restarts, where only one slice is down at a time, which allows us to do small deploys without impact to our users.
To accomplish this rolling restart with our setup we have to stop nginx on the slice that is being restarted. This prevents the load balancer from sending requests to that slice. If we leave nginx up and only stop the mongrels, requests will still be routed to the slice and will hang just as they would if we had restarted all the mongrels at once. We put together this Capistrano task:
namespace :mongrel do
  desc <<-DESC
    Rolling restart, 1 server at a time.
  DESC
  task :rolling_restart do
    find_servers(:roles => :app).each do |server|
      # Limit the tasks called below to the current server only.
      ENV['HOSTS'] = "#{server.host}:#{server.port}"
      # Take this slice out of rotation before touching its mongrels.
      nginx.stop
      puts "Sleeping 10 seconds to wait for mongrels to finish."
      sleep 10
      mongrel.restart
      puts "Sleeping 30 seconds to wait for mongrels to start up."
      sleep 30
      # Put the slice back into rotation.
      nginx.start
    end
  end
end
This task iterates over each server/slice and stops nginx, waits 10 seconds to let the mongrels finish what they are doing, restarts the mongrels, waits 30 seconds for the mongrels to boot up, and then starts nginx up again. Setting ENV['HOSTS'] inside the loop is what restricts each of those steps to the current server. This Capistrano task assumes that nginx.stop, nginx.start, and mongrel.restart tasks are already defined.
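To make the ordering easier to check, here is a minimal plain-Ruby model of the same sequence. It is only a sketch and not part of our Capfile: the slice names, the rolling_restart method, and the injectable pause lambda are made up for illustration, and no real nginx or mongrel is touched. It records which slices are still in rotation after every step.

```ruby
# Plain-Ruby model of the rolling restart; no Capistrano required.
# Slice names and the injected pause lambda are hypothetical.
def rolling_restart(slices, drain: 10, boot: 30, pause: ->(seconds) { sleep(seconds) })
  in_rotation = slices.dup # slices whose nginx is currently accepting requests
  events = []

  slices.each do |slice|
    in_rotation.delete(slice)   # stands in for nginx.stop
    events << [:nginx_stop, slice, in_rotation.dup]
    pause.call(drain)           # let in-flight requests finish
    events << [:mongrel_restart, slice, in_rotation.dup]
    pause.call(boot)            # wait for the mongrels to boot
    in_rotation << slice        # stands in for nginx.start
    events << [:nginx_start, slice, in_rotation.dup]
  end

  events
end

# Dry run with a no-op pause so it finishes immediately.
slices = %w[slice1 slice2 slice3 slice4]
events = rolling_restart(slices, pause: ->(_seconds) {})
min_serving = events.map { |_step, _slice, serving| serving.size }.min
puts "Fewest slices in rotation at any step: #{min_serving}"
```

With 4 slices the dry run reports 3, which is the whole point: at every step at least three slices are still taking requests.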
With this mongrel:rolling_restart task in place, we then defined a deploy:rolling task like this:
namespace :deploy do
  desc <<-DESC
    A deploy without migrations where the mongrels are restarted in a rolling manner.
  DESC
  task :rolling do
    update
    mongrel.rolling_restart
  end
end
When using this deploy:rolling task our site remains up and responsive during the entire deploy. This approach is useful for small bug fix rollouts where there are no migrations that need to be run. There is a short window of time in which some of your servers will be out-of-date. For example, you may see issues if your bug fix includes changes to a view file and a controller: a user hits an updated mongrel and is served the new view, and then makes a post to an out-of-date mongrel that still has the old controller. However, this is usually preferable to forcing all of your users to wait 30 seconds while all the mongrels restart. I would rather impact a very small percentage of our users than 100% of our users.
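For a rough sense of how long that window is, here is a back-of-the-envelope calculation using the numbers from this post (4 slices, a 10 second drain, a 30 second boot wait). It is only an estimate: it assumes each slice keeps serving the old code until its mongrels are restarted, and it ignores the time the nginx and mongrel commands themselves take. The variable names are just for illustration.

```ruby
# Rough timing for the rolling restart, using the figures from the post.
slices = 4
drain_seconds = 10
boot_seconds = 30

per_slice = drain_seconds + boot_seconds # one slice is out of rotation this long
total = slices * per_slice               # the whole rolling restart

# Old and new code are both serving from the moment the first slice comes
# back until the last old slice is taken out of rotation.
first_new_up = per_slice
last_old_down = (slices - 1) * per_slice
mixed_window = last_old_down - first_new_up

capacity_percent = ((slices - 1) * 100.0 / slices).round

puts "Per slice: #{per_slice}s, total: #{total}s"
puts "Mixed-version window: #{mixed_window}s"
puts "Capacity while one slice is out: #{capacity_percent}%"
```

So the restart takes a bit under three minutes end to end, runs at about three quarters of normal capacity, and has roughly 80 seconds in which a user could bounce between old and new code.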