Upgrading to Rails 2.3.2 from Rails 2.1
We recently upgraded our Rails app at TST Media from Rails 2.1 to Rails 2.3.2. It was a pain in the ass! A little background on our application first. We have 36,600 lines of code, 322 models, 108 controllers, and a relatively weak test suite. Additionally, we upgraded the majority of our gems and plugins; we have 32 gems and 16 plugins in our vendor directory. That's 48 third-party open source Ruby libraries our application depends on! Crazy. The application originated back in August 2006, when it was first running on Rails 1.1.2.
git rm -r vendor/rails
git ci -a -m "removing rails to prepare for upgrade"
rake rails:freeze:edge RELEASE=2.3.2
Here are the issues we ran into during the upgrade, many of which were not documented in the release notes:
1. In Rails 2.1 and earlier, when using check_box_tag the convention is to put a hidden_field_tag after it with the same name and a value of 0 if you need there to always be a name/value pair sent for the check box. The check_box form helper did just this. The browser would then send the check_box_tag value if it is checked, otherwise it would send the hidden_field_tag value. Well, for some reason, in Rails 2.3 the order is swapped. The hidden_field_tag must come before the check_box_tag. The Rails 2.3.2 check_box form helper swapped the order as well.
2. The form helper methods check_box_tag, check_box, select_tag, select, text_field_tag, text_field (and others?) changed how they set the id of the form field when the name of the form field includes square brackets. For example, in Rails 2.3.2 select_tag('user[gender]') creates a select form field with a name of 'user[gender]' and an id of 'user_gender'. In Rails 2.1 and earlier the id of the select tag would be 'user[gender]'. I have no idea why this change was made in Rails, but it was a huge pain in the ass and I did not see this change documented anywhere! It broke all sorts of custom JavaScript which depended on the id of the form fields. Dozens of observe_field statements no longer worked either, since they were observing form fields that did not exist. And worse yet, for some reason text_area and text_area_tag did not change, so text_area_tag('post[description]') results in a text area with an id of 'post[description]'. This inconsistency is unfortunate.
3. The behavior of observe_field when observing checkboxes has changed. In Rails 2.1 the event would be fired when checking or unchecking a checkbox, whereas with Rails 2.3.2 the event is only fired when checking the checkbox. I assume this change can be attributed to the upgrade of the Prototype JavaScript library. Usually when we are observing a checkbox we are simply calling a JavaScript function when the event is fired, as opposed to doing an Ajax request, and so the fix is to simply not use observe_field and replace it with the standard onclick= within the HTML. Unobtrusive JavaScript be damned! I've seen so many issues with observing events on checkboxes, radios, and compatibility across browsers. It just isn't worth the bother. Going with the standard onclick="myFunction()" always works.
4. We ran into a weird issue in development mode when using render :file => 'mytheme', which we were doing to serve up dynamically generated css files. All of our theme css files are served through a single action, and the name of the file to render depends on the theme. The problem is that when changing 'mytheme' in development mode and refreshing the browser the changes do not come through until restarting the server. Not a big deal for me, but it was driving our designer crazy! So somewhere Rails is caching anything that goes through render :file. On initial inspection we realized that ETags were to blame. The ETag header is automatically set by Rails 2.3.2 when using render :file and a 304 not modified response is sent after the first time the file is requested. Modifying the file does not cause the ETag to go stale. Setting request.headers['ETag'] = nil causes the ETag header to not be sent, which we thought would fix the problem. It did not. Rails is still caching the contents of the file somewhere else! Instead of digging into this further, we switched away from using render :file and used render :action instead. In production mode we actually have Nginx serve up a cached version of the theme css files which is why we had no need of caching support built into the Rails app for this request.
Also, while inspecting this issue I ran across the new configuration parameter cache_template_loading, which if set to true caches view files so they are not read from disk more than once. By default this is turned off in development mode, and in any case it has no effect on render :file.
5. Previously we did not have our gems specified in environment.rb using config.gem. Due to an annoying gem spec warning we had to specify all our gems in environment.rb, which we should've done earlier with Rails 2.1 anyway. This is the gem spec warning:
config.gem: Unpacked gem ym4r-0.6.1 in vendor/gems has no specification file. Run 'rake gems:refresh_specs' to fix this.
Running rake gems:refresh_specs does not fix this unless the gem is specified in the environment.rb file using a config.gem line such as this:
config.gem "ym4r"
5.5. A bug was introduced somewhere between Rails 2.1 and 2.3 dealing with the serialization of integers, booleans, and floats. In Rails 2.1 and before, a serialized integer resulted in an integer when the attribute is unmarshaled. In Rails 2.3 serializing an integer results in a string, which is clearly not the desired behavior! See lighthouse ticket https://rails.lighthouseapp.com/projects/8994/tickets/1379-serialized-fixnum-returns-a-string-after-save
6. In test_helper.rb we renamed the class to ActiveSupport::TestCase
7. Installed the SystemTimer gem and upgraded memcache-client to 1.7.2. Brought memcache_util.rb into the lib directory as it was removed from memcache-client 1.7.2.
8. Had to force-add this cache directory since it was ignored by our .gitignore file:
git add -f vendor/rails/activesupport/lib/active_support/cache
9. Changed a few unit tests to extend from ActiveSupport::TestCase instead of Test::Unit::TestCase
10. Changed the session_key for cookies to be set in the environment file.
11. Changed the image_path rails patch use of relative_url_root to: ActionController::Base.relative_url_root
12. Changed assert_redirected_to to be less specific with a partial match
13. Changed deprecated session.session_id to request.session_options[:id]
14. Moved views/store/store_mailer/_result_details.html.erb to _result_details.rhtml. Rails 2.3.2 can't handle partials for mailers with .html.erb unless you specify the partial extension directly.
15. Removed binding argument from concat call since it was deprecated.
16. Removed my own Rails patch for caching of HABTM columns since it has been incorporated into Rails 2.3.2
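For item 2, it helps to know which id Rails 2.3 will generate for a given field name when hunting down JavaScript and observe_field calls that still reference the old ids. The sketch below is plain Ruby and only an approximation, not the Rails source. The helper names rails_23_field_id and id_migration_map are made up for illustration, and the rule it encodes is an assumption: closing brackets are dropped, and any other character that is not a letter, digit, hyphen, colon, or period becomes an underscore.

```ruby
# Approximates how Rails 2.3 derives a DOM id from a form field name.
# Assumption: closing brackets are dropped and every other character that is
# not a letter, digit, hyphen, colon, or period becomes an underscore.
def rails_23_field_id(name)
  name.to_s.delete(']').gsub(/[^-a-zA-Z0-9:.]/, '_')
end

# Maps old ids (identical to the field name in Rails 2.1) to the ids to
# expect after the upgrade.
def id_migration_map(field_names)
  field_names.each_with_object({}) do |name, map|
    map[name] = rails_23_field_id(name)
  end
end

id_migration_map(['user[gender]', 'user[newsletter]']).each do |old_id, new_id|
  puts "#{old_id} => #{new_id}"
end
```

Keep in mind that text_area and text_area_tag kept their old ids in our upgrade, so those fields need to be checked by hand rather than run through a mapping like this.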
Capistrano Tip to Avoid Disk Intensive Removal of Files
At TST Media we host our Rails app at Engine Yard on four slices which all utilize the same shared disk via GFS. Anytime there is any heavy disk activity our sites slow down to a crawl. This makes removing files a bit tricky, such as when we want to empty our cache files or when Capistrano removes a release at the end of a deploy. The strategy we have come up with to handle this is to move the files to a specific directory we call the "caches_to_remove" directory, and use a cron task to empty this directory at night when most of our users are asleep. Moving files is extremely fast and not disk intensive, as long as the source and destination are on the same disk of course.
The shell script that the cron runs nightly is simple:
# remove_cache_dirs.sh
rm -rf /data/tst/caches_to_remove
mkdir /data/tst/caches_to_remove
We have a capistrano task to empty the cache files, which simply moves the cache directory into the caches_to_remove directory.
set :remove_files_dir, '/data/tst/caches_to_remove/'
namespace :cache do
  desc "Delete all cache files on disk."
  task :empty, :roles => :web do
    sudo "mv /data/tst/cache #{remove_files_dir} && mkdir /data/tst/cache"
  end
end
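The move-then-recreate idea is not specific to Capistrano. Here is a minimal sketch in plain Ruby, assuming the cache and the removal directory live on the same filesystem so that the move is a cheap rename. The method name retire_directory and the timestamp-plus-pid suffix are illustrative additions and not part of our deploy scripts; the suffix is there to make it unlikely that a second move collides with a directory already waiting for the nightly cron.

```ruby
require 'fileutils'

# Moves +dir+ into +trash_dir+ under a (most likely) unique name and
# recreates an empty +dir+. Returns the path the old contents now live at.
def retire_directory(dir, trash_dir)
  FileUtils.mkdir_p(trash_dir)
  stamp  = Time.now.strftime('%Y%m%d%H%M%S')
  target = File.join(trash_dir, "#{File.basename(dir)}-#{stamp}-#{Process.pid}")
  FileUtils.mv(dir, target)  # a rename when both paths share a filesystem
  FileUtils.mkdir_p(dir)     # leave an empty directory behind for the app
  target
end
```

With our layout the call would look like retire_directory('/data/tst/cache', '/data/tst/caches_to_remove').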
We also want a deploy to do a mv instead of a rm -rf on the release to be "removed". Our application has quite a few files and some 33,000 lines of code not counting all the plugins, gems and Rails itself which are vendored, so doing a rm -rf at the end of deploy considerably slows our app down for several minutes. So we overrode the built-in capistrano deploy:cleanup task to accomplish this.
namespace(:deploy) do
  # overriding cleanup task, changing it from a rm -rf to a mv
  desc <<-DESC
    Clean up old releases. By default, the last 5 releases are kept on each \
    server (though you can change this with the keep_releases variable). All \
    other deployed revisions are removed from the servers. By default, this \
    will use sudo to clean up the old releases, but if sudo is not available \
    for your environment, set the :use_sudo variable to false instead.
  DESC
  task :cleanup, :except => { :no_release => true } do
    count = fetch(:keep_releases, 5).to_i
    if count >= releases.length
      logger.important "no old releases to clean up"
    else
      logger.info "keeping #{count} of #{releases.length} deployed releases"
      # COMMENT OUT THIS CODE
      # directories = (releases - releases.last(count)).map { |release|
      #   File.join(releases_path, release) }.join(" ")
      # invoke_command "rm -rf #{directories}", :via => run_method
      # ADD THIS CODE
      (releases - releases.last(count)).each do |release|
        directory = File.join(releases_path, release)
        invoke_command "mv #{directory} #{remove_files_dir}", :via => run_method
      end
    end
  end
end
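The selection logic in that task is easy to check in isolation. Below is a small sketch in plain Ruby with made-up release names. stale_releases and mv_commands are hypothetical helpers, not Capistrano methods, the /data/tst/releases path is an assumption for the example, and the list of releases is assumed to be sorted oldest first.

```ruby
# Returns the releases that fall outside the newest +keep+ entries.
# +releases+ is assumed to be sorted oldest first.
def stale_releases(releases, keep)
  return [] if keep >= releases.length
  releases - releases.last(keep)
end

# Builds one mv command per stale release instead of a single rm -rf.
def mv_commands(releases, keep, releases_path, remove_files_dir)
  stale_releases(releases, keep).map do |release|
    "mv #{File.join(releases_path, release)} #{remove_files_dir}"
  end
end

# Seven hypothetical releases, keeping the newest five.
releases = %w[20090501 20090503 20090505 20090507 20090509 20090511 20090513]
puts mv_commands(releases, 5, '/data/tst/releases', '/data/tst/caches_to_remove/')
```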
And the world is a happier place!
A Capistrano task for a rolling Mongrel restart and deploy
At TST Media we have our rails app hosted at Engine Yard. Currently we use Nginx, haproxy, and Mongrel and have 4 slices each with 4 mongrels. When an HTTP request first comes in to our system it hits the load balancer which chooses a slice to send it to. The nginx on the given slice picks the request up and sends it onto haproxy. Haproxy chooses a mongrel to send the request to based on availability. When we roll out bug fixes, which we do once every other day or so, the Mongrels all restart at once and all the users browsing our sites experience 20-30 seconds of... basically downtime. The browser spins and waits until the mongrels are ready to go. If requests come in at a certain time the users may see a 502 Bad Gateway response or a 503 Service Unavailable response, both of which started showing up once we started using haproxy. Clearly this is unacceptable. Soon we hope to switch to Nginx with Phusion Passenger which may not have this problem. Until then we have started doing rolling restarts, where one slice is down at a time which allows us to do small deploys without impact to our users.
To accomplish this rolling restart with our setup we have to stop nginx on the slice that is down. This prevents the load balancer from sending requests to the slice that is down. If we leave nginx up and only stop the mongrels then requests will still be routed to this slice and will hang in a similar manner as if we had restarted all the mongrels at once. We put together this capistrano task:
namespace :mongrel do
  desc <<-DESC
    Rolling restart, 1 server at a time.
  DESC
  task :rolling_restart do
    find_servers(:roles => :app).each do |server|
      ENV['HOSTS'] = "#{server.host}:#{server.port}"
      nginx.stop
      puts "Sleeping 10 seconds to wait for mongrels to finish."
      sleep 10
      mongrel.restart
      puts "Sleeping 30 seconds to wait for mongrels to start up."
      sleep 30
      nginx.start
    end
  end
end
This task iterates over each server/slice and stops nginx, waits for 10 seconds to let the mongrels finish what they are doing, restarts the mongrels, waits 30 seconds for the mongrels to boot up, and then starts nginx up again. This capistrano task assumes the existence of nginx.stop, nginx.start, and mongrel.restart tasks.
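Stripped of the Capistrano plumbing, the sequence is a loop that takes one host fully out of rotation before touching the next. The sketch below is plain Ruby with the side effects injected, so the ordering can be exercised without real servers. rolling_restart as a method, the actions hash, and the sleeper argument are all hypothetical names for illustration and not part of Capistrano.

```ruby
# Restarts hosts one at a time. +actions+ maps :stop_proxy, :restart_app and
# :start_proxy to callables that take a host name. +sleeper+ is injectable so
# a dry run does not have to wait for real time to pass.
def rolling_restart(hosts, actions, drain_seconds: 10, boot_seconds: 30, sleeper: method(:sleep))
  hosts.each do |host|
    actions.fetch(:stop_proxy).call(host)   # load balancer stops routing here
    sleeper.call(drain_seconds)             # let in-flight requests finish
    actions.fetch(:restart_app).call(host)
    sleeper.call(boot_seconds)              # wait for the app servers to boot
    actions.fetch(:start_proxy).call(host)  # back into rotation
  end
end

# Dry run against two made-up slices, recording each step instead of doing it.
log = []
actions = {
  stop_proxy:  ->(host) { log << "stop nginx on #{host}" },
  restart_app: ->(host) { log << "restart mongrels on #{host}" },
  start_proxy: ->(host) { log << "start nginx on #{host}" }
}
rolling_restart(%w[slice1 slice2], actions, sleeper: ->(_seconds) {})
puts log
```

The point of the structure is that nginx on a slice is only started again after its mongrels have had time to boot, and the next slice is not touched until then.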
With this mongrel:rolling_restart task in place, we then defined a deploy:rolling task like this:
desc <<-DESC
  A deploy without migrations where the mongrels are restarted in a rolling manner.
DESC
task :rolling do
  update
  mongrel.rolling_restart
end
When using this deploy:rolling task our site remains up and responsive during the entire deploy. This approach is useful for small bug fix rollouts where there are no migrations that need to be run. There is a short window of time in which some of your servers will be out of date. For example, you may see issues if your bug fix includes changes to both a view file and a controller: a user hits an updated mongrel and is served the new view, and then makes a post to an out-of-date mongrel that is still running the old controller. However, this is usually preferable to forcing all of your users to wait 30 seconds while all the mongrels restart. I would rather impact a very small percentage of our users than 100% of our users.
RailsConf 2009 and the Danger of Remote Mob Mentality
My first Ruby on Rails Conference was a positive experience. RailsConf was in Vegas this year, and while I didn't win any money gambling, I did see several good talks and met some interesting Rails developers.
During the Wednesday morning keynote, as Chad Fowler was introducing Chris Wanstrath of GitHub, he asked who uses Git. Basically everyone in the room raised their hand. He went on to say that Rails programmers are like lemmings, which I think is a very interesting observation. It wasn't too long ago that most Rails developers used Subversion, and as soon as the Rails core team switched to Git everyone followed. It wasn't too long ago that test-driven development was an obscure programming practice only used by "Extreme" programmers. Now, if you are working on a Rails project it is a given that you have a decent test suite. And don't forget about REST architecture... people love REST architecture.
After Timothy Ferriss's disappointing keynote Tuesday night, which served to entertain as the source of many jokes throughout the remainder of the conference, everyone was ready for a real hardcore motivational speech. Wow, did Robert Martin deliver in his talk, "What Killed Smalltalk Could Kill Ruby Too." No slides, just Robert Martin pacing on the stage and flinging his note cards into the air when he was done with them. Being a great speaker, he had everyone's rapt attention. He recapped a short history of Smalltalk and why it "died", and outlined what the Ruby and Rails community can do to avoid the same fate. This included practicing test-driven development, acting professionally, not being arrogant towards non-Ruby programmers, and developing more powerful Ruby integrated development environments. He stressed test-driven development quite a bit, as I knew he would given his Extreme Programming background. When the speech finished the crowd gave him a standing ovation. Everyone loved it.
At RailsConf it was apparent to me that Rails developers are a young crowd. I knew this before the conference, but seeing 1300 Ruby on Rails nerds all in the same room made it even more obvious. An analogy to lemmings is clearly extreme, but certainly Rails developers are impressionable. There definitely seems to be a sort of remote mob mentality thing going on, which is a little disturbing. You know those Simpsons episodes where the townspeople group together in a mob and everyone wants to kill Bart? Then someone yells some other new purpose and the mob follows without thinking. Anyway, the point is that I'd like to see Rails developers and other programmers think more for themselves. Everyone's circumstances and projects are different, and pretending that there are a few programming practices, such as test-driven development, that absolutely must be followed to succeed, as Robert Martin implied, is absurd. I would add "Think for Yourself" to Robert Martin's list of what the Rails community must do to avoid the fate of Smalltalk.
At TST Media we spend very little time writing tests and have a weak test suite. Our application comes to 33,435 lines of code and our tests to 1,811 lines, a test-to-code ratio of 0.05. While I would like to see this improved marginally, given our current situation it is simply not worth trading features for a slightly higher quality code base, which is what a better test suite would give us.
Running ar_sendmail with monit
Sending email from a web application, especially blast emails to a lot of people, can take a lot of time. Generally you don't want the user to wait until all the emails have been handed off to the smtp server. You also probably don't want to tie up an entire mongrel with sending mail. The ar_mailer gem solves this problem in excellent fashion, by saving pending emails to the database and having a separate ruby daemon process periodically check the database and send emails. I recently set up one of our rails apps at work to use ar_mailer. Configuring it to use ar_mailer was incredibly easy, but it was tricky to get the ar_sendmail ruby daemon process to run under monit. On our production servers which we have hosted at Engine Yard, we want every process that our application depends on to be monitored by monit.
The primary feature that ar_sendmail lacks to play nice with monit is the ability to leave a pid file after it starts up and to remove it when the process exits. This has already been pointed out on rubyforge as a feature request. Here is what I did to get ar_sendmail working under monit: (ar_mailer 1.3.1)
0) Find the file ar_sendmail.rb on your system and open it for editing.
For me this was at /usr/local/lib/ruby/gems/1.8/gems/ar_mailer-1.3.1/lib/action_mailer/. Another common location is /usr/lib/ruby/gems/1.8/gems/ar_mailer-1.3.1/lib/action_mailer/
1) Create a class variable to store the pid file path in and a method to remove the pid file.
I put this just below "attr_accessor :failed_auth_count"
@@pid_file = nil
def self.remove_pid_file
  FileUtils.rm(@@pid_file) if @@pid_file
end
2) Modify the self.run method in ar_sendmail.rb to create a pid file, and to not start up if an ar_sendmail process is already running
if options[:Daemon] then
  require 'webrick/server'
  @@pid_file = "#{options[:Chdir]}/log/ar_mailer.pid"
  if File.exists? @@pid_file
    # check to see if process is actually running
    pid = ''
    File.open(@@pid_file, 'r') {|f| pid = f.read.chomp }
    if system("ps -p #{pid} | grep #{pid}") # returns true if process is running, o.w. false
      $stderr.puts "Warning: The pid file #{@@pid_file} exists and ar_sendmail is running. Shutting down."
      exit
    else
      # not running, so remove existing pid file and continue
      self.remove_pid_file
      log "ar_sendmail is not running. Removing existing pid file and starting up..."
    end
  end
  WEBrick::Daemon.start
  File.open(@@pid_file, 'w') {|f| f.write("#{Process.pid}\n")}
end
If the pid file already exists, this code will check to see if the process is actually running. If it is then it will exit, otherwise it will remove the file and continue. This is useful for when the process dies ungracefully somehow (server crashes, killed with -9, etc.), in which case it will leave a pid file behind. Note that this differs from the code suggested on the rubyforge feature request, not only in that it checks the existence of the pid file, but also in that it references options[:Chdir] instead of Dir.pwd in order to be compatible with the -c and --chdir ar_sendmail option.
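An alternative to shelling out to ps and grep is to ask the operating system directly by sending signal 0, which performs the existence and permission checks without delivering anything to the process. This is a sketch of that swapped-in technique, not the code I patched into ar_sendmail, and process_running? and pid_file_live? are hypothetical helper names.

```ruby
# True if a process with +pid+ exists. Signal 0 only checks for existence.
def process_running?(pid)
  # 0 and negative numbers address process groups, so reject them up front.
  return false unless pid.is_a?(Integer) && pid > 0
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false  # no such process
rescue Errno::EPERM
  true   # the process exists but belongs to another user
end

# True if +path+ exists and names a running process; false for a missing,
# empty, garbled, or stale pid file.
def pid_file_live?(path)
  return false unless File.exist?(path)
  process_running?(File.read(path).to_i)
end
```

Like the ps approach, this cannot tell whether the pid has since been reused by some unrelated process, so it is a liveness check and not proof that ar_sendmail itself is running.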
3) Modify the do_exit method in ar_sendmail.rb to remove the pid file
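def do_exit
  log "caught signal, shutting down and removing pid file"
  # do_exit is an instance method and remove_pid_file was defined on the
  # class in step 1, so it has to be called through self.class
  self.class.remove_pid_file
  exit
end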
4) Create the monit file /etc/monit.d/ar_sendmail.teamsport.monitrc
check process ar_sendmail_teamsport
  with pidfile /data/teamsport/current/log/ar_mailer.pid
  start program = "/usr/bin/ar_sendmail -d -e production -c /data/teamsport/current/" as uid teamsport and gid teamsport
  stop program = "/usr/local/bin/stop_ar_sendmail" as uid teamsport and gid teamsport
  if totalmem is greater than 65.0 MB for 2 cycles then restart # eating up memory?
  if loadavg(5min) greater than 10 for 8 cycles then restart # bad, bad, bad
  if 20 restarts within 20 cycles then timeout # something is wrong, call the sys-admin
  group ar_sendmail
Note that I am using a simple shell script called stop_ar_sendmail to stop the ar_sendmail process. ar_sendmail has signal handlers for SIGINT and SIGTERM so we should use these signals to kill it, which will cause the do_exit method to be triggered. The stop_ar_sendmail script looks like this:
#!/bin/bash
kill -2 `cat /data/teamsport/current/log/ar_mailer.pid`
Originally I tried putting the contents of this shell script in the .monitrc file like this:
stop program = "/bin/kill -2 `cat /data/teamsport/current/log/ar_mailer.pid`" as uid teamsport and gid teamsport
This however does not work, since apparently monit doesn't know what to do with the backticks. Alternatively you could use some sort of grep and kill script, such as pkill, to stop the ar_sendmail process. Ideally in the future, ar_sendmail will support some sort of stop command in the same manner as mongrel so that you could run "ar_sendmail stop" to stop it.
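If the backticks fail because monit runs the command directly instead of handing it to a shell (an assumption on my part), then the wrapper has to be a real executable, but it does not have to be bash. Here is a Ruby sketch of the same idea, assuming the pid file layout used above. stop_from_pid_file is a hypothetical helper name, and where you install the script is up to you.

```ruby
# Sends +signal+ to the process named in +pid_file+.
# Returns true if the signal was sent, false if there was nothing to stop.
def stop_from_pid_file(pid_file, signal = 'INT')
  return false unless File.exist?(pid_file)
  pid = File.read(pid_file).to_i
  return false unless pid > 0   # empty or garbled pid file
  Process.kill(signal, pid)     # INT is one of the signals ar_sendmail traps
  true
rescue Errno::ESRCH
  false                         # the process is already gone
end

# Only act when run as a script, e.g. from monit's "stop program" line.
if __FILE__ == $PROGRAM_NAME
  stop_from_pid_file(ARGV.fetch(0, '/data/teamsport/current/log/ar_mailer.pid'))
end
```

Passing the pid file path as the first argument lets one script serve several apps.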
Once you have this all set up you can control ar_sendmail via monit. When creating the ar_sendmail.teamsport.monitrc file, make sure you change "teamsport" to the user that you want to run the process as. Then do a "sudo monit reload" and monit should see that ar_sendmail is not running and will start it for you. To make sure everything is working correctly try "sudo monit stop ar_sendmail_teamsport" and "sudo monit start ar_sendmail_teamsport" (replacing "teamsport" with the appropriate user name).
Other than not working well with monit out of the box, the only other issue I have with ar_sendmail is the memory footprint, which is dependent to some degree on your Rails app. The ar_sendmail process for my app runs at around 50 MB, and just to send mail! I assume this is due to the fact that ar_sendmail loads the Rails app's environment.rb file. The environment.rb file runs the boot.rb file, which bootstraps and initializes the entire Rails app. Additionally, our environment.rb has several other plugins required inside of it. I think the environment.rb file is loaded primarily just to get at the ActionMailer smtp_settings, which is a slick way to allow for easy integration of ar_mailer with minimal changes to your existing Rails app. Many people wouldn't think twice about 50 MB, but Rails hosts charge quite a bit for RAM. I can definitely envision a slimmed down ar_sendmail that doesn't load the Rails app's environment.rb file, but it seems almost impossible to do this without making integration with existing Rails apps more difficult.