Mad Marmot A blog about programming, ruby, rails.

Rails patch for caching ‘SHOW FIELDS’ for has_and_belongs_to_many associations

Posted on January 8, 2009

Last week I was examining the MySQL slow query logs at work and discovered the following which led to an easy Rails patch which improved the performance of our app by about 25%.

# Time: 090108 11:05:02
# Query_time: 14.412306  Lock_time: 0.000521  Rows_sent: 2  Rows_examined: 2
SHOW FIELDS FROM `events_page_nodes`;
# Query_time: 14.390774  Lock_time: 0.000556  Rows_sent: 2  Rows_examined: 2
SHOW FIELDS FROM `events_page_nodes`;

Normally 'SHOW FIELDS' queries are moderately fast. I ran it manually just now and it took 0.16 seconds. However here you can see that these 'SHOW FIELDS' queries took 14 seconds to complete! Turns out that MySQL creates a temporary table on disk for 'SHOW FIELDS' queries, so if the disk is busy with something else these queries can take awhile to complete as seen here.

In development mode these 'SHOW FIELDS' queries are not cached and occur very frequently, but in production mode Rails caches these queries the first time they are called for each model. I noticed that our database was receiving a large number of these 'SHOW FIELDS' queries, which I thought should only occur when a Rails environment is loaded or shortly thereafter when the models are first loaded. (ex. mongrel restarts, a background job, or a cron job).   
                                                 
However, upon inspection it turns out that Rails DOES NOT cache 'SHOW FIELDS' queries for has_and_belongs_to_many associations. So every time a select or an insert is done via a Rails has_and_belongs_to_many association, a 'SHOW FIELDS' on the join table is executed. One way to solve this problem would be to switch to using the has_many :through approach, which involves adding a primary key id column to the join table and creating an ActiveRecord model for it, which would then take advantage of the built-in Rails caching of 'SHOW FIELDS'. However we have 20 some join tables in our application. So instead I patched Rails to cache the 'SHOW FIELDS' queries, which turned out to be rather simple and noticeably impacted the performance of our app (see charts below).                                                 

The 'SHOW FIELDS' queries are wrapped inside the database connection's columns method, and columns is called from insert_record and finding_with_ambiguous_select? in the has_and_belongs_to_many_association.rb file. In this patch I replace @owner.connection.columns with a call to @owner.habtm_columns instead, which has the caching logic within it. The cache is a hash with the table_name as the key and the result of connection.columns as the value. Since there isn't a class associated with the join table, I simply store the cache within one of the two classes that are part of the association. So it is possible that 'SHOW FIELDS' will be called twice and cached twice for a single join table, once for each class that is a part of the association.

Here are the changes I made to Rails for this patch.

1. In insert_record of has_and_belongs_to_many_association.rb replace this line:
  columns = @owner.connection.columns(@reflection.options[:join_table], "#{@reflection.options[:join_table]} Columns")
  # with this line:
  columns = @owner.class.habtm_columns(@reflection.options[:join_table], "#{@reflection.options[:join_table]} Columns")

2. Change the method finding_with_ambiguous_select? of has_and_belongs_to_many_association.rb replace to be:
  def finding_with_ambiguous_select?(select_clause)
    !select_clause && @owner.class.habtm_columns(@reflection.options[:join_table], "Join Table Columns").size != 2
  end

3. In base.rb define this method as a class method:
  def habtm_columns(table_name, name)
    @habtm_columns_hash = {} unless defined?(@habtm_columns_hash) && @habtm_columns_hash
    unless @habtm_columns_hash.has_key? table_name
      @habtm_columns_hash[table_name] = connection.columns(table_name, name)
    end
    @habtm_columns_hash[table_name]
  end
 
4. And in reset_column_information be sure to clear out @habtm_columns_hash by adding it to the list of variables to nil out:
  def reset_column_information
    generated_methods.each { |name| undef_method(name) }
    @column_names = @columns = @columns_hash = @content_columns = @dynamic_methods_hash = @generated_methods = @inheritance_column = @habtm_columns_hash = nil
  end  

Here is the complete patch for Rails 2.1.1.

Here is the complete patch for Rails 2.0.2.

After rolling this change out we have seen a very noticeable improvement on performance. By using New Relic's Compare With Yesterday feature (see chart below) we can see that our application response time dropped from an average of about 560 ms per request to 420 ms per request. This is roughly a 25% performance increase. The yellow line shows today, the blue line is yesterday. CPU and database load decreased by about 25% as well. This patch was rolled out at 13:30 yesterday, which is why the response time, CPU, and database graphs converge around 13:30. Throughput is basically the same across both days. (Note that the big dip in throughput, CPU, and database right around 13:30 is inaccurate. When the mongrels restarted after rolling this change out, several mongrels failed to communicate with New Relic for some unknown reason. Restarting these mongrels seemed to fix the communication problem with New Relic, which hasn't occured since.)

compare_with_yesterday

Comments (2) Trackbacks (0)
  1. After submitting this patch to the Rails core team, upon suggestion I modified the patch to store the cache within the reflection instance of the association. This is a better solution since it avoids the possibility of caching the same info in 2 places (the 2 owners of the habtm association). This patch allows one to call columns and reset_column_information directly from the habtm association: developer = Developer.find 1 developer.projects.columns developer.projects.reset_column_information See the updated patch at http://rails.lighthouseapp.com/projects/8994-ruby-on-rails/tickets/1738-caching-columns-for-has_and_belongs_to_many#ticket-1738-6

  2. This patch was first included with Rails 2.3.


Leave a comment


No trackbacks yet.