Tuesday 3 May 2011

A short story about OpenStruct and a performance issue

Once upon a time, in a office far far away, I sat all day, wondering why my report query was taking 0.54 seconds to get the data, then 49 minutes (yes, 49 minutes) to process and render it. Ok, drop the story telling now.

It was only 13000 records! I ran ruby-prof (for the first time ever) and spend a while looking at the results wondering what they meant :)

One thing I noticed was that there were quite a few bold lines in the result that were in OpenStruct (thanks for making this easy for me ruby-prof).

I concluded that calling OpenStruct in a loop was defining a brand new class and defining all the attr_accessors for each of the keys in the hash... in my reporting case, each row is converting to a hash, and each row has exactl the same keys.

I decided to run a small benchmark with OpenStruct against a class I built that lets you create a Class from a hash, which just adds all the attr_accessors then returns the type, which you can new up with the hash many times over.

class DynamicRow
  def self.with_keys(keys)
    result = Class.new
    result.class_eval do
      keys.each { |k| attr_accessor k }
      
      def initialize(hash)
        hash.each { |k, v| instance_variable_set :"@#{k}", v }
      end
    end
    result
  end
  
  def self.benchmark
    require 'benchmark'
    
    sample_hash = {:a => "hello", :b => 3}
    
    Benchmark.bm do|b|
      b.report("OpenStruct ") do
        50_000.times { OpenStruct.new(sample_hash) }
      end

      b.report("DynamicRow ") do
        row_type = DynamicRow.with_keys(sample_hash.keys)
        50_000.times { row_type.new(sample_hash) }
      end
    end
  end
end

Running the benchmark gave me some convincing results

            user     system      total        real
OpenStruct  13.400000   0.080000  13.480000 ( 13.572973)
DynamicRow   0.570000   0.010000   0.580000 (  0.584201)

Well, thats a nice improvement. But the difference there isn't going to bring the processing down that much is it? The other factor I thought might be playing a role was the number of keys, so I increased this from 2 as shown above in sample_hash to 4.

            user     system      total        real
OpenStruct  25.750000   0.160000  25.910000 ( 26.117587)
DynamicRow   0.790000   0.010000   0.800000 (  0.809647)

So, with OpenStruct, the time taken increases equal to the number of keys.. with DynamicRow it's not so bad.

My report now runs in about 1 minute, instead of most part of an hour.

Hope the experience I've gained today helps someone else :)

1 comment:

  1. Wow, nice analysis, and a good warning for users considering OpenStruct.

    ReplyDelete