Boling For Batches v1.0.2
Plugin for Rails
I often need to execute really large computations on really large data sets. I usually end up writing a rake task to do it, which calls methods in my models. But something about the process bugged me. Each time I had to re-implement my 'batching code' that allowed me to not chew up GB after GB of memory due to klass.find(:all, :include => [:everything_under_the_sun]). Re-implementation of the same logic over and over across many projects is not very DRY, so I got out my blow torch and lit it up. The difficulty was that the part that was different each time I batched was at the center of the code, right in the middle of the batch loop. But I didn't let that stop me!
Install
SVN:
svn checkout http://boling-for-batches.googlecode.com/svn/trunk/boling_for_batches
Rails Plugin:
./script/plugin install http://boling-for-batches.googlecode.com/svn/trunk/boling_for_batches
Example
#Setup your new batch, and tell it what options to use, and what class to run batches of batch = BolingForBatches::Batch.new(:klass => Payment, :select => "DISTINCT transaction_id", :batch_size => 50, :order => 'transaction_id') #Run a specific instance method on each record in each batch, and send the rest of the params to that method. batch.run(:remove_duplicates, false, true, true) #Print the results! batch.print_results
Configuration
Options for the initializer (Batch.new) method are:
:klass - Usage: :klass => MyClass
Required, as this is the class that will be batched
:include - Usage: :include => [:assoc]
Optional
:select - Usage: :select => "DISTINCT field_name"
or
:select => "field1, field2, field3"
:order - Usage: :order => "field DESC"
:conditions - Usage: :conditions => ["field1 is not null and field2 = ?", x]
:verbose - Usage: :verbose => 'yes' or 'no'
Sets verbosity of output
Default: yes (if not provided)
:batch_size - Usage: :batch_size => x
Where x is some number.
How many AR Objects should be processed at once?
Default: 50 (if not provided)
:last_batch - Usage: :last_batch => x
Where x is some number.
Only process up to and including batch #x.
Batch numbers start at 0 for the first batch.
Default: won't be used (no limit if not provided)
:first_batch - Usage: first_batch => x
Where x is some number.
Begin processing batches beginning at batch #x.
Batch numbers start at 0 for the first batch.
Default: won't be used (no offset if not provided)
Output
Interpreting the output:
- [O] means the batch was skipped due to an offset.
- [L] means the batch was skipped due to a limit.
- [P] means the batch is processing.
- [C] means the batch is complete.
- and yes... it was a coincidence. This class is not affiliated with 'one laptop per child'
License
Copyright (c) 2008 Peter H. Boling, released under the MIT license
Or in other words have fun, and don't blame me!