We have a new client that will be importing large payment data files into our application, and it appears that this topic has some common mind share right now (Zach, for one). Over the years I have seen many recommendations for Hibernate performance tuning with large sets of data, and most recently for GORM as well. What I haven't seen are any straightforward sample benchmark apps to fiddle with. So I jammed the beginnings of one together, and here it is.
Zach posted a link to a project he was having problems with, a bulk data import using GPars, and I ran with that as a good base.
GPars is the Groovy way to do parallel processing, so you can take advantage of all those cores for the longer-running areas of your code. It's theoretically perfect for splitting up a batch process: validation and processing can run across your cores while the statements keep firing into your database.
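To give a feel for it, here's a minimal GPars sketch (not code from the benchmark app): withPool sizes a thread pool to your cores, and eachParallel spreads the work across it.

```groovy
import groovyx.gpars.GParsPool

// Minimal example: each item is processed on one of the pool's worker threads.
GParsPool.withPool {                 // pool size defaults to the number of cores
    (1..8).eachParallel { n ->
        println "processing item $n on ${Thread.currentThread().name}"
    }
}
```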
The 4 key factors I have discovered so far to really speed things up are:
- Use GPars so you are not firing on the proverbial 1 cylinder.
- Follow the Hibernate batch-processing suggestions in chapters 13.1 and 13.2 of http://docs.jboss.org/hibernate/core/3.3/reference/en/html/batch.html and set hibernate.jdbc.batch_size, then go to Ted's article here: http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-mysql/
- Use small transaction batches and keep them the same size as the jdbc.batch_size. DO NOT (auto)commit on every insert
- Don't use GORM data binding if you can avoid it.
- DON"T do this -> new SomeGormClass(yourPropSettings) or someGormInstance.properties = [name:'jim',color:'red'] or use bindData()
- DO explicitly set the values on the fields or your gorm object -> someGormInstance.name = 'jim' ...etc
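To make #2 concrete, this is roughly what the settings look like in a Grails app. It's a sketch, not the benchmark app's exact config; the MySQL URL, credentials, and the rewriteBatchedStatements flag are my own assumptions here:

```groovy
// grails-app/conf/DataSource.groovy (sketch; URL and credentials are placeholders)
dataSource {
    driverClassName = "com.mysql.jdbc.Driver"
    // rewriteBatchedStatements lets the MySQL driver collapse a JDBC batch into
    // multi-row inserts, which pairs nicely with the Hibernate batch size
    url = "jdbc:mysql://localhost/payments?rewriteBatchedStatements=true"
    username = "root"
    password = ""
}

hibernate {
    jdbc.batch_size = 50   // keep this equal to your transaction/flush chunk size
}
```

And here is a sketch of a loader that follows #3 and #4: chunked transactions, a flush/clear per chunk, and explicit field assignment instead of data binding. The Payment domain class and the parsed row maps are placeholders for whatever your import actually looks like:

```groovy
class LoaderService {

    def sessionFactory

    // rows is a List of Maps parsed from the CSV; chunkSize should match jdbc.batch_size
    void loadBatched(List<Map> rows, int chunkSize = 50) {
        rows.collate(chunkSize).each { chunk ->             // collate needs Groovy 1.8.6+
            Payment.withTransaction {
                chunk.each { row ->
                    def payment = new Payment()
                    payment.name   = row.name               // explicit sets, no .properties binding
                    payment.amount = row.amount as BigDecimal
                    payment.save()
                }
                def session = sessionFactory.currentSession
                session.flush()                             // fires the batched inserts at MySQL
                session.clear()                             // keeps the session cache from growing
            }
        }
    }
}
```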
LoaderService has the different benchmarks to run, and it's called from BootStrap when I do run-war.
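The wiring is just standard Grails BootStrap injection, something along these lines (the runBenchmarks method name is a placeholder, not necessarily what the sample app calls it):

```groovy
// grails-app/conf/BootStrap.groovy
class BootStrap {

    def loaderService   // injected by Grails

    def init = { servletContext ->
        environments {
            production {
                loaderService.runBenchmarks()   // hypothetical entry point for the timed runs
            }
        }
    }

    def destroy = { }
}
```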
- 115k CSV records on a MacBook Pro 2.8 GHz dual core. 1024 MB of RAM was given to the JVM, and these were run using prod run-war
- I'm using MySQL as the DB and it's installed on my Mac too, so GPars can't really get all the cores
- All of these have jdbc.batch_size = 50, use the principles from #2 above, and flush/clear every 50 rows
- The test where the batch insert happens in a single transaction can't be tested with GPars, since a single transaction can't span multiple threads
- Data binding is only used in the "Batch Tran every 50, using Databinding" test. You can see that using data binding pretty much doubles the time, both for the normal way and with GPars
- The winner among the GORM approaches seems to be GPars with batched (smaller-chunk) transactions (that combination is sketched below, after the results table)
| Test | Normal | with GPars |
|---|---|---|
| Single transaction | 91 s | Not applicable |
| Commit Tran (auto) each record | 298 s | 98 s |
| Batched Transactions (commit every 50 records) | 92 s | 49 s |
| Batch Tran every 50, using Databinding | 190 s | 94 s |
| Spring's SimpleJdbc | 37 s | 27 s |
- My hunch is that the GPars difference will be even more significant on a quad-core with HT. Could use some help here.
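Since the batched-transactions-plus-GPars row is the interesting one, here is a sketch of that combination, using the same hypothetical Payment domain class and parsed row maps as the earlier sketch. The withNewSession wrapper is there because GPars worker threads don't have a Hibernate session bound to them:

```groovy
import groovyx.gpars.GParsPool

// Another method on the hypothetical LoaderService from the earlier sketch.
void loadBatchedWithGpars(List<Map> rows, int chunkSize = 50) {
    List chunks = rows.collate(chunkSize)       // one chunk per transaction / JDBC batch
    GParsPool.withPool {                        // pool size defaults to the number of cores
        chunks.eachParallel { List chunk ->
            Payment.withNewSession { session -> // each worker thread gets its own session
                Payment.withTransaction {
                    chunk.each { row ->
                        def payment = new Payment()
                        payment.name   = row.name              // explicit sets, no data binding
                        payment.amount = row.amount as BigDecimal
                        payment.save()
                    }
                    session.flush()             // fire the JDBC batch
                    session.clear()
                }
            }
        }
    }
}
```

For comparison, the fastest row skips GORM entirely. I'm not showing the benchmark's actual code here, but a Spring JDBC batch insert looks roughly like this (JdbcTemplate.batchUpdate shown; the table and column names are assumptions):

```groovy
import java.sql.PreparedStatement
import org.springframework.jdbc.core.BatchPreparedStatementSetter
import org.springframework.jdbc.core.JdbcTemplate

// jdbcTemplate would be built from the same MySQL dataSource the app uses
void loadWithSpringJdbc(JdbcTemplate jdbcTemplate, List<Map> rows) {
    jdbcTemplate.batchUpdate(
        "insert into payment (name, amount) values (?, ?)",
        new BatchPreparedStatementSetter() {
            void setValues(PreparedStatement ps, int i) {
                ps.setString(1, rows[i].name as String)
                ps.setBigDecimal(2, rows[i].amount as BigDecimal)
            }
            int getBatchSize() { rows.size() }
        }
    )
}
```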
Here are 2 links you should read that will give you some background on processing large bulk data batches: read up through chapter 13.2 of http://docs.jboss.org/hibernate/core/3.3/reference/en/html/batch.html and read this entire post: http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-mysql/
Thomas Lin set up a test of batch processing with GPars: http://fbflex.wordpress.com/2010/06/11/writing-batch-import-scripts-with-grails-gsql-and-gpars/
and the GPars docs: http://gpars.org/guide/index.html