Batch Processing In Grails
In one of my project assignments I needed to insert a large number of records into the database. I had to read the objects from an external source. Once I had read all of the objects into a List, I iterated over the list and saved each one individually. In the beginning the process ran fine, but as time passed the execution slowed down considerably. It took almost one second to insert a single object into the database. Imagine the time it would have taken to insert 50,000 records at this pace. Besides, it often failed with an OutOfMemoryError. The code I had written did something like this:
(1..50000).each {
    Person person = new Person(.....)
    person.save()   // one save per iteration, outside any explicit transaction
}
One of the solutions that I found was to use transactions and save the objects in batches, with each transaction saving one batch of objects. It worked well and reduced the execution time considerably. What I did was something like this:
def startTime = System.nanoTime()
List batch = []
(1..50000).each {
    Person person = new Person(....)
    batch.add(person)
    println "Created:::::" + it
    if (batch.size() >= 1000) {
        // Save the whole batch inside a single transaction
        Person.withTransaction {
            for (Person p in batch) {
                p.save()
            }
        }
        // Empty the batch only after it has been saved
        batch.clear()
    }
}
def endTime = System.nanoTime()
def diff = (endTime - startTime) / 1000000000   // elapsed time in seconds
println "TIME TAKEN IS :::" + diff
In the previous case the time taken to save 50,000 records was around 500 seconds. Here, the time taken to save the same number of records came out to be just 80 seconds.
But there is one flaw in this method: if the objects are bulky, even this approach does not hold up. Each action in a Grails controller is executed within a Hibernate session. The session is opened right before the action starts and is closed once it returns, so Hibernate keeps all the newly inserted Person instances in the session-level cache. As the number of objects grows, the session becomes bulkier, which slows down the whole process. That also explains the memory problem (the OutOfMemoryError): all the objects stay cached in the Hibernate session.
The solution is to clear the session regularly so that it stays light throughout the process. All that needs to be done is to get hold of the current session and clear it after each batch has been written to the database. To do this, inject the SessionFactory into your controller (declaring "def sessionFactory" is enough), get the current session from it, and clear that session.
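// Assumes "def sessionFactory" is declared as a property of the controller so that Grails injects it
def startTime = System.nanoTime()
List batch = []
// Saves the pending batch in one transaction, then clears the session-level cache
def saveBatch = {
    Person.withTransaction {
        for (Person p in batch) {
            p.save()
        }
    }
    batch.clear()
    sessionFactory.currentSession.clear()
}
(1..50000).each {
    Person person = new Person(....)
    batch.add(person)
    println "Created:::::" + it
    if (batch.size() >= 1000) {
        saveBatch()
    }
}
if (batch) {
    saveBatch()   // save whatever is left over when the total is not a multiple of 1000
}
def endTime = System.nanoTime()
def diff = (endTime - startTime) / 1000000000   // elapsed time in seconds
println "TIME TAKEN IS :::" + diff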
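Since the objects are already in a List, another way to form the batches is to split the list up front with Groovy's collate, which also takes care of a final partial batch. The following is only a minimal sketch in plain Groovy, not the code from my project: persistBatch is a hypothetical stand-in that merely records batch sizes, and the list of numbers is placeholder data. In a Grails application the closure body would presumably wrap the saves in Person.withTransaction and then clear the session.

```groovy
// Hypothetical stand-in for the real work. In a Grails app this closure would
// save each object inside Person.withTransaction { ... } and then clear the
// Hibernate session; here it only records how many objects it received.
def batchSizes = []
def persistBatch = { List batch -> batchSizes << batch.size() }

// Placeholder data; the real objects would come from the external source.
def records = (1..50010).toList()

// collate(1000) splits the list into chunks of at most 1000 elements,
// so the last chunk simply holds whatever is left over.
records.collate(1000).each { chunk -> persistBatch(chunk) }

println "batches: ${batchSizes.size()}, last batch: ${batchSizes.last()}"
```

With this shape there is no counter to check inside the loop and no separate step for the leftovers, at the cost of building the list of chunks in memory before saving starts.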
Thank you,
Imran Mir,
imran@intelligrape.com
Quite helpful. Thanks for posting 🙂
Hi,
If you have 50010 Person objects, the last 10 may never be saved. How do you handle that?
Hi Srinath,
You need to inject the sessionFactory bean into the artefact using “def sessionFactory”.
— Vivek
Hi,
How do I get sessionFactory?
I was getting the issue below:
groovy.lang.MissingPropertyException: No such property: sessionFactory for class:
Do I need to import any jars?
thanks.
Absolutely right, thanks for pointing out the mistake. I've updated the blog.
cheers Imran
Hi,
Your code above is clearing the Hibernate session after every Person is created. Shouldn’t it be inside the if(batch.size()>1000)… block?
cheers
Lee