What is the significance of commitSize in Canonical

#1

When I have a canonical with this in the definition header, what is the purpose of commitsize?
@dataLoad(sequential=false,
          numErrorsToAbort=-1,
          commitSize=100,
          chunkSize=5000000,
          numRetries=10)

I understand that chunkSize controls data loading, in terms of how large to make each chunk (10,000 rows by default), but can I conclude that rows are committed every 100 rows in the example above?

How, and by how much, would increasing this value affect things? Is it worth tweaking as a performance parameter?


#2

Yes, you can conclude that rows are committed every 100 rows in the example provided.
The chunkSize in your example is very large, and with such a small commitSize you might choke JMS: a 5,000,000-row chunk committed 100 rows at a time means 50,000 commits per chunk, and when we are doing I/O we do not want to throttle the database.
chunkSize divides the input message into chunks so they can be processed in parallel.
commitSize is used internally to further divide each chunk into much smaller pieces for transforming and merging into the target database (see the sketch below).
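
To make the two-level division concrete, here is a minimal Python sketch of how chunkSize and commitSize interact. It is a conceptual illustration only, not the platform's actual implementation; split, merge_batch, and load are hypothetical names.

from typing import Iterator, List

def split(rows: List[dict], size: int) -> Iterator[List[dict]]:
    # Yield consecutive slices of at most `size` rows.
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def merge_batch(batch: List[dict]) -> None:
    # Stand-in for transforming one batch and merging it into the target
    # database as a single transaction (the real work is platform-internal).
    pass

def load(rows: List[dict], chunk_size: int, commit_size: int) -> None:
    # First level: divide the input message into chunks of chunk_size rows;
    # per the thread, chunks can be processed in parallel.
    for chunk in split(rows, chunk_size):
        # Second level: divide each chunk into commit_size-row batches,
        # committing one transaction per batch.
        for batch in split(chunk, commit_size):
            merge_batch(batch)

load([{"id": i} for i in range(1_000)], chunk_size=500, commit_size=100)

In general, a larger commitSize means fewer but larger transactions, trading commit overhead against memory use and how long each transaction holds the database, which is why it can be worth tuning.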
You can check the performance stats in Splunk by looking at the action profiler and searching for importData (if you are using the SourceFile syncFile API) to see the performance of the createBatch and mergeBatch actions.
