Doubt about batchSize in JSMapReduce

#1

Hi all,
I have a doubt regarding the map reduce functionality.

I noticed that even if I set batchSize > 10000, when the number of elements fetched by my map is greater than 10000, the system still splits the map processing into several parts. For instance:

  • I did a fetch on the PointMeasurement type with a filter that returns 23000 records
  • I saw in Splunk that the system split the computation into 3 different pieces, usually 2 of 10000 elements and 1 of 3000 elements

Is this the correct behaviour? Below is a piece of my definition:

return JS.mapReduce({
  targetType: PointMeasurement,
  filter: filter,
  include: "quantity, start, parent",
  batchSize: 15000,
  context: {
    filterOfExtraction: filter
  },
  map: mapperStr
});

Now, if I want to do some operation on these records (for example, create a copy of each) and THEN delete them, I think I cannot do the delete inside the map using the same filter that I use in the JSMapReduce definition, because I would erase all the records, possibly before all of them have been processed. Right?

In this case, is it correct to define a reduce phase for the delete step? If yes, is the reduce phase called only once, or 3 times, once for each "sub-map"?
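To make it concrete, this is roughly what I had in mind. The map signature (batch, objs, job) and the createBatch/removeAll calls are only my assumptions about the API, so please correct me if they are wrong:

// Rough sketch only: the map signature and the createBatch/removeAll
// calls below are my assumptions, not verified API.
var mapperStr = [
  'function map(batch, objs, job) {',
  '  // copy every record in this sub-batch',
  '  var copies = objs.map(function (o) {',
  '    return PointMeasurement.make({',
  '      quantity: o.quantity,',
  '      start: o.start,',
  '      parent: o.parent',
  '    });',
  '  });',
  '  PointMeasurement.createBatch(copies);',
  '  // no delete here: other sub-batches may still need the originals',
  '}'
].join('\n');

// The delete would then go either in a reduce phase or after the whole
// job has finished; that is exactly what I am unsure about. A separate
// call reusing the same filter would look like:
// PointMeasurement.removeAll(filter);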

Thanks!
Laura

#2

The batchSize controls how many entries are processed by each queue entry. There is also a “subBatchSize”, which defaults to 10000. This controls how many entries the map function will be called with. I believe if you set that to 15000 you will see the behavior you are expecting. The documentation should have made that more clear. Can you please file a ticket regarding the documentation?
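To put numbers on it (this only illustrates my understanding of the splitting, not the platform's actual scheduling code):

// How a fetch result gets split into map calls for a given subBatchSize
function subBatches(total, subBatchSize) {
  var sizes = [];
  while (total > 0) {
    var n = Math.min(subBatchSize, total);
    sizes.push(n);
    total -= n;
  }
  return sizes;
}

subBatches(23000, 10000); // [10000, 10000, 3000] -> the splits you observed with the default
subBatches(23000, 15000); // [15000, 8000]        -> with subBatchSize: 15000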

#3

Hello,
what is the correct way to set subBatchSize? Could you give me an example, please?

#4

return JS.mapReduce({
  targetType: PointMeasurement,
  filter: filter,
  include: "quantity, start, parent",
  batchSize: 15000,
  subBatchSize: 15000,
  context: {
    filterOfExtraction: filter
  },
  map: mapperStr
});

#5

Thank you very much! :grinning: