MapReduce job not processing all records?

#1

I have the following mapreduce:

var map = function(batch, objs, job) {
  objs.each(function(pms) {
    Location.updateGeocoding(pms, false);
  });
};
 
var spec = JSMapReduceSpec.make({
    targetType: Location,
    include: "id",
    filter: "!exists(address.geometry.latitude) && !exists(address.geometry.longitude)",
    limit: -1,
    batchSize: 10,
    map: map
});
 
JS.mapReduce(spec);

I confirmed via c3Count that roughly 11,000 Locations match this filter, but every time I run this, the mapreduce queue fills and empties within seconds and only a handful of Locations are actually processed. Running it again does the same thing. I'm not sure how to debug this.


#2

Sounds like Location.updateGeocoding might be calling an external API that enforces a rate limit. Your JSMapReduce job is probably running more concurrent batches than that rate limit can absorb, so most calls fail and the queue drains almost immediately. To debug this, check Splunk for failed API calls; if that is the cause, reduce the job's maximum concurrency via JSMapReduceSpec.maxConcurrency.
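As a sketch, the original spec with concurrency capped might look like this (the exact value to use, and whether maxConcurrency takes a plain integer, are assumptions to verify against the JSMapReduceSpec type documentation):

// Sketch only: same spec as in the question, with concurrent batches capped
// so calls stay under the external geocoding API's rate limit.
var spec = JSMapReduceSpec.make({
    targetType: Location,
    include: "id",
    filter: "!exists(address.geometry.latitude) && !exists(address.geometry.longitude)",
    limit: -1,
    batchSize: 10,
    maxConcurrency: 2,  // assumed integer field; tune to the API's rate limit
    map: map
});

JS.mapReduce(spec);

Start low, confirm in Splunk that the failures stop, then raise the value until throughput is acceptable.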

Also FYI, there are methods that already implement the functionality you need. See:

  • Location.updateGeocodingAll (asynchronous, takes a filter as an argument)
  • Location.updateGeocodingBatch (synchronous, takes an array of Location objects as an argument)
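For example, the asynchronous variant might be invoked like this (this assumes updateGeocodingAll accepts the filter as a plain string; check its actual signature before relying on it):

// Sketch: let the built-in async method handle batching and rate limiting,
// using the same filter as the original job.
Location.updateGeocodingAll("!exists(address.geometry.latitude) && !exists(address.geometry.longitude)");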