How do I run a large number of fetch calls in parallel using JSMapReduce?


#1

I need to perform a large number (say thousands) of fetch calls with different filters on the same srcType to create a data set and train a Machine Learning model on top of it. I can do this in Python with a for loop, but that takes too long. How can I use the JSMapReduce API to perform these fetch calls in parallel?


#2

You can define a JSMapReduceSpec and pass a map function to it. You can also use the general-purpose type JSData to store the results of the fetches done by the MapReduce job. Here is an example:

First, create one instance of the generic type JSData per filter, with an id made of the experiment name (here myExperiment_) followed by the filter taken from the list stored in filter_array. The experiment name prefix makes the results easy to query later. You can do this with the script below from a Jupyter Notebook, or with a similar one from the workbench:

JSDataObjs = []
for f in filter_array:
    JSDataObjs.append({"id":"myExperiment_"+f, "data":None})
c3.JSData.upsertBatch(JSDataObjs)

You can check that these objects were created on the JSData type with the following fetch call:

c3.grid(c3.JSData.fetch(filter="contains(id, 'myExperiment_')"))

You can now define the JSMapReduceSpec, which describes the MapReduce job to run:

spec = c3.JSMapReduceSpec(
    targetType={'typeName': "JSData"},
    include="id",
    filter="contains(id,'myExperiment_')",
    limit=-1,
    batchSize=100,
    # JavaScript map function, passed as a single string and run on each batch of JSData objects
    map=("function map(batch, objs, job) {"
         " if (objs && objs.size() > 0) {"
         " var ids = objs.pluck('id');"
         " for (var i = 0; i < ids.length; i++) {"
         " var gnc = GridNodeCriticality.fetch({filter: ids[i].split('Experiment_')[1], limit: -1,"
         " include: 'criticalityCode,level,nodeType,qualityOfServiceImpact'});"
         " JSData.upsert({'id': ids[i], 'data': gnc}); } } }")
)

The map function in this example takes the ids of the JSData objects in each batch, extracts the filter from each id, runs a fetch call with that filter, stores the fetch result in a variable called gnc, and upserts it back into the JSData object with the matching id.
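To make the id convention concrete, here is a minimal Python sketch of the same round trip the JavaScript does with split('Experiment_')[1]. The helper names make_id and filter_from_id are hypothetical, and the sample filter strings are made up for illustration. Unlike the JavaScript split, this version strips the full prefix once, so a filter that happens to contain the text 'Experiment_' would still be recovered intact.

```python
PREFIX = "myExperiment_"  # same prefix used for the JSData ids in this thread


def make_id(filter_expr):
    """Build the JSData id that carries a filter (hypothetical helper)."""
    return PREFIX + filter_expr


def filter_from_id(obj_id):
    """Recover the filter from an id built by make_id (hypothetical helper)."""
    if not obj_id.startswith(PREFIX):
        raise ValueError("id does not start with " + PREFIX + ": " + obj_id)
    return obj_id[len(PREFIX):]


# Made-up filters, for illustration only
filter_array = ["level == 1", "nodeType == 'A'"]

ids = [make_id(f) for f in filter_array]
recovered = [filter_from_id(i) for i in ids]
```

Because the filter is the only thing the map function knows about each object, whatever you put after the prefix must be a complete, valid filter expression for the type you fetch from.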

Now you can kick off the MR job using:

c3.JS.mapReduce(spec)

The results of your fetch calls are then upserted into the JSData type, and you can retrieve them later with a single fetch call such as:

result = c3.grid(c3.JSData.fetch(filter="contains(id, 'myExperiment_')", include="id, data"))

After some (hopefully quick) parsing of the result object, your dataset will be ready to use for Machine Learning training.
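What that parsing looks like depends on the shape of the fetch result, which is not shown here. As a rough sketch, assume the fetched JSData objects have already been converted into plain Python dicts with an id and a data field, and that data holds a list of records. Both are assumptions, as is the helper name build_dataset. Flattening them into rows tagged with their filter could then look like this:

```python
PREFIX = "myExperiment_"  # same prefix used for the JSData ids in this thread


def build_dataset(objs):
    """Flatten JSData-like dicts into one list of rows, each tagged with its filter.

    Assumes each obj is a dict with an 'id' and a 'data' field, where 'data'
    is a list of record dicts or None. This shape is an assumption.
    """
    rows = []
    for obj in objs:
        filter_expr = obj["id"][len(PREFIX):]
        # 'data' stays None for objects the job has not filled in
        for record in obj.get("data") or []:
            row = dict(record)
            row["filter"] = filter_expr
            rows.append(row)
    return rows


# Made-up sample standing in for the converted fetch result
sample = [
    {"id": "myExperiment_level == 1", "data": [{"level": 1, "nodeType": "A"}]},
    {"id": "myExperiment_level == 2", "data": None},
]

dataset = build_dataset(sample)
```

Keeping the filter on every row lets you trace each training example back to the fetch that produced it.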