Usage of fetchObjStream in Jupyter notebook

Can I use fetchObjStream in Jupyter notebook?

I want to apply a Python function to all the instances of a type. I cannot fetch all the data at once (there are too many instances), but the result of applying the function to every instance (a sparse matrix) does fit in the container's memory.

More generally, is there a streamlined way to fetch data in batches from a Jupyter notebook without writing custom filters on IDs?

Thanks,
Alessandro

Hi Alessandro,

As you indicated, the best option would most likely be to use a proper filter and other fields in FetchSpec so the computation is done as close to the data as possible.

If your logic cannot be expressed there, you can use the batchIds function available on Persistable (or its simpler twin, binIds).

  • It splits a population of objects (identified by a filter) into a given number of batches.
  • It returns a list of ids marking the boundaries between consecutive batches.

You can then fetch each batch with one fetch call per range:

var splits = SmartBulb.binIds(10);
for (var i = 0; i <= splits.length; i++) { // splits.length + 1 iterations
    var filter;
    if (i == 0) {
        // first batch: every id before the first split
        filter = Filter.lt("id", splits[0]);
    } else if (i == splits.length) {
        // last batch: every id from the last split onwards
        filter = Filter.ge("id", splits[i - 1]);
    } else {
        // middle batches: ids in [splits[i-1], splits[i])
        filter = Filter.ge("id", splits[i - 1]).and.lt("id", splits[i]);
    }
    var data = SmartBulb.fetch({
        filter: filter
    });
    // ... do something with your data ...
}
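To make the boundary logic concrete, here is a self-contained sketch that runs without C3. It takes a sorted list of split ids (hard-coded here; in practice they would come from binIds/batchIds) and applies the same three cases as the loop above to an in-memory list of ids. The helper names `rangesFromSplits` and `partitionIds` are hypothetical, and the sketch assumes ids compare as plain strings in the same order the datastore uses.

```javascript
// Hypothetical helper: turn sorted split ids into splits.length + 1 id ranges,
// using the same three cases as the loop above.
function rangesFromSplits(splits) {
  const ranges = [];
  for (let i = 0; i <= splits.length; i++) {
    const lo = i === 0 ? null : splits[i - 1];         // inclusive lower bound ("ge")
    const hi = i === splits.length ? null : splits[i]; // exclusive upper bound ("lt")
    ranges.push({
      lo,
      hi,
      contains: (id) => (lo === null || id >= lo) && (hi === null || id < hi),
    });
  }
  return ranges;
}

// Stand-in for "one fetch per range": group an in-memory id list by range.
function partitionIds(ids, splits) {
  return rangesFromSplits(splits).map((range) => ids.filter(range.contains));
}

const ids = ["a01", "a02", "b10", "c05", "c07", "d99"];
const batches = partitionIds(ids, ["b00", "c06"]);
// batches: [["a01", "a02"], ["b10", "c05"], ["c07", "d99"]]
```

Each id falls into exactly one range, and an empty `splits` array degenerates to a single unbounded range, so no object is fetched twice or skipped.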


To answer your initial question: end-to-end streaming from C3 to Python is not currently supported, but we are working on it, so stay tuned.

Thanks, Louis.
I tried the batchIds function on my type from the console, but it gives me a weird error. Can you help me figure out what's wrong?
IGM is an external type and the data is on an Oracle instance (not sure if this helps).

This sounds like a great use case for a MapReduceJob where the 'map' and 'reduce' methods are implemented in Python.
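As a rough illustration of the map/reduce idea applied to the sparse-matrix use case, the sketch below maps each batch to a partial sparse matrix and then reduces the partials by summing matching entries. It is plain JavaScript, not the MapReduceJob API: `mapBatch`, `reducePartials`, and the `row`/`col`/`value` fields are hypothetical, and the sparse matrix is assumed to be a `Map` keyed by `"row,col"`.

```javascript
// Map step (hypothetical): build a partial sparse matrix from one batch,
// stored as a Map from "row,col" keys to summed values.
function mapBatch(objs) {
  const partial = new Map();
  for (const o of objs) {
    const key = o.row + "," + o.col;
    partial.set(key, (partial.get(key) || 0) + o.value);
  }
  return partial;
}

// Reduce step: merge partial matrices by summing entries that share a key.
function reducePartials(partials) {
  const total = new Map();
  for (const partial of partials) {
    for (const [key, value] of partial) {
      total.set(key, (total.get(key) || 0) + value);
    }
  }
  return total;
}

// Two toy batches; only the accumulated sparse entries are kept in memory.
const sampleBatches = [
  [{ row: 0, col: 1, value: 2 }, { row: 3, col: 3, value: 1 }],
  [{ row: 0, col: 1, value: 5 }],
];
const matrix = reducePartials(sampleBatches.map(mapBatch));
// matrix: Map { "0,1" => 7, "3,3" => 1 }
```

Because summing is associative and commutative, the partials can be merged in any order, which is what lets the map step run independently on separate batches.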

@alessandro.perina I asked internally, and that function doesn't seem to work with external types.
An alternative is to iterate over the objects using limit and offset:

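function mapByBatch(srcType, filter, batchSize, callback) {
    // total number of objects matching the filter
    var count = srcType.fetchCount({filter: filter});
    // starting offset of each batch: 0, batchSize, 2 * batchSize, ...
    var offsets = _.range(0, count, batchSize);
    return _.flatten(_.map(offsets, function(offset) {
        // fetch one page of at most batchSize objects
        var batch = srcType.fetch({
            filter: filter,
            limit: batchSize,
            offset: offset
        });
        // apply the callback to every object in the page
        if (batch.count > 0) {
            return _.map(batch.objs, callback);
        } else {
            return [];
        }
    }));
}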

var results = mapByBatch(SmartBulb, "manufacturer.id == 'GE'", 10,
    function(smblb) {
        // ... do something cool on each smart bulb here ...
    }
);
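The same limit/offset pattern can be tried outside C3 with an in-memory stand-in. In this sketch, `store` is a hypothetical object that only mimics the fetchCount/fetch shape used by mapByBatch (filters are plain predicates rather than filter strings), and the underscore calls are replaced with a plain loop.

```javascript
// Hypothetical in-memory data set: 23 rows with a small numeric field.
const rows = Array.from({ length: 23 }, (_, i) => ({ id: "bulb" + i, watts: i % 5 }));
const offsetsSeen = []; // records the offset of every fetch call

// Stand-in for a type: NOT the C3 API, just the same fetchCount/fetch shape.
const store = {
  fetchCount: ({ filter }) => rows.filter(filter).length,
  fetch: ({ filter, limit, offset }) => {
    offsetsSeen.push(offset);
    const objs = rows.filter(filter).slice(offset, offset + limit);
    return { count: objs.length, objs };
  },
};

// Same logic as mapByBatch above, with a plain loop instead of underscore.
function mapByBatch(srcType, filter, batchSize, callback) {
  const count = srcType.fetchCount({ filter });
  const results = [];
  for (let offset = 0; offset < count; offset += batchSize) {
    const batch = srcType.fetch({ filter, limit: batchSize, offset });
    if (batch.count > 0) {
      results.push(...batch.objs.map(callback)); // flatten exactly one level
    }
  }
  return results;
}

// Double the watts of every row with watts > 0, five rows per fetch.
const doubled = mapByBatch(store, (r) => r.watts > 0, 5, (r) => r.watts * 2);
// offsetsSeen is now [0, 5, 10, 15]: 18 matching rows in pages of 5.
```

Here `push(...)` flattens exactly one level, so a callback that returns arrays keeps them intact. Offset paging also assumes the matching rows come back in a stable order across calls; if the data source does not guarantee that, rows can be skipped or repeated between pages.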