How to modify the index of a Dataset

Hi All,

Below is the map function I'm trying to build. I need the final dataset to also include a datetime column, and I need the index to be the inspectionId rather than the servicePointId (my metrics are evaluated on the ServicePoint type). I know that non-numeric columns are not currently possible.

Is there a way to modify the index of the Dataset so that it uses the concatenation of

inspectionId + '_' + inspectionDate?
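
For illustration, here is a minimal sketch of building such a label in plain JavaScript. The helper name makeInspectionIndex is hypothetical, and the sketch assumes inspectionDate is a standard JavaScript Date; a platform-specific date type would need its own formatting call.

```javascript
// Hypothetical helper: builds a row label of the form "<inspectionId>_<ISO date>".
// Assumes inspectionDate is a standard JavaScript Date (an assumption, not the platform type).
function makeInspectionIndex(inspectionId, inspectionDate) {
    if (!(inspectionDate instanceof Date) || isNaN(inspectionDate.getTime())) {
        throw new Error('inspectionDate must be a valid Date');
    }
    // toISOString() always formats in UTC, so labels do not depend on the local time zone.
    return String(inspectionId) + '_' + inspectionDate.toISOString();
}

var label = makeInspectionIndex('insp42', new Date(Date.UTC(2019, 0, 15)));
// label is 'insp42_2019-01-15T00:00:00.000Z'
```

Using an ISO timestamp keeps the labels unambiguous and sortable as strings.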

function map(batch, inspections, job) {
    var offsetExpression = job.offsetExpression;
    var expressionsAtInspectionDate = job.expressionsAtInspectionDate;
    var expressionsAtFeaturesDate = job.expressionsAtFeaturesDate;
    var inspectionDate, featuresDate, servicePointId, spec, offset, emr, servicePointDatasetAtInspectionDate, servicePointDatasetAtEvalMetricsDateTime, servicePointDataset;
    var servicePointDatasets = [];

    _.each(inspections, function (inspection) {
        inspectionDate = inspection.executionStartDateTime;
        servicePointId = inspection.servicePointId;

        spec = EvalMetricsSpec.make({
            ids: [servicePointId],
            expressions: expressionsAtInspectionDate,
            start: inspectionDate,
            end: inspectionDate,
            interval: 'DAY'
        });
        emr = ChInRevproTdServicePoint.evalMetrics(spec);
        servicePointDatasetAtInspectionDate = Dataset.fromEvalMetricsResult(emr);

        offset = emr.result.get(servicePointId).get(offsetExpression).data().first();
        featuresDate = inspectionDate.clone();
        featuresDate = featuresDate.addDays(-offset).toDateMidnight().moveToFirstDayOfMonth().plusDays(-1);

        spec = EvalMetricsSpec.make({
            ids: [servicePointId],
            expressions: expressionsAtFeaturesDate,
            start: featuresDate,
            end: featuresDate,
            interval: 'DAY'
        });
        emr = ChInRevproTdServicePoint.evalMetrics(spec);
        servicePointDatasetAtEvalMetricsDateTime = Dataset.fromEvalMetricsResult(emr);

        servicePointDataset = Dataset.concatenate([servicePointDatasetAtInspectionDate, servicePointDatasetAtEvalMetricsDateTime], 1);
        servicePointDatasets.push(servicePointDataset);
    });

    var batchDataset = Dataset.concatenateDatasets(servicePointDatasets);

    // Save dataset to S3
    var filename = 'batchDataset' + batch;
    var delimiter = ',';
    var fileContent = __createCsvString(batchDataset, delimiter);
    var file = S3File.make({
        contentType: 'text/csv; delimiter="' + delimiter + '"',
        contentEncoding: 'gzip',
        url: PathUtils.join(job.datasetFolder, filename + '.csv')
    });
    logger.info('Saving dataset to file ' + file.url);
    var encodedContent = S3File.encode(fileContent, file.contentType, file.contentEncoding);
    file.writeEncoded(encodedContent);

    return {
        'key': file.url
    };
}

Best,
Mario

If you have a dataset
var ds = Dataset.fromSpec(TensorSpec.make({flattenedData: [1, 2, 3, 4], shape: [2, 2], indices: {0: ['1', '2'], 1: ['A', 'B']}, axes: ["time", "features"]}));

and you want to update the indices along axis 0 (the rows), you can assign them directly like this:
ds.indices["0"] = ['3', '4'];

But I would not recommend it, as there is no validation: you could pass ds.indices[0] = ['3']; and it would not complain, but it would lead to downstream errors because the indices no longer match the shape of the dataset.
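
To make the mismatch concrete, here is a rough sketch of the kind of check that direct assignment skips. It works on plain arrays and objects; assertIndicesMatchShape is a hypothetical name and is not part of the Dataset API.

```javascript
// Hypothetical check, not part of the Dataset API: verifies that every axis
// has exactly as many index labels as the shape declares for that axis.
function assertIndicesMatchShape(shape, indices) {
    for (var axis = 0; axis < shape.length; axis++) {
        var labels = indices[axis];
        if (!labels || labels.length !== shape[axis]) {
            throw new Error('Axis ' + axis + ' expects ' + shape[axis] +
                ' labels but got ' + (labels ? labels.length : 0));
        }
    }
}

var shape = [2, 2];
assertIndicesMatchShape(shape, {0: ['3', '4'], 1: ['A', 'B']}); // two labels for two rows: passes

var rejected = false;
try {
    assertIndicesMatchShape(shape, {0: ['3'], 1: ['A', 'B']}); // one label for two rows
} catch (e) {
    rejected = true; // the mismatch is caught here instead of surfacing downstream
}
```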

I would recommend re-creating the dataset from a spec instead:
ds = Dataset.fromSpec(TensorSpec.make({flattenedData: ds.flattenedData(), shape: ds.shape, indices: {0: ['3', '4'], 1: ds.indices[1]}, axes: ds.axes}));

Note that this preserves all the parameters except the indices on axis 0. If I try to pass only indices: {0: ['3'], 1: ds.indices[1]}, it errors out as expected.
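
The same copy-and-replace idea can be sketched on a plain object standing in for the spec. This is only an illustration under assumptions: withAxisLabels is a hypothetical helper and the object below is not the platform's TensorSpec type.

```javascript
// Hypothetical helper working on a plain-object stand-in for a tensor spec.
// Returns a new spec with the labels of one axis replaced; everything else is copied.
function withAxisLabels(spec, axis, labels) {
    if (labels.length !== spec.shape[axis]) {
        throw new Error('Axis ' + axis + ' needs ' + spec.shape[axis] + ' labels');
    }
    var indices = {};
    Object.keys(spec.indices).forEach(function (key) {
        indices[key] = spec.indices[key].slice(); // copy so the original spec is untouched
    });
    indices[axis] = labels.slice();
    return {
        flattenedData: spec.flattenedData.slice(),
        shape: spec.shape.slice(),
        indices: indices,
        axes: spec.axes.slice()
    };
}

var original = {
    flattenedData: [1, 2, 3, 4],
    shape: [2, 2],
    indices: {0: ['1', '2'], 1: ['A', 'B']},
    axes: ['time', 'features']
};
var updated = withAxisLabels(original, 0, ['3', '4']);
// updated.indices[0] is ['3', '4']; original.indices[0] is still ['1', '2']
```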

You can also use the function putField:

ds = ds.putField('indices', {0: ['3', '4'], 1: ds.indices[1]});

This will perform validation too.

Thanks Romain and Camile!

Then in my specific case I can do:

var index = expressionsAtFeaturesDate.toString() + inspectionId;

// servicePointDataset has just one row
servicePointDataset = servicePointDataset.putField('indices', {0: [index], 1: servicePointDataset.indices[1]});

correct?

Best,
Mario