Set up metrics for an entity that has a distinct measurement series per instance

We are working with ~60k unique Compressors, each with 1k-3k Pi tags. Because the Compressors are unique, the Pi tags are unique as well: we expect no overlap of tags across compressors, so each measurement series belongs to exactly one compressor. Our end goal is to build an SVM for each compressor (60k models) using its respective tags, for use in a predictive maintenance application. When building a model we don't need to know what the tags are, since we apply the same treatments/operations to all metrics. But we do want to be able to show feature importance by tag in the application after the predictions have been generated.

We have discussed a few challenges:

  1. Defining the metrics – If we define a unique metric for each MeasurementSeries, we would need to define 60k * (1k-3k) metrics. A further downside is that new metrics would need to be created every time a new compressor is loaded.
  2. Selecting features at training time – When we train a model, we need to select the metrics/features that apply to the specific Compressor instance.
  3. Triggering predictions – Depending on the solution chosen below, how best to trigger prediction updates using DFEs.

The group discussed two solutions:

  1. Create a mapping from each Pi tag to a generic measurement series. We would essentially create dummy series Index1, Index2, …, IndexN, where N = the max number of tags on any compressor. The main benefit is that no new metrics need to be defined when new compressors are added, because the metrics will already exist. We still need to create N metrics, but that can be done easily with a loop that writes a JSON file. After training the model on the generic features, we join back to a mapping table in order to display the most important features under their real tag names.
  2. Define the metrics with srcType CompressorMeasurementSeries (see code below) instead of Compressor. This way we only need to write one metric, but the concern is that it will cause trouble later on in the machine learning pipeline when using DFEs: we are not sure the DFEs can properly trigger prediction updates if the metric is defined on CompressorMeasurementSeries rather than Compressor.
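For option 1, the N generic metric definitions can be generated with a short script. A minimal Python sketch (the `path` value, the slot-selection scheme, and the expression are hypothetical placeholders, not an actual C3 spec):

```python
import json

def generate_generic_metrics(n_slots):
    """Build one SimpleMetric-style definition per generic series slot.

    The field names mirror the metric JSON elsewhere in this thread;
    the path/expression values are invented and would need to match
    the real type model.
    """
    metrics = []
    for i in range(1, n_slots + 1):
        metrics.append({
            "id": f"Index{i}",
            "name": f"Index{i}",
            "srcType": "Compressor",
            # Hypothetical: select the series mapped to slot i.
            "path": f"measurementSeries[slotIndex=={i}]",
            "expression": "avg(avg(normalized.data.quantity))",
        })
    return metrics

# Write all N definitions to a single JSON file in one shot.
defs = generate_generic_metrics(3000)
with open("generic_metrics.json", "w") as f:
    json.dump(defs, f, indent=2)
```

The point is simply that N metrics is a one-time scripted cost, not 60k * N hand-written definitions.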

We are leaning towards option 1, but would like other opinions on the best way to approach the problem.
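The join-back step in option 1 only needs a small lookup from (compressor, generic slot) to the real Pi tag. A sketch with invented tag and compressor names:

```python
# Hypothetical mapping table from generic feature slots to real Pi
# tags, keyed per compressor (all names invented for illustration).
tag_map = {
    ("C1", "Index1"): "PT-1001.PV",
    ("C1", "Index2"): "TT-1002.PV",
    ("C1", "Index3"): "VT-1003.PV",
}

def importances_with_tags(compressor_id, importances):
    """Replace generic feature ids with real tag names, sorted by
    importance, so the application can display meaningful labels."""
    rows = [(tag_map[(compressor_id, fid)], score)
            for fid, score in importances.items()]
    return sorted(rows, key=lambda r: r[1], reverse=True)

# Feature importances as produced by one compressor's trained model.
ranked = importances_with_tags(
    "C1", {"Index2": 0.27, "Index1": 0.61, "Index3": 0.12})
# ranked[0] is the most important real tag for compressor C1.
```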

c3Viz(CompressorMeasurementSeries.evalMetricsWithMetadata({
    ids: ["CompressorMeasurementSeries1", "CompressorMeasurementSeries2"],
    expressions: ["Measurement"],
    start: "2017-02-01T00:00",
    end: "2018-05-15T00:00",
    interval: "FIVE_MINUTE"
}))

with the Measurement metric defined as (the expression was truncated in the original post; a typical "average of the raw values" form is assumed here):

{
    "id": "Measurement",
    "name": "Measurement",
    "srcType": "CompressorMeasurementSeries",
    "expression": "avg(avg(normalized.data.quantity))"
}

You may also use binding variables in your metric, so that you create only one metric and provide the values at evaluation time.

It is very unlikely that you will use thousands of features in your final model.
What you could do is use what @bachr suggested to retrieve all the features and build your model.
Once you identify which features are actually being used by the model, you can generate non-parametric metrics for just those and plug them into DFEs.

In both cases I think the metrics should be defined on Compressor, since that is the asset of interest here.
