Pearson Correlation, Metrics or Expression Engine Functions


#1

Hi There,

I'm asking about a performance vs. development-time trade-off here.

ExpressionEngineFunction has AVG and STDDEV, which could be used. However, what is the trade-off between creating all of these metrics and just having a function such as:
"expression": "window('CORR', TimeSeriesA, TimeSeriesB)"?

If this were done using metrics, one would have to create a large number of intermediate metrics, as shown below, every time they wanted correlation values between two data points.
The diagram below illustrates one idea for getting a windowed correlation between two time series using metrics.
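
For reference, one way to see why so many intermediate metrics are needed: with population statistics, a windowed Pearson correlation decomposes into averages and standard deviations, roughly

corr(A, B) = (avg(A * B) - avg(A) * avg(B)) / (stddev(A) * stddev(B))

so each window would need rolling averages of A, B, and A*B plus rolling standard deviations of A and B as separate metrics (this decomposition is just my sketch of the idea, not existing platform syntax).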

Could someone please outline the best approach, with an explanation of the trade-offs between metrics and expression engine functions?

I feel usability would be better using the expression engine; however, I do not know whether this would create a cost or latency issue with high-frequency data (e.g. 1-second intervals). If it would not, could someone point me in the direction of some materials that would allow me to create expression engine functions?

Thanks,

Aaron Butler

Origin Energy


#2

Hi Aaron

I agree that usability will be much better for every developer who wants a correlation between two time series, and I would say it is the right way to go forward. The beauty of putting it in the expression engine is that it becomes naturally available in all metrics / expressions for others in your organization too.

You bring up a good point about performance when operating on very high-frequency data, but I would try to gauge the cost before giving up on it. Let's outline how to write the expression engine function first and then worry about the cost.

There was a post from @rileysiebel that I’m pasting here for the example (from here: Metric referencing actionDecl-based metric returns empty timeseries):

Metric functions take Timeseries as their input and produce Timeseries as their output.

An example of a metric function which calculates resource spending based on consumption, demand, and the customer’s rate plan is as follows:

type RatePlanMetricLibrary mixes MetricFunctionLibrary {
  
  /**
   * Uses the Genability API to calculate the amount of money spent on electricity.
   *
   * @param ratePlans
   *           Information about how to calculate cost from consumption and demand.
   * @param consumption
   *           Electricity consumption data used in calculation. Units are kilowatt hours.
   * @param demand
   *           Electricity demand data used in calculation. Demand is the rate (first derivative with respect to
   *           time) of electricity consumption. Units are kilowatts.
   * @return the cost of electricity for each point in time. Aggregating this will yield the total cost.
   */
  calculateElectricitySpending: function(ratePlans: Timeseries,
                                         consumption: Timeseries,
                                         demand: Timeseries): Timeseries js server
}

The .js file looks like this (just the top-level function; internally called functions are not shown):

function calculateElectricitySpending(ratePlans, consumption, demand) {
  var results;
  try {
    // Build the Genability calculation specs from the rate plans and usage data.
    var specs = RatePlan.generateGenabilitySpecs(ratePlans, consumption, demand, 'electricity');
    // Calculate the tariff for each spec and flatten the per-spec results into one list.
    results = _.flatten(_.map(specs.toArray(), function (spec) {
      return GenabilityService.calculateTariff(spec).toArray();
    }));
  } catch (e) {
    // If the calculation fails, log the error and return an unavailable timeseries.
    log.error(e);
    return unavailable(consumption);
  }

  // Convert the raw results into the Timeseries the metric function must return.
  return transformToTimeseries(results, consumption);
}

and a metric calling this function looks like this:

{
  "id" : "CalculatedBilledElectricitySpending",
  "name" : "CalculatedBilledElectricitySpending",
  "expression" : "calculateElectricitySpending(ElectricityRate, BilledElectricityConsumption, EstimatedBilledElectricityDemand)"
}

Here ElectricityRate, BilledElectricityConsumption, etc. are themselves metrics.

There will definitely be a performance difference between evaluating a second-level (1-second interval) metric and an hourly metric, simply due to the sheer number of points: a 1-second metric evaluates 86,400 points per day versus 24 for an hourly one. Whether that calculation is done in the platform or by the application developer, the impact will be similar. There will, though, be differences in runtime performance depending on the language that you use.

I would encourage you to build your metric as a MetricFunctionLibrary function and then run some performance tests to find out where you land and whether the results work for you.
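
To make that concrete, here is a rough sketch of the numeric core such a function would need. This is only an illustration: the pearson helper name is hypothetical, and it assumes the two input timeseries have already been unpacked into two aligned, equal-length numeric arrays (the Timeseries unpacking and repackaging would follow the same pattern as the example above).

function pearson(xs, ys) {
  // Pearson correlation of two equal-length numeric arrays.
  var n = xs.length;
  var meanX = 0, meanY = 0;
  for (var i = 0; i < n; i++) {
    meanX += xs[i];
    meanY += ys[i];
  }
  meanX /= n;
  meanY /= n;

  var cov = 0, varX = 0, varY = 0;
  for (var j = 0; j < n; j++) {
    var dx = xs[j] - meanX;
    var dy = ys[j] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }

  // Undefined (NaN) if either series is constant within the window.
  return cov / Math.sqrt(varX * varY);
}

A windowed version would simply apply this to each window's slice of points, and the surrounding MetricFunctionLibrary declaration would mirror the calculateElectricitySpending example above.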


#3

If there are any additional docs on how to test functions on the platform before provisioning, that would be great.


#4

The only supported way to get code "on the platform" is to provision. Think of it like compiling. How could you test your C code without compiling it?

For experimentation, you can copy all of your code into the console and execute it as regular JavaScript (and similarly for Python in a Jupyter notebook).
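
For example, to sanity-check the pearson helper sketched in #2 above before provisioning anything, you could paste it into the console together with something like this (plain JavaScript; the sample data is made up):

var a = [1, 2, 3, 4, 5];
var b = [2, 4, 6, 8, 10]; // perfectly correlated with a
var c = [5, 4, 3, 2, 1];  // perfectly anti-correlated with a

console.log(pearson(a, b)); // expect 1
console.log(pearson(a, c)); // expect -1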