SimpleMetric expression aggregation


#1

avg(avg(*)) is always explained as “aggregation across time (inner) and across space (outer)”. In this presentation at AWS re:Invent, Riley says “for a more detailed explanation, come see me after.” What’s the verbose explanation?


#2

Here are two slides to help explain in more detail with sum(avg(*)):


#3

Great start, thanks! Suppose I take this dummy evalMetrics call, which will work on any environment that has had some data loaded:

var res = DataLoadProcessLog.evalMetricsWithMetadata(
  // Evaluation spec: which sources, which expression, and the time range
  {
    filter: "exists(id)",
    limit: 2,
    expressions: ["Dummy"],
    start: "2016-09-01T00:00",
    end: "2016-09-30T23:59",
    interval: "DAY"
  },
  // Inline definition of the "Dummy" metric referenced in expressions above
  SimpleMetric.array([{
    "id": "Dummy",
    "name": "Dummy",
    "srcType": "DataLoadProcessLog",
    "expression": "sum(avg(chunkNumber))"
  }])
);

then I’ll get

res.result.size() === 2

which is to say, two timeseries were returned. Supposing each file was uploaded in a single chunk, the value at each step of each timeseries will be 1. In other words, it has taken the average across time (1), but it has not summed across space (the result is not 2).

In what situation would my evalMetrics call with an expression like sum(avg(series.normalized.data.quantity)) return three timeseries that get summed into one?

Followup: how is this concept different from rollupMetrics using rollupFunc: "SUM"?


#4

evalMetrics returns one timeseries per source object. I assume that you have two DataLoadProcessLog objects for which “exists(id)” is true.

If you want to roll up timeseries across source objects into one timeseries, see MetricEvaluatable.rollupMetric. In that case you are actually doing three aggregations: 1) time, 2) space, 3) sources.

For clarification, when I say that the outer aggregation aggregates over SPACE, I mean that each source object might have multiple timeseries, and that those timeseries are aggregated within that source object.
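To make the difference between "one timeseries per source" and a rollup across sources concrete, here is a rough sketch in plain JavaScript. It does not call the platform at all: perSource, rollupSum, and the sample values are hypothetical stand-ins for what evaluation and rollup conceptually produce, not the actual result shape.

```javascript
// Plain JavaScript sketch, no platform calls. All names and values are hypothetical.

// Conceptual per-source output: one series per source object,
// here two sources whose value is 1 at every step.
var perSource = {
  sourceA: [1, 1, 1],
  sourceB: [1, 1, 1]
};

// Roll up across sources: combine the per-source series point by point.
// Assumes every series has the same length.
function rollupSum(seriesBySource) {
  var ids = Object.keys(seriesBySource);
  return seriesBySource[ids[0]].map(function (_, i) {
    return ids.reduce(function (acc, id) {
      return acc + seriesBySource[id][i];
    }, 0);
  });
}

var seriesCount = Object.keys(perSource).length; // still one series per source
var rolledUp = rollupSum(perSource);             // a single series across sources
```

In this sketch the values only add up to 2 in the rolled-up series, because the sum inside the expression never sees more than one source at a time.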


#5

Maybe this example is relevant, using SmartBulbs and Temperature sensors.
If a SmartBulb only has one Temperature sensor, only one timeseries is available for aggregation. In this case, the aggregation over space has no effect.
If the SmartBulb had three Temperature sensors, resulting in three timeseries, then the aggregation over space would combine all three into a single result.
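The SmartBulb case can be sketched in plain JavaScript as follows. This is only an illustration of sum(avg(...)) under simple assumptions (every interval has at least one reading, and all sensors cover the same intervals); avgOverTime, sumOverSpace, sumOfAvg, and the readings are made up for the example and are not platform functions.

```javascript
// Plain JavaScript sketch, no platform calls. All names and values are hypothetical.

// Inner aggregation (time): average the raw readings that fall in each interval.
function avgOverTime(readingsPerInterval) {
  return readingsPerInterval.map(function (readings) {
    var total = readings.reduce(function (a, b) { return a + b; }, 0);
    return total / readings.length;
  });
}

// Outer aggregation (space): combine the per-sensor series point by point.
function sumOverSpace(seriesList) {
  return seriesList[0].map(function (_, i) {
    return seriesList.reduce(function (acc, series) {
      return acc + series[i];
    }, 0);
  });
}

// sum(avg(...)) for one source object that owns the given sensors.
function sumOfAvg(sensors) {
  return sumOverSpace(sensors.map(avgOverTime));
}

// Each sensor covers two intervals; each interval holds its raw readings.
var sensorA = [[20, 22], [24]];
var sensorB = [[10, 14], [16]];
var sensorC = [[30], [31, 33]];

var oneSensorBulb = sumOfAvg([sensorA]);                     // [21, 24]
var threeSensorBulb = sumOfAvg([sensorA, sensorB, sensorC]); // [63, 72]
```

With a single sensor the outer sum returns the inner averages unchanged, which matches the observation that the aggregation over space has no effect there.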


#6

We’re trying to optimize actionDecl metrics that call rollupMetric over a number of sources.
Is there a way to use sum in an expression as above by somehow creating a single source that represents all of those sources, so that it has multiple timeseries? Would it be faster than rollupMetric?
Is there a way to use Expression Engine’s rollup and collect the timeseries arguments like it is done for rollupMetric using its ids/filter argument?

Thanks