Inputs for speeding up slow rollup metric


#1

I have a compound metric with a deep dependency graph: it depends on 35 simple metrics, most of which simply graph measurement data.

I don’t have many Facilities (83), but when I run rollupMetric on this metric, the result is returned after 1 hour!

var spec = {
    ids: ["s00603-b001-log001"],
    expressions: ["IECS_Energy_Apportionned_Consumption"],
    start: "2018-01-01",
    end: "2019-01-01",
    interval: "DAY"
};
// Sum the metric over every LOGEMENT facility of site s00603.
Facility.rollupMetric({
  start: spec.start,
  end: spec.end,
  interval: spec.interval,
  filter: Filter.eq('site.id', 's00603').and.eq('facilityType', 'LOGEMENT'),
  expressions: spec.expressions,
  rollupFunc: 'sum',
  limit: -1,
  cache: true
});

I tried adding a cache to all the underlying SimpleMetrics, but it’s not helping:

    "cache": {
      "intervals": ["DAY", "MONTH"],
      "monthsInPast": 12
    }

Looking at the Splunk data, I see a lot of fetches happening, and they are the primary cause of the problem:

 Callee	Status	Calls	%	Σ T. time	Σ S. CPU	Σ S. I/O	μ T. time	μ S. time	μ T. CPU	μ S. CPU	μ T. I/O	μ S. I/O	μ T. SQL	μ S. SQL	μ T. K/V	μ S. K/V
 Facility.getSumIndividualMetrics	Complete	23	100.00	1h13m23s	7s		3m11s	0s	2m20s	0s						
 - Facility.rollupMetric	Complete	23	100.00	1h13m16s	6m59s		3m11s	18s	2m20s	18s						
  - Facility.fetchObjStream	Complete	8,073	100.00	46m27s	1s		0s	0s	0s	0s						
   - Facility.fetch	Complete	8,073	100.00	46m25s	6m58s		0s	0s	0s	0s						
    - Facility.fetchObjStream	Complete	14,456	100.00	38m42s	2s		0s	0s	0s	0s						
     - Facility.fetch	Complete	14,456	100.00	38m40s	5m58s		0s	0s	0s	0s						
      - Facility.fetchObjStream	Complete	2,848	100.00	31m00s	1s		1s	0s	0s	0s						
       - Facility.fetch	Complete	2,848	100.00	31m59s	5m15s		1s	0s	0s	0s						
        - ServicePoint.fetchObjStream	Complete	1,422	91.57	25m46s	0s		1s	0s	1s	0s						
         - ServicePoint.fetch	Complete	1,422	91.55	25m46s	4m54s		1s	0s	1s	0s						
          - ServicePointMeterAsset.fetchObjStream	Complete	1,422	72.73	20m28s	1s		1s	0s	0s	0s						
           - ServicePointMeterAsset.fetch	Complete	1,422	72.68	20m27s	5m10s		1s	0s	0s	0s						
            - MeterAsset.fetchObjStream	Complete	1,422	33.64	9m28s	1s		0s	0s	0s	0s						
             - MeterAsset.fetch	Complete	1,422	33.58	9m27s	3m23s		0s	0s	0s	0s						
              - MeterAsset.fetchObjStream	Complete	1,422	10.48	2m57s	0s		0s	0s	0s	0s						
               - MeterAsset.fetch	Complete	1,422	10.47	2m57s	4s		0s	0s	0s	0s						
              - PointPhysicalMeasurementSeries.fetchObjStream	Complete	1,422	1.35	23s	1s		0s	0s	0s	0s						
               - PointPhysicalMeasurementSeries.fetch	Complete	1,422	1.30	22s	4s		0s	0s	0s	0s						

Any clues on what to do next?


#2

@bachr run an explain plan on evalMetrics and figure out which simple metric takes the most time. If these metrics are only used in rollup scenarios, you could consider caching them as well by setting the cache field on the simple metric.


#3

Most of the metrics are evaluated in a reasonable time, but one takes far longer than the rest:

 key: Facility_s00603-b001-log001_Sum_IECS_Energy_Apportionned_Consumption_Excluded_2018-01-01T00:00:00.000_2019-01-01T00:00:00.000_DAY
 time: 1418.841631
 count: 1
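
When the explain plan returns many such entries, ranking them by time makes the outlier obvious. A minimal sketch in plain JavaScript, assuming the entries have been copied into an array of `{ key, time, count }` objects; the `slowest` helper is hypothetical (not a platform API), and only the first entry comes from the output above, the other two are made-up placeholders:

```javascript
// Timing entries in the { key, time, count } shape shown above.
// Only the first entry is real; the placeholders are illustrative.
const timings = [
  {
    key: "Facility_s00603-b001-log001_Sum_IECS_Energy_Apportionned_Consumption_Excluded_2018-01-01T00:00:00.000_2019-01-01T00:00:00.000_DAY",
    time: 1418.841631,
    count: 1,
  },
  { key: "placeholder_metric_A", time: 2.5, count: 1 },
  { key: "placeholder_metric_B", time: 0.8, count: 3 },
];

// Hypothetical helper: return the n slowest entries, each with its
// share of the total time across all entries.
function slowest(entries, n) {
  const total = entries.reduce((sum, e) => sum + e.time, 0);
  return [...entries] // copy so the input array is not reordered
    .sort((a, b) => b.time - a.time)
    .slice(0, n)
    .map((e) => ({
      key: e.key,
      time: e.time,
      share: total > 0 ? e.time / total : 0,
    }));
}

const top = slowest(timings, 1);
console.log(top[0].key, top[0].time, top[0].share.toFixed(3));
```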

#4

We have noticed that setting the cache field on the simple metric slows down the unit tests in which the metric is used, to the extent that they time out very easily. Is there another way to request caching, e.g., in a call to evalMetric[s]? Are the two equivalent?


#5

@AlexBakic that makes no sense. If you didn’t cache the metric, it would take longer to evaluate each time, right? I’m guessing something else you are running into is what takes the time during tests.

@bachr great job on finding the offending metric. You can choose to cache it, or find out why it’s taking that much time and try to optimize it.


#6

It seems that the objects we create in the tests invalidate some metrics (because they have cache set now) and they get evaluated while waitForSetup is waiting for the queues to drain. But now we have approx. 150 of them… I am trying to pass to waitForSetup an array of queues to wait on, not sure if it’s gonna work. BTW, it seems that the cache field of EvalMetricSpec is complementary, for reading from cache.

In general, we would like to be able to turn optimizations like this one on and off at run time (or at least at provision/design/compile time).
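
One way to approximate such a switch at provision time is to add or strip the cache field from the metric definitions before they are loaded. A minimal sketch in plain JavaScript, assuming metric definitions are available as plain JSON objects; `withCache`, `DEFAULT_CACHE`, and the example metric are hypothetical names, and whether the platform accepts definitions rewritten this way is an assumption, not something confirmed in this thread:

```javascript
// Cache settings taken from the fragment in post #1.
const DEFAULT_CACHE = { intervals: ["DAY", "MONTH"], monthsInPast: 12 };

// Hypothetical helper: return a copy of a metric definition with the
// cache field present (enabled = true) or absent (enabled = false).
// The input object is never modified.
function withCache(metricDef, enabled, cache = DEFAULT_CACHE) {
  const { cache: _existing, ...rest } = metricDef; // drop any existing cache
  if (!enabled) return rest;
  return { ...rest, cache: { ...cache, intervals: [...cache.intervals] } };
}

// Hypothetical simple metric definition, for illustration only.
const metric = { id: "Example_SimpleMetric", name: "Example_SimpleMetric" };

const cached = withCache(metric, true);    // e.g. for production
const uncached = withCache(cached, false); // e.g. for unit tests

console.log(JSON.stringify(cached));
console.log(JSON.stringify(uncached));
```

In unit tests the flag could be driven by an environment setting, so the same definitions run without cache invalidation while the queues drain.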