Handling duplicate and overlapping time series items

#1

I understand that duplicate time series items share the same start/end and a few other fields, and that overlapping time series items are those whose start/end time ranges overlap.

I want to understand a few things about these:

I found a phrase in another posting suggesting there are several ways of handling overlap. I suppose there might be different strategies, such as averaging overlapping values, discarding some overlapping values, or throwing errors. Is this handled with annotations?

When we say “handle overlap” does that apply to duplicates as well? Namely, are duplicate items in a time series just a special case of overlapping items?

For overlapping items with non-duplicate time stamps, there is a portion of the time ranges which does not overlap. How is that handled?


#2

No, overlap handling is not done with annotations. It is specified on a per-timeseries-header basis, via the overlapHandling field. See documentation for TimedDataFields.overlapHandling, an enum field with possible values: SUM, AVG, MIN, and MAX.

NB: Both the TimedDataHeader and TimeseriesHeader types mix in TimedDataFields.


De-duplication is the step that precedes de-overlapping in the normalization process. As with de-overlapping, the de-duplication strategy is specified on a per-timeseries-header basis, via the duplicateHandling field. See documentation for TimedDataFields.duplicateHandling, a field with possible values: IGNORE and KEEP.

If set to IGNORE, the data points will be reduced to a single data point by simply discarding the duplicates.
If set to KEEP, the duplicate data points are kept and effectively become overlapping data points to be handled in the next normalization step.

So yes, “handle overlap” does apply to duplicates as well, provided duplicateHandling is set to KEEP.
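
To make the two duplicateHandling settings concrete, here is a minimal Python sketch. It is not the platform's implementation: the Point class and deduplicate function are hypothetical names, duplicates are assumed to be identified by start/end alone, and keeping the first occurrence under IGNORE is an assumption (the docs only say the duplicates are discarded).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    # Hypothetical data point: a time range [start, end) and a value.
    start: int
    end: int
    value: float


def deduplicate(points, duplicate_handling):
    """Illustrative de-duplication step (not the platform's code)."""
    if duplicate_handling == "KEEP":
        # Duplicates stay in the list and are treated as overlapping
        # points by the later de-overlapping step.
        return list(points)
    if duplicate_handling != "IGNORE":
        raise ValueError(f"unknown duplicateHandling: {duplicate_handling}")
    seen = set()
    result = []
    for p in points:
        key = (p.start, p.end)  # assumption: duplicates share start and end
        if key not in seen:     # assumption: the first occurrence wins
            seen.add(key)
            result.append(p)
    return result
```

With two points covering the same range, IGNORE leaves one of them, while KEEP passes both through so that the overlapHandling setting decides how they are combined.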


The non-overlapping portion is handled by disaggregating data points as necessary to eliminate overlaps.

The output of the de-overlapping step is a list of data points where no two data points have the same start or the same end, and no two data points overlap anywhere between their start and end.

Here is an illustrated example (single horizontal axis, depicting time):

Raw data input: (2 data points)
|----|
|---------|

Intermediate, disaggregated values: (3 data points)
|----|
|----|
     |----|

De-overlapped output: (2 data points)
|====|
     |----|

In this case, the second data point from the input is disaggregated into two parts: one part that overlaps with the first data point in the input, and another part that does not overlap, which is used as a “new” data point in the output.
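
The disaggregate-then-combine idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's algorithm: deoverlap is a hypothetical function, points are plain (start, end, value) tuples, and each disaggregated part is assumed to carry the original value unchanged (as for a rate-like measurement). A quantity-like measurement would more plausibly be prorated by duration, which this sketch does not do.

```python
def deoverlap(points, overlap_handling):
    """Illustrative de-overlapping of (start, end, value) tuples."""
    combine = {
        "SUM": sum,
        "AVG": lambda values: sum(values) / len(values),
        "MIN": min,
        "MAX": max,
    }[overlap_handling]

    # Every start and end is a possible cut point for disaggregation.
    boundaries = sorted({t for start, end, _ in points for t in (start, end)})

    result = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        # Collect the values of all input points that cover this segment.
        # Assumption: a part keeps the value of the point it came from.
        values = [v for start, end, v in points if start <= lo and hi <= end]
        if values:  # skip gaps that no input point covers
            result.append((lo, hi, combine(values)))
    return result


# The two raw data points from the illustration above.
raw = [(0, 5, 10.0), (0, 10, 4.0)]
print(deoverlap(raw, "AVG"))  # [(0, 5, 7.0), (5, 10, 4.0)]
```

The first output point is the combined result for the overlapping range, and the second is the non-overlapping remainder of the longer input point.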
