Treatment with deduplication


#1

I have very high-frequency data where data points are measured in milliseconds (nanoseconds, actually, but let’s look at the problem at the millisecond level).

I understand we can store timestamps down to milliseconds when using datetime with millis, and I have done this, but it looks like metric evaluation still deduplicates at the second level.

Consider this series of data points:

2019-02-23T00:00:00.001,100
2019-02-23T00:00:00.002,100
2019-02-23T00:00:00.003,100

evalMetric is treating the above series as (millis ignored):

2019-02-23T00:00:00,100
2019-02-23T00:00:00,100
2019-02-23T00:00:00,100

and then performs deduplication – which results in:

2019-02-23T00:00:00,100

Is there any way to change this behavior? Even better, can I disable deduplication and have the values summed?

The evalmetrics result that I need for my above example is:
2019-02-23T00:00:00,300
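
To make the difference concrete, here is a small Python sketch (not platform code) that reproduces the behavior I am seeing and the result I want on the three sample points. The helper names `truncate_to_second`, `dedupe_per_second`, and `sum_per_second` are made up for illustration, and the "keep the first value per second" rule is only my assumption about how deduplication picks a value.

```python
from datetime import datetime

# The three sample points from above: (timestamp with milliseconds, value).
POINTS = [
    ("2019-02-23T00:00:00.001", 100),
    ("2019-02-23T00:00:00.002", 100),
    ("2019-02-23T00:00:00.003", 100),
]


def truncate_to_second(ts: str) -> str:
    """Drop the sub-second part of an ISO-8601 timestamp."""
    return datetime.fromisoformat(ts).replace(microsecond=0).isoformat()


def dedupe_per_second(points):
    """Observed behavior: one value survives per second (assumed: the first)."""
    out = {}
    for ts, value in points:
        out.setdefault(truncate_to_second(ts), value)
    return list(out.items())


def sum_per_second(points):
    """Desired behavior: values that fall into the same second are summed."""
    out = {}
    for ts, value in points:
        key = truncate_to_second(ts)
        out[key] = out.get(key, 0) + value
    return list(out.items())


observed = dedupe_per_second(POINTS)
wanted = sum_per_second(POINTS)
print(observed)
print(wanted)
```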

I am using tsDecl to define my simple metric for this data.

Thanks

Paul


#2

De-duplication is a step in the normalization pipeline, so you may need to implement a custom Normalizer type to prevent it.


#3

@paulyip are you sure you reloaded the data after changing the data type? TsDecl is supposed to use the date exactly as it is stored, so dates that differ by a millisecond should not be considered duplicates.


#4

See if you can use the KEEP option mentioned in https://VanityURL/api/1/TENANT/TAG/documentation/topic/normalization .

“Duplicate handling can be overridden by setting the duplicateHandling field on your series header.”


#5

@rohit.sureka - yes, I’m sure. I had performed truncate before my last reload.


#6

Thanks Alex, that is a good lead.

However, I’m using TsDecl, not TimedDataPoint or the other methods mentioned in that doc. After exploring TsDecl, it appears the same options for duplicate handling (e.g. KEEP) are not available?

I can change the implementation if that is the only way, but I’d rather not reload (it’s a lot of data).


#7

@paulyip What is the treatment and overlapHandling for your tsDecl metric?
Try setting treatment to sum first. If it still doesn’t work, try setting overlapHandling to sum.

If setting overlapHandling to sum works, it means we are ignoring milliseconds when checking for overlaps, and you should create a ticket.
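
The reasoning behind that test can be sketched in Python. This is only an illustration, not the platform's overlap logic: the function `colliding_points` is hypothetical, and it assumes an "overlap" here simply means two points landing on the same timestamp at the resolution used for comparison.

```python
from collections import Counter
from datetime import datetime

TIMESTAMPS = [
    "2019-02-23T00:00:00.001",
    "2019-02-23T00:00:00.002",
    "2019-02-23T00:00:00.003",
]


def colliding_points(timestamps, resolution="milliseconds"):
    """Count points that share a timestamp with another point at `resolution`.

    `resolution` is passed to datetime.isoformat(timespec=...), so
    "seconds" drops the fractional part and "milliseconds" keeps it.
    """
    keys = [
        datetime.fromisoformat(ts).isoformat(timespec=resolution)
        for ts in timestamps
    ]
    return sum(n - 1 for n in Counter(keys).values())


at_millis = colliding_points(TIMESTAMPS, "milliseconds")
at_seconds = colliding_points(TIMESTAMPS, "seconds")
print(at_millis, at_seconds)
```

If the comparison keeps milliseconds, none of the sample points collide and overlapHandling should have nothing to act on. If it only compares seconds, all three land on the same timestamp, which is the case where changing overlapHandling would change the result.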


#8

@jinyanliu @AlexBakic These are all workarounds. We should first confirm whether the fundamental issue is that milliseconds are not being retained at the TSDecl layer; if it is, that should be fixed.


#9

Hi @jinyanliu

Can you clarify how to specify overlapHandling for tsDecl? I don’t see an option in c3ShowType(TsDecl)

Thanks!

Paul


#10

@paulyip It’s c3ShowType(TSDecl). You should see a field for overlapHandling.

Example:

    {
      "id": "SomeName_SomeType",
      "name": "SomeName",
      "srcType": "SomeType",
      "path": "path.to.data",
      "tsDecl": {
        "data": "data",
        "start": "timestamp",
        "value": "value",
        "overlapHandling": "SUM",
        "treatment": "SUM"
      }
    }
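
If you generate the metric definition from a script rather than writing it by hand, one option is to build it as a native structure and serialize it, which guarantees well-formed JSON. A minimal Python sketch, using only the field names and placeholder values from the example above (how the result is then registered on the platform is not shown and would depend on your setup):

```python
import json

# Same placeholder values as the example; replace with your own names and paths.
metric = {
    "id": "SomeName_SomeType",
    "name": "SomeName",
    "srcType": "SomeType",
    "path": "path.to.data",
    "tsDecl": {
        "data": "data",
        "start": "timestamp",
        "value": "value",
        "overlapHandling": "SUM",
        "treatment": "SUM",
    },
}

# json.dumps always emits strict JSON (no trailing commas).
metric_json = json.dumps(metric, indent=2)
print(metric_json)
```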