What determines if a record is a "duplicate" or not?


#1

Regarding the duplicateHandling option in TimeseriesHeader:
Is a record a duplicate when its date and time range match exactly and the property has the same value? Or are the other properties also checked to determine whether it is a duplicate?


#2

From the TimeseriesHeader.duplicateHandling documentation:

A duplicate point is one, which has the same start, end, value, unit, isEstimated


#3

What is value in this case? For example, here is a type for the data points:

@db(datastore='cassandra',
    partitionKeyField='parent',
    persistenceOrder='start',
    persistDuplicates=true,
    compactType=true,
    shortIdReservationRange=100000)
entity type MyType mixes TimeseriesDataPoint<MyTypeHeader> schema name "MYTYPE" {
  @ts(treatment='previous')
  myField1: double

  @ts(treatment='previous')
  myField2: string
}

If two instances of this type have exactly the same start, end, unit, isEstimated, and myField1, but different values for myField2, will the second one be considered a duplicate or not?


#4

De-duplication is one step of the normalization process, and if I remember correctly, the complete series of normalization steps is executed for each timeseries field (i.e. each field with a @ts annotation) on the data point type.

So value in this case is the value of the timeseries field being normalized.

Given your example, the normalization process will consider the second data point a duplicate when generating the normalized timeseries for myField1, but not when generating the normalized timeseries for myField2.
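
To make the per-field behavior concrete, here is a minimal Python sketch of the rule as described above. It is only an illustration, not the platform's implementation: the Point class, the duplicate_indices helper, the snake_case field names, and the "kWh" unit are all hypothetical stand-ins. The only thing taken from the documentation is the duplicate key (start, end, value, unit, isEstimated), with value read as the field currently being normalized.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Point:
    # Hypothetical stand-in for a MyType data point; not the platform's class.
    start: datetime
    end: datetime
    unit: str
    is_estimated: bool
    my_field1: float
    my_field2: str


def duplicate_indices(points, field):
    """Return indices of points that repeat an earlier point for one field."""
    seen = set()
    dups = []
    for i, p in enumerate(points):
        # Key mirrors the documented rule: start, end, value, unit, isEstimated,
        # where "value" is the value of the timeseries field being normalized.
        key = (p.start, p.end, getattr(p, field), p.unit, p.is_estimated)
        if key in seen:
            dups.append(i)
        else:
            seen.add(key)
    return dups


start, end = datetime(2020, 1, 1), datetime(2020, 1, 2)
points = [
    Point(start, end, "kWh", False, 1.5, "a"),  # assumed sample values
    Point(start, end, "kWh", False, 1.5, "b"),  # differs only in my_field2
]

print(duplicate_indices(points, "my_field1"))  # [1]
print(duplicate_indices(points, "my_field2"))  # []
```

Running the check once per field is what produces the two different answers for the same pair of data points.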