Best practice for ingesting highly denormalized data


#1

We have a customer who is sending a single denormalized data file, where each row populates the Measurement, MeasurementSeries, and ServicePoint types. For a file with 1 million rows, we may only have 100 unique Series and 10 unique ServicePoints. We have transforms in place, but we end up upserting the same 10 ServicePoint rows a million times. That seems wasteful and also creates “version conflict” errors.

What is the best practice for dealing with this kind of scenario? Is there anything we can do with a JS transform? This is a legacy customer, so we cannot ask them to send separate normalized files for devices/series.
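To make the problem concrete, here is a minimal sketch of the kind of deduplication we would want somewhere in the pipeline: split each denormalized batch into normalized batches so the handful of unique parent records is upserted once instead of once per row. This is plain TypeScript, not any platform-specific transform API, and all type and field names (DenormalizedRow, servicePointId, etc.) are hypothetical.

```ts
// Hypothetical shape of one row in the denormalized file; field names are assumptions.
interface DenormalizedRow {
  servicePointId: string;
  servicePointName: string;
  seriesId: string;
  seriesUnit: string;
  timestamp: string;
  value: number;
}

// Split one denormalized batch into three normalized batches, keeping only the
// first occurrence of each ServicePoint and MeasurementSeries.
function normalizeBatch(rows: DenormalizedRow[]) {
  const servicePoints = new Map<string, { id: string; name: string }>();
  const series = new Map<string, { id: string; servicePoint: string; unit: string }>();
  const measurements: { parent: string; timestamp: string; value: number }[] = [];

  for (const row of rows) {
    // Parent records: dedupe by id, first occurrence wins.
    if (!servicePoints.has(row.servicePointId)) {
      servicePoints.set(row.servicePointId, {
        id: row.servicePointId,
        name: row.servicePointName,
      });
    }
    if (!series.has(row.seriesId)) {
      series.set(row.seriesId, {
        id: row.seriesId,
        servicePoint: row.servicePointId,
        unit: row.seriesUnit,
      });
    }
    // Child records: every row still contributes a Measurement.
    measurements.push({
      parent: row.seriesId,
      timestamp: row.timestamp,
      value: row.value,
    });
  }

  return {
    servicePoints: [...servicePoints.values()], // ~10 upserts instead of 1M
    series: [...series.values()],               // ~100 upserts instead of 1M
    measurements,                                // 1M inserts, as before
  };
}
```

Keying the parent records by their natural id is what collapses the million repeated rows down to the 10/100 unique upserts and should also remove the concurrent writes behind the “version conflict” errors. The question is whether something like this is possible within the existing transform framework, or whether it has to be a pre-processing step we run ourselves.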


#2

@DavidT @trothwein your input would be appreciated… We actually have this situation in several places across our customer base, and we’ve been brute-forcing through it. Not ideal at small scale, not workable at large scale. :slight_smile:


#3

@yaroslav I think we would need to have a discussion about this. I don’t think we currently have anything that can handle this differently from the way it is being handled today.