Remove timeseries data

#1

What is the best method to remove ALL measurements (or equivalent objects) and their time series headers?

So far I use snippets like:

// Remove the measurements of each series, then the series headers themselves.
PhysicalMeasurementSeries.fetch().objs.forEach(function (pms) {
  Measurement.removeAll(Filter.eq("parent.id", pms.id));
});
PhysicalMeasurementSeries.removeAll();

Is this the best/most efficient method, or is there something better?

Bonus point: is the method valid for any KV-store data (imagine non-time-series data stored in Cassandra or equivalent)?

Thanks!

#3

If you do not want the data point removal to trigger any further invalidations, you can use:

Measurement.clearCollection()

  • You have to be a cluster admin to run it.
  • This is a much faster operation than running Measurement.removeAll().

If you do want the removal to trigger invalidations (when async processing is enabled), then use:

Measurement.removeAll();
PhysicalMeasurementSeries.removeAll();
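
To make the difference from the per-series loop in the question concrete, here is a rough sketch using plain in-memory stand-ins. Nothing in it is a platform type: makeStore, seed, the row shapes, and the orphaned row are all hypothetical and only there for illustration. It shows that the unfiltered form issues one removal call per type, and that a per-series loop would miss any measurement whose parent series no longer exists.

```javascript
// In-memory stand-ins only; these are NOT the platform's types or APIs.
function makeStore(rows) {
  let calls = 0;
  return {
    rows,
    // Removes every row matching the predicate (all rows by default)
    // and returns how many rows were removed.
    removeAll(predicate = () => true) {
      calls += 1;
      const before = this.rows.length;
      this.rows = this.rows.filter((r) => !predicate(r));
      return before - this.rows.length;
    },
    get calls() {
      return calls;
    },
  };
}

function seed() {
  const series = makeStore([{ id: "s1" }, { id: "s2" }, { id: "s3" }]);
  const measurements = makeStore([
    { parentId: "s1", value: 1 },
    { parentId: "s1", value: 2 },
    { parentId: "s2", value: 3 },
    { parentId: "s3", value: 4 },
    { parentId: "gone", value: 5 }, // assumed orphan: its series was already deleted
  ]);
  return { series, measurements };
}

// Pattern from the question: one filtered removal per series.
const a = seed();
a.series.rows.forEach((s) => a.measurements.removeAll((m) => m.parentId === s.id));
a.series.removeAll();

// Pattern from this reply: one unfiltered removal per type.
const b = seed();
b.measurements.removeAll();
b.series.removeAll();

console.log(a.measurements.rows.length, a.measurements.calls); // 1 3
console.log(b.measurements.rows.length, b.measurements.calls); // 0 1
```

Whether orphaned measurements can exist in a real deployment depends on how the data was loaded, so treat that row as an assumption rather than something to expect.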

#4

You could use a map-reduce job:

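// Called with each batch of PhysicalMeasurementSeries objects.
function map(batch, objs) {
  objs.each(function (o) {
    // Remove every measurement whose parent is this series.
    Measurement.removeAll(Filter.eq("parent.id", o.id));
  });
}

JS.mapReduce({ targetType: { typeName: "PhysicalMeasurementSeries" }, map: map });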

#5

Following @rileysiebel’s comment, PhysicalMeasurementRemover and LegacyMeasurementRemover will remove the measurements as a map-reduce (MR) job.

This is likely much cleaner and safer than an ad-hoc MR job, which is generally not recommended because bugs in the ad-hoc job can fill the MapReduceQueue with errors.

If you pass no jobFilter, it will use “1 == 1” when selecting the MeasurementSeries, so every series is selected. You’ll need to pass a dataRetentionCutoffDays value large enough to cover the age of the earliest data in the system.
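
If it helps, here is a small sketch of how one might pick a dataRetentionCutoffDays value that is large enough. It is plain date arithmetic and does not call the platform: retentionCutoffDaysCovering is a hypothetical helper, and the two dates are made-up examples. You would substitute the timestamp of your own earliest measurement, however you obtain it.

```javascript
// Hypothetical helper, not part of any platform API.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns a day count that reaches back past `earliest` when counted from `now`,
// rounded up to whole days plus a small safety margin.
function retentionCutoffDaysCovering(earliest, now, marginDays = 1) {
  const elapsedMs = now.getTime() - earliest.getTime();
  if (elapsedMs < 0) {
    throw new RangeError("earliest timestamp is in the future");
  }
  return Math.ceil(elapsedMs / MS_PER_DAY) + marginDays;
}

// Made-up example dates, given in UTC so daylight saving time does not matter.
const earliest = new Date("2015-01-01T00:00:00Z");
const now = new Date("2020-01-01T00:00:00Z");

const days = retentionCutoffDaysCovering(earliest, now);
console.log(days); // 1827
```

The extra margin day is just a guard against off-by-one issues at the boundary; any comfortably larger number should work equally well.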
