Percentile outliers data cleaning


I have a timeseries of values defined in a metric, and I need to clean out outliers by removing the values below the 5th percentile and above the 95th percentile.
I would like to do this with an expression engine function, but I can’t figure out how.


        "id": "InternalTemperature",
        "name": "InternalTemperature",
        "expression": " -- filter out extreme values -- ",
        "description": "The value of the clean temperature."

Thank you,

  1. When you say “bottom 5th percentile and top 95th percentile”, this refers to the entire evaluation period, right? For example, if the interval is DAY from 2018-01-01 to 2019-01-01, you want to remove the bottom 5% and top 5% of 365 data points?

  2. If this is the case, are you fine with the thresholds (5%, 95%) changing whenever the “start” and “end” change? A different start and end give different data points, and different data points give different thresholds.

  3. When you say “remove”, what do you mean? Replace them with 0, so that [-99, 1, 1, 1, 1, 1, 99] becomes [0, 1, 1, 1, 1, 1, 0]?


1 + 2. Yes and yes (this series is the target value in my dataset, so I will always take the largest period).
3. Ok, replacing them with 0 or any other constant value is fine.
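
To make the agreed behaviour concrete, here is a minimal sketch in plain Python, outside the expression engine. It assumes linear interpolation between data points when computing a percentile; the platform's own percentile function may use a different convention. The names `percentile` and `replace_percentile_outliers` are made up for this illustration and are not platform functions.

```python
def percentile(values, q):
    """Percentile with linear interpolation, q in [0, 100] (assumed convention)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def replace_percentile_outliers(values, low_q=5, high_q=95, fill=0):
    """Replace values outside [low_q, high_q] percentiles with a constant."""
    # Thresholds are computed over the whole evaluation period passed in,
    # so a different start/end yields different thresholds.
    low = percentile(values, low_q)
    high = percentile(values, high_q)
    return [v if low <= v <= high else fill for v in values]


series = [-99, 1, 1, 1, 1, 1, 99]
cleaned = replace_percentile_outliers(series)
print(cleaned)  # [0, 1, 1, 1, 1, 1, 0]
```

Because the series keeps its length and only the flagged points change, the result can still be treated as a timeseries over the same interval.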

To me this is less a typical timeseries manipulation and more an array manipulation. I think a combination of tsDecl simple metrics and compound metrics can achieve this task. If you are working on External Types, then use actionDecl.

What does your expression look like?
Have you looked at the “percentile” function in the expression engine function library?

We also have a function called removeOutliers in the expression engine function library that applies a moving median window and a moving median absolute deviation to remove outliers. You might want to look into it as well.
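
For anyone unfamiliar with the moving median / median absolute deviation idea, the following is a rough Python sketch of the general technique. It is not the implementation of removeOutliers, and the `window`, `threshold` and `fill` parameters are assumptions chosen for illustration only.

```python
from statistics import median


def rolling_mad_filter(values, window=5, threshold=3.0, fill=None):
    """Flag points far from the local median, measured in units of local MAD."""
    half = window // 2
    cleaned = []
    for i, v in enumerate(values):
        # Centered window, truncated at the edges of the series.
        neighbours = values[max(0, i - half): i + half + 1]
        med = median(neighbours)
        mad = median(abs(x - med) for x in neighbours)
        # Note: when the local MAD is 0, any deviation from the median is flagged.
        if abs(v - med) > threshold * mad:
            cleaned.append(med if fill is None else fill)
        else:
            cleaned.append(v)
    return cleaned


spiky = [1, 1, 1, 50, 1, 1, 1]
print(rolling_mad_filter(spiky))          # [1, 1, 1, 1, 1, 1, 1]
print(rolling_mad_filter(spiky, fill=0))  # [1, 1, 1, 0, 1, 1, 1]
```

Unlike the fixed 5%/95% cut, this approach uses local thresholds, so it does not depend on the start and end of the evaluation period in the same way, and it does not always discard a fixed share of the data.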