Twitter's R-Based Breakout Detection for Time Series (BreakoutDetection)

2022-11-30 13:52:01

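Twitter's open-source BreakoutDetection package detects breakouts in time series. It is built on the nonparametric E-Divisive with Medians (EDM) algorithm, which is more than 3.5 times faster than the traditional E-Divisive algorithm. EDM is also statistically robust: adding a few outliers or anomalous points does not degrade its detection results. The most important property, though, is that it is nonparametric, because parameter tuning can feel like crossing the river by feeling for the stones.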

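It distinguishes two kinds of breakout:

1. Mean Shift: a sudden jump, for example CPU utilization leaping from 40% to 60%, like the "sudden enlightenment" school (顿宗) in Buddhism.

2. Ramp Up: a slow transition from one steady state to another, for example CPU utilization climbing gradually from 40% to 60% and then leveling off, like the "gradual enlightenment" school (渐宗) in Buddhism.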
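To make the two shapes concrete, here is a minimal Python sketch that generates a synthetic mean shift and a synthetic ramp-up, then looks for a single change point. The detector is a simplified stand-in, not Twitter's EDM: it picks the split that minimizes the total absolute deviation of each side from its own median. All function names, the noise level, and the `min_size` parameter are hypothetical choices made for illustration.

```python
import random
from statistics import median


def mean_shift_series(n=100, low=40.0, high=60.0, noise=1.0, seed=0):
    """Flat at `low`, then an abrupt jump to `high` at the midpoint."""
    rng = random.Random(seed)
    half = n // 2
    return [(low if i < half else high) + rng.gauss(0, noise) for i in range(n)]


def ramp_up_series(n=100, low=40.0, high=60.0, ramp=20, noise=1.0, seed=0):
    """Flat at `low`, a linear climb over `ramp` points, then flat at `high`."""
    rng = random.Random(seed)
    start = (n - ramp) // 2
    series = []
    for i in range(n):
        frac = min(max((i - start) / ramp, 0.0), 1.0)  # 0 before the ramp, 1 after
        series.append(low + frac * (high - low) + rng.gauss(0, noise))
    return series


def l1_cost(segment):
    """Total absolute deviation of a segment from its own median."""
    m = median(segment)
    return sum(abs(x - m) for x in segment)


def best_split(series, min_size=10):
    """Index t that minimizes l1_cost(series[:t]) + l1_cost(series[t:]).

    Illustrative median-based search for one change point; NOT the EDM algorithm.
    """
    best_t, best_cost = None, float("inf")
    for t in range(min_size, len(series) - min_size + 1):
        cost = l1_cost(series[:t]) + l1_cost(series[t:])
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t


if __name__ == "__main__":
    print("mean shift split:", best_split(mean_shift_series()))
    print("ramp up split:", best_split(ramp_up_series()))
```

For the mean shift the split lands on the jump itself. For the ramp-up it lands somewhere inside the ramp, which shows why a gradual change is harder to pin to a single point.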

Project source code (it appears not to have been updated since it was committed): https://github.com/twitter/BreakoutDetection/

Other explanations of the algorithm (may require *):

https://blog.revolutionanalytics.com/2014/11/breakout-detection.html

https://blog.twitter.com/2014/breakout-detection-in-the-wild

https://anomaly.io/anomaly-detection-using-twitter-breakout/

[Note] https://anomaly.io/ is really quite good; its blog is worth following.

Video:

https://www.youtube.com/watch?v=fcsyL5TwIvE

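A screenshot of the slides from this video is shown below:

[Note] Twitter itself provides only an R implementation of the algorithm. The author of this video provides a Python implementation at: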

https://github.com/roland-hochmuth/BreakoutDetection

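"Robust statistics"

Robust statistics: a minor error [the anomaly] in the mathematical model should cause only a small error in the final conclusions.

The moving median is a robust statistic; the moving average is not, because a single extreme value can pull the mean far from the typical level while barely moving the median.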

https://anomaly.io/moving-median-robust-anomaly/
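The difference is easy to see on a tiny example. The sketch below applies a trailing 5-point window to made-up CPU readings containing one spike. The data and the `rolling` helper are hypothetical and exist only for illustration.

```python
from statistics import mean, median


def rolling(series, window, stat):
    """Apply `stat` to every full trailing window of length `window`."""
    return [
        stat(series[i - window + 1 : i + 1])
        for i in range(window - 1, len(series))
    ]


# Made-up CPU utilization (%) with a single anomalous spike at index 5.
cpu = [40, 41, 39, 40, 41, 95, 40, 39, 41, 40, 40]

moving_avg = rolling(cpu, 5, mean)
moving_med = rolling(cpu, 5, median)

if __name__ == "__main__":
    print("moving average:", moving_avg)
    print("moving median: ", moving_med)
```

Every window that contains the spike has an average above 50, while the moving median never leaves the 40 to 41 range.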
