QuantileFlow: A Unified and Accelerated Quantile Sketching Framework for Anomaly Detection in Streaming Log Data

Authors

  • Dhyey Mavani, Department of Computer Science, Amherst College, Amherst, MA, USA
  • Tairan (Ryan) Ji, Department of Computer Science, Amherst College, Amherst, MA, USA
  • Marius Cotorobai, Department of Computer Science, Amherst College, Amherst, MA, USA

DOI:

https://doi.org/10.32628/CSEIT261212

Keywords:

Anomaly Detection, DDSketch, HDR Histogram, Log Analytics, MomentSketch, Quantile Sketch

Abstract

Quantile sketching enables scalable estimation of tail latencies, such as the 95th and 99th percentiles, without storing full streams, making it a practical foundation for anomaly detection in observability pipelines. We introduce QuantileFlow, a unified framework that standardizes ingestion, query, merge, and serialization across multiple quantile sketch families. Using the LogHub HDFS v1 dataset in a production-style streaming pipeline, we process 575,059 latency events end to end and benchmark accuracy, memory footprint, throughput, and runtime under identical workloads. We also microbenchmark insertion by adding 1,000,000 log-normal samples and attribute execution time to key internal routines. Across experiments, DDSketch provides the highest throughput while preserving tail fidelity through relative-error guarantees. HDR Histogram maintains stable precision across wide dynamic ranges but is more sensitive to configuration and incurs higher overhead. MomentSketch has the smallest memory footprint and is efficient for smooth unimodal data, but its quantile accuracy degrades on heavy-tailed or multimodal streams. Finally, we show an optimized QuantileFlow DDSketch implementation that improves throughput by 44% over DataDog’s official implementation and reaches 2.67 times the throughput of a prior internal version.
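
As an illustration of the relative-error guarantee that the abstract attributes to DDSketch, the following is a minimal Python sketch of logarithmic bucketing with mergeable counts. It is not the QuantileFlow code or DataDog's implementation: the class name LogBucketSketch, its method names, and the default accuracy of 1% are hypothetical choices made for this example, and it handles positive values only.

```python
import math
import random
from collections import Counter


class LogBucketSketch:
    """Toy DDSketch-style quantile sketch (hypothetical, positive values only)."""

    def __init__(self, relative_accuracy=0.01):
        self.alpha = relative_accuracy
        # gamma is chosen so that one representative value per bucket is
        # within a factor (1 +/- alpha) of every value in that bucket.
        self.gamma = (1 + self.alpha) / (1 - self.alpha)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()  # bucket index -> number of values
        self.count = 0

    def add(self, value):
        if value <= 0:
            raise ValueError("this toy sketch only handles positive values")
        # Bucket i covers the interval (gamma**(i - 1), gamma**i].
        index = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[index] += 1
        self.count += 1

    def merge(self, other):
        # Merging is per-bucket addition, so it needs identical bucket bounds.
        if other.alpha != self.alpha:
            raise ValueError("sketches must share the same relative accuracy")
        self.buckets.update(other.buckets)  # Counter.update adds counts
        self.count += other.count

    def quantile(self, q):
        if self.count == 0:
            raise ValueError("empty sketch")
        rank = q * (self.count - 1)
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # Representative value of bucket i: 2 * gamma**i / (gamma + 1).
                return 2 * self.gamma ** index / (self.gamma + 1)
        # Not reached for 0 <= q <= 1; kept as a defensive fallback.
        return 2 * self.gamma ** max(self.buckets) / (self.gamma + 1)


if __name__ == "__main__":
    random.seed(42)
    # Synthetic log-normal "latencies"; the parameters are arbitrary.
    samples = [random.lognormvariate(0.0, 1.0) for _ in range(100_000)]
    sketch = LogBucketSketch(relative_accuracy=0.01)
    for s in samples:
        sketch.add(s)
    ordered = sorted(samples)
    for q in (0.50, 0.95, 0.99):
        exact = ordered[int(q * (len(ordered) - 1))]
        print(f"p{int(q * 100)}: sketch={sketch.quantile(q):.4f} exact={exact:.4f}")
```

Because the bucket width grows geometrically, the number of buckets depends on the ratio between the largest and smallest values rather than on the stream length, which is why a sketch of this kind can track the 95th and 99th percentiles without storing the full stream. Production implementations add bucket-count limits, support for zero and negative values, and faster index mappings, none of which are shown here.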

Published

28-01-2026

Section

Research Articles

How to Cite

[1]
Dhyey Mavani, Tairan (Ryan) Ji, and Marius Cotorobai, “QuantileFlow: A Unified and Accelerated Quantile Sketching Framework for Anomaly Detection in Streaming Log Data,” Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol., vol. 12, no. 1, pp. 250–259, Jan. 2026, doi: 10.32628/CSEIT261212.