QuantileFlow: A Unified and Accelerated Quantile Sketching Framework for Anomaly Detection in Streaming Log Data
DOI: https://doi.org/10.32628/CSEIT261212
Keywords: Anomaly Detection, DDSketch, HDR Histogram, Log Analytics, MomentSketch, Quantile Sketch
Abstract
Quantile sketching enables scalable estimation of tail latencies, such as the 95th and 99th percentiles, without storing full streams, making it a practical foundation for anomaly detection in observability pipelines. We introduce QuantileFlow, a unified framework that standardizes ingestion, query, merge, and serialization across multiple quantile sketch families. Using the LogHub HDFS v1 dataset in a production-style streaming pipeline, we process 575,059 latency events end to end and benchmark accuracy, memory footprint, throughput, and runtime under identical workloads. We also microbenchmark insertion by adding 1,000,000 log-normal samples and attribute execution time to key internal routines. Across experiments, DDSketch provides the highest throughput while preserving tail fidelity through relative-error guarantees. HDR Histogram maintains stable precision across wide dynamic ranges but is more sensitive to configuration and incurs higher overhead. MomentSketch is the most compact in memory and efficient for smooth unimodal data, but its quantile accuracy degrades on heavy-tailed or multimodal streams. Finally, we show an optimized QuantileFlow DDSketch implementation that improves throughput by 44% over DataDog’s official implementation and reaches 2.67 times the throughput of a prior internal version.
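To make the relative-error guarantee mentioned above concrete, the following is a minimal sketch of the logarithmic bucketing idea behind DDSketch, exposing the add, quantile, and merge operations the abstract lists. The class name `MiniDDSketch`, its method names, and the 10,000-sample size are assumptions chosen for illustration; this is not the QuantileFlow API or DataDog's implementation, and it handles positive values only, with no bucket collapsing or serialization.

```python
import math
import random
from collections import Counter


class MiniDDSketch:
    """Illustrative log-bucketed quantile sketch (hypothetical, positive values only)."""

    def __init__(self, relative_accuracy=0.01):
        if not 0.0 < relative_accuracy < 1.0:
            raise ValueError("relative_accuracy must be in (0, 1)")
        self.alpha = relative_accuracy
        # gamma is the ratio between consecutive bucket boundaries.
        self.gamma = (1.0 + relative_accuracy) / (1.0 - relative_accuracy)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()  # bucket index -> number of values
        self.count = 0

    def add(self, value):
        if value <= 0:
            raise ValueError("this illustration handles positive values only")
        # Bucket i covers the interval (gamma**(i-1), gamma**i].
        index = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[index] += 1
        self.count += 1

    def quantile(self, q):
        if self.count == 0:
            raise ValueError("cannot query an empty sketch")
        if not 0.0 <= q <= 1.0:
            raise ValueError("q must be in [0, 1]")
        rank = q * (self.count - 1)
        seen = 0
        index = None
        # Walk buckets in increasing order until the target rank is covered.
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                break
        # This representative is within a factor (1 +/- alpha) of every
        # value that falls in bucket `index`.
        return 2.0 * self.gamma ** index / (self.gamma + 1.0)

    def merge(self, other):
        if other.gamma != self.gamma:
            raise ValueError("sketches must share the same relative accuracy")
        # Counter.update adds counts bucket by bucket, so merging is exact.
        self.buckets.update(other.buckets)
        self.count += other.count


# Small demo on synthetic log-normal "latencies" (assumed parameters).
random.seed(0)
samples = [random.lognormvariate(0.0, 1.0) for _ in range(10_000)]
sketch = MiniDDSketch(relative_accuracy=0.01)
for s in samples:
    sketch.add(s)
p99 = sketch.quantile(0.99)
```

Because bucket widths grow geometrically, the error bound is proportional to the value being estimated, which is why tail quantiles keep the same relative fidelity as the median. Merging only sums per-bucket counts, so two sketches built on separate partitions combine into the same state as one sketch built on all the data.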
References
X. Ma et al., “Practitioners’ Expectations on Log Anomaly Detection,” Dec. 02, 2024, arXiv: arXiv:2412.01066. doi: 10.48550/arXiv.2412.01066.
C. Masson, J. E. Rim, and H. K. Lee, “DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees,” Proc. VLDB Endow., vol. 12, no. 12, pp. 2195–2205, Aug. 2019, doi: 10.14778/3352063.3352135.
G. Tene, Hdrhistogram: A high dynamic range (hdr) histogram. [Online]. Available: https://hdrhistogram.github.io/HdrHistogram/
E. Gan, J. Ding, K. S. Tai, V. Sharan, and P. Bailis, “Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries,” Jul. 13, 2018, arXiv: arXiv:1803.01969. doi: 10.48550/arXiv.1803.01969. DOI: https://doi.org/10.14778/3236187.3236212
W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, “Detecting large-scale system problems by mining console logs,” in Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, in SOSP ’09. New York, NY, USA: Association for Computing Machinery, Oct. 2009, pp. 117–132. doi: 10.1145/1629575.1629587.
J. Zhu, S. He, P. He, J. Liu, and M. R. Lyu, “Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics,” Sep. 13, 2023, arXiv: arXiv:2008.06448. doi: 10.48550/arXiv.2008.06448. DOI: https://doi.org/10.1109/ISSRE59848.2023.00071
J. Kreps, “Kafka : a Distributed Messaging System for Log Processing,” 2011. Accessed: Jan. 12, 2026. [Online]. Available: https://www.semanticscholar.org/paper/Kafka-%3A-a-Distributed-Messaging-System-for-Log-Kreps/ea97f112c165e4da1062c30812a41afca4dab628
D. Arthur, dpkp/kafka-python. (Jan. 13, 2026). Python. [Online]. Available: https://github.com/dpkp/kafka-python
P. Hunt, M. Konar, F. Junqueira, and B. Reed, “ZooKeeper: Wait-free Coordination for Internet-scale Systems,” USENIX ATC, vol. 8, Jun. 2010.
DataDog/sketches-py. (Oct. 13, 2025). Python. Datadog, Inc. Accessed: Jan. 12, 2026. [Online]. Available: https://github.com/DataDog/sketches-py
License
Copyright (c) 2026 International Journal of Scientific Research in Computer Science, Engineering and Information Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.