
Building a Real-Time Anomaly Detection System with Streaming ML

We built and tested a streaming anomaly detection pipeline processing 50,000 events per second. Here is the architecture and what we learned.


Jennifer Park

Principal Data Architect

December 22, 2025

12 min read

Anomaly Detection · Streaming · Real-Time ML · Kafka · Data Engineering

Batch-based anomaly detection introduces unacceptable delays for use cases like fraud prevention, infrastructure monitoring, and cybersecurity. We built a real-time streaming ML pipeline to detect anomalies within seconds of occurrence and tested it under production-scale loads.

Architecture Overview

The pipeline consists of four layers: ingestion (Apache Kafka), feature engineering (Apache Flink), model serving (a custom ensemble deployed on KServe), and alerting (PagerDuty integration). Each layer is horizontally scalable and independently deployable.

Data Ingestion

Kafka topics receive events from application logs, transaction systems, and infrastructure metrics. We configured 32 partitions per topic to parallelize processing. Schema Registry enforces event contracts and enables schema evolution without pipeline breaks.
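Key-based routing is what makes the 32-partition layout useful: all events for one entity stay in order on one partition. The sketch below illustrates the idea in plain Python; Kafka's actual default partitioner hashes keys with murmur2, and the md5 hash here is only a stable, dependency-free stand-in.

```python
import hashlib

NUM_PARTITIONS = 32  # matches the per-topic partition count above


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map an event key to a partition so that all events for one
    entity (e.g. an account ID) land on the same partition, in order.
    Kafka's default partitioner uses murmur2; md5 is used here only
    as an illustrative, dependency-free hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


# The same key always routes to the same partition:
assert partition_for("account-12345") == partition_for("account-12345")
```

In practice the client library does this for you whenever you produce with a message key; the point is to choose keys that group the events your features aggregate over.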

Streaming Feature Engineering

Flink jobs compute rolling statistical features over multiple time windows: 1-minute, 5-minute, and 1-hour aggregations. Features include transaction velocity, deviation from historical baselines, geographic anomaly scores, and cross-entity correlation metrics. The feature store serves reads at a p99 latency of 3 ms.
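To make the windowed aggregations concrete, here is a plain-Python sketch of two of the features above, transaction velocity and baseline deviation, over a single sliding window. This is an illustration of the computation only; the production pipeline expresses it as Flink windowed aggregations, and the window sizes and feature definitions here are simplified.

```python
from collections import deque


class RollingFeatures:
    """Sliding-window feature computation over (timestamp, amount) events.
    A simplified stand-in for the Flink windowed aggregations."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = deque()  # (timestamp, amount) pairs, oldest first

    def add(self, ts: float, amount: float) -> None:
        self.events.append((ts, amount))
        # Evict events that have fallen out of the window.
        while self.events and self.events[0][0] < ts - self.window:
            self.events.popleft()

    def velocity(self) -> int:
        """Transaction count inside the current window."""
        return len(self.events)

    def deviation(self, baseline_mean: float, baseline_std: float) -> float:
        """Distance of the window's mean amount from a historical
        baseline, in standard deviations."""
        if not self.events or baseline_std == 0:
            return 0.0
        mean = sum(a for _, a in self.events) / len(self.events)
        return (mean - baseline_mean) / baseline_std
```

Running the 1-minute, 5-minute, and 1-hour variants just means keeping three such windows per key, which is exactly where the memory and CPU cost of streaming feature engineering comes from.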

Model Architecture

We deployed an ensemble combining an Isolation Forest for statistical anomalies, a gradient-boosted model for pattern-based detection, and a small transformer model for sequential pattern analysis. The ensemble votes with configurable thresholds per anomaly type to balance precision and recall.
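The voting logic can be sketched in a few lines. The model names, thresholds, and vote count below are illustrative, not our production configuration; the point is that per-anomaly-type thresholds let you trade precision for recall without retraining anything.

```python
from dataclasses import dataclass, field


@dataclass
class EnsembleVoter:
    """Threshold-based vote combining across ensemble members."""

    thresholds: dict = field(default_factory=dict)  # anomaly type -> score cutoff
    min_votes: int = 2  # how many models must agree to raise an alert

    def is_anomaly(self, anomaly_type: str, scores: dict) -> bool:
        """`scores` maps model name -> anomaly score in [0, 1]."""
        cutoff = self.thresholds[anomaly_type]
        votes = sum(1 for s in scores.values() if s >= cutoff)
        return votes >= self.min_votes


voter = EnsembleVoter(thresholds={"fraud": 0.8})
# Two of three models clear the fraud threshold -> alert:
print(voter.is_anomaly("fraud", {"iforest": 0.9, "gbm": 0.85, "transformer": 0.4}))
```

Lowering a type's threshold or `min_votes` raises recall at the cost of precision, which is why these live in configuration rather than in the models.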

Load Testing Results

At 50,000 events per second, end-to-end latency from event ingestion to alert delivery was 1.2 seconds at p50 and 3.4 seconds at p99. The false positive rate held at 0.3% with a true positive rate of 94.7%. The system consumed 48 CPU cores and 96 GB of RAM across the Flink cluster.

Lessons Learned

Feature engineering is the bottleneck, not model inference; invest heavily in efficient streaming aggregations. Deploy models behind a feature flag so you can roll back without pipeline changes. Monitor feature drift in real time alongside model predictions. And always have a manual override mechanism for high-impact alerts.
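The feature-flag lesson amounts to a small amount of gating code. A minimal sketch, with an illustrative flag name and fail-open fallback (the source does not specify either):

```python
def score_event(event, ml_model, baseline, flags):
    """Gate the ML ensemble behind a feature flag so that rollback is
    a configuration change, not a pipeline redeploy. The flag name
    `use_ml_ensemble` is illustrative."""
    if flags.get("use_ml_ensemble", False):
        try:
            return ml_model(event)
        except Exception:
            # Fail open to the statistical baseline rather than
            # dropping events when the model path breaks.
            return baseline(event)
    return baseline(event)
```

The same shape also supports shadow-scoring a new model before flipping the flag, since both paths see every event.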

What We Would Do Differently

Start with simpler statistical methods (Z-score, IQR) as a baseline before introducing ML models. The added complexity of ML is only justified when statistical methods plateau. In our case, the ML ensemble improved detection by 12% over statistical baselines, which was worth the complexity for this use case.
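Those statistical baselines really are only a few lines each. A sketch using Python's standard library, with the conventional default cutoffs (3 standard deviations, Tukey's 1.5×IQR fences) rather than any production settings:

```python
import statistics


def zscore_anomalies(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]


def iqr_anomalies(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]
```

A known weakness worth seeing firsthand: a large outlier inflates the standard deviation and can mask itself from the Z-score test, while the IQR fences still catch it. That fragility is part of why the baselines plateau.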

About the Author


Jennifer Park

Principal Data Architect

Jennifer designs high-throughput data systems and streaming architectures for real-time analytics and ML applications.
