ML Order-Flow Predictor
Gradient boosting model trained on order flow imbalance, quote updates, and trade intensity to predict 5-minute forward returns in high-volume equities.
Strong in-sample performance, but the information ratio (IR) drops 38% out-of-sample. Approved for paper trading at $10,000 notional.
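For readers who want the arithmetic behind the out-of-sample figure, here is a minimal sketch of how an IR and its relative drop could be computed. It assumes the IR is the mean of per-period active returns divided by their sample standard deviation (no annualization), and that the 38% is a relative drop. The function names and the sample numbers are hypothetical and are not taken from the strategy's actual results.

```python
from statistics import mean, stdev
from typing import Sequence


def information_ratio(active_returns: Sequence[float]) -> float:
    """Per-period IR: mean active return divided by its sample std deviation.

    Assumption: `active_returns` are strategy returns minus a benchmark.
    """
    return mean(active_returns) / stdev(active_returns)


def relative_drop(in_sample: float, out_of_sample: float) -> float:
    """Fractional drop from the in-sample value to the out-of-sample value."""
    return 1.0 - out_of_sample / in_sample


# Hypothetical illustration: an in-sample IR of 2.0 falling to 1.24
# out-of-sample corresponds to a relative drop of about 38%.
drop = relative_drop(2.0, 1.24)
```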
How It Works
Supervised learning model ingests Level 2 order book data and predicts short-term price movements using ensemble methods.
Mechanics
1. Collect real-time Level 2 data (bid/ask depth, order sizes, cancellations)
2. Engineer features: order flow imbalance, quote velocity, aggressive buy/sell ratios
3. Train XGBoost model on rolling 3-month window (retrain weekly)
4. Generate predictions every 30 seconds for top 50 liquid equities
5. Execute trades when prediction confidence exceeds 75% threshold
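The feature and threshold steps above can be sketched as follows. This is an illustration under stated assumptions, not the strategy's actual code: it uses the common top-of-book definition of order flow imbalance, it treats the model output as a probability of an up move, and it reads "confidence exceeds 75%" as that probability (or its complement) exceeding 0.75. The names `Quote`, `ofi_increment`, `order_flow_imbalance`, and `trade_signal` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Quote:
    """Best bid/ask snapshot (top of book only, for simplicity)."""
    bid_px: float
    bid_sz: float
    ask_px: float
    ask_sz: float


def ofi_increment(prev: Quote, cur: Quote) -> float:
    """Order flow imbalance contribution of one quote update."""
    # Bid side: size added at an unchanged or higher bid counts as buy
    # pressure; size removed at an unchanged or lower bid counts against it.
    bid_term = (cur.bid_sz if cur.bid_px >= prev.bid_px else 0.0) - (
        prev.bid_sz if cur.bid_px <= prev.bid_px else 0.0
    )
    # Ask side: the mirror image, measuring sell pressure.
    ask_term = (cur.ask_sz if cur.ask_px <= prev.ask_px else 0.0) - (
        prev.ask_sz if cur.ask_px >= prev.ask_px else 0.0
    )
    return bid_term - ask_term


def order_flow_imbalance(quotes: Sequence[Quote]) -> float:
    """Sum of OFI increments over consecutive quote updates."""
    return sum(ofi_increment(p, c) for p, c in zip(quotes, quotes[1:]))


def trade_signal(prob_up: float, threshold: float = 0.75) -> int:
    """Return +1 (buy), -1 (sell), or 0 (no trade).

    Assumption: `prob_up` is the model's probability of a positive
    5-minute forward return.
    """
    if prob_up > threshold:
        return 1
    if 1.0 - prob_up > threshold:
        return -1
    return 0
```

For example, if the bid size grows from 500 to 700 and the ask size shrinks from 400 to 300 with both prices unchanged, the imbalance is +300, which indicates net buying pressure.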
Signals
Performance Results
Implementation
Universe: Top 50 S&P 500 constituents (AAPL, MSFT, TSLA, etc.)
Execution: Direct market access (DMA), aggressive IOC orders for speed
Frequency: Continuous (predictions every 30s, trades on threshold breach)
Risk limits: Max 1% per position, -10% daily stop loss, max 20 simultaneous positions
Technology: C++ low-latency engine, GPU model inference, FIX protocol for execution
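The position and stop-loss limits listed above could be enforced with a pre-trade check along these lines. This is a sketch only: it assumes "1% per position" means 1% of account equity and that the daily stop is measured as the day's P&L over start-of-day equity. `RiskLimits` and `can_open_position` are hypothetical names, and the production engine is described as C++ rather than Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskLimits:
    max_position_frac: float = 0.01   # max 1% of equity per position
    daily_stop_frac: float = -0.10    # halt new trades at -10% on the day
    max_positions: int = 20           # max simultaneous positions


def can_open_position(
    order_notional: float,
    equity: float,
    open_positions: int,
    daily_pnl: float,
    limits: RiskLimits = RiskLimits(),
) -> bool:
    """Return True only if a new position respects all three limits."""
    if equity <= 0:
        return False
    # Daily stop loss: block new trades once the day's loss reaches the limit.
    if daily_pnl / equity <= limits.daily_stop_frac:
        return False
    # Cap on the number of concurrent positions.
    if open_positions >= limits.max_positions:
        return False
    # Per-position size cap.
    return order_notional <= limits.max_position_frac * equity
```

With $10,000 of equity, for instance, this check allows at most $100 per position.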
Risk Analysis
Overfitting
High Impact. Mitigation: walk-forward testing, k-fold cross-validation, regularization
Regime Shift
High Impact. Mitigation: monthly model retraining, performance monitoring triggers
Latency
Medium Impact. Mitigation: co-located servers, optimized execution logic
Data Quality
Medium Impact. Mitigation: redundant data feeds, anomaly detection filters
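Walk-forward testing, listed above as an overfitting mitigation, can be sketched as a generator of rolling train/test windows. The sketch assumes a training window of about 90 calendar days (standing in for the 3-month window) and a 7-day test step (standing in for the weekly retrain). The function name and its defaults are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterator, Tuple

Window = Tuple[date, date, date, date]


def walk_forward_windows(
    start: date,
    end: date,
    train_days: int = 90,   # assumption: roughly 3 months
    test_days: int = 7,     # assumption: weekly retrain cadence
) -> Iterator[Window]:
    """Yield (train_start, train_end, test_start, test_end).

    End dates are exclusive. Each test window starts exactly where its
    training window ends, so no future data leaks into training.
    """
    train_start = start
    while True:
        train_end = train_start + timedelta(days=train_days)
        test_end = train_end + timedelta(days=test_days)
        if test_end > end:
            break
        yield train_start, train_end, train_end, test_end
        # Roll the whole window forward by one test period.
        train_start += timedelta(days=test_days)
```

Each window trains only on data that precedes its test period, which is the property that separates walk-forward evaluation from a shuffled k-fold split.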
Backtest Results
Stress Period Analysis
Model struggled with regime change
Reduced signal strength in calm markets
Performance improved post-retraining
Deployment Status
Status: Paper trading since Apr 15, 2026
Capital: $5,000 notional (paper only)
Account: Paper trading account (Alpaca)
Monitoring: Real-time slippage analysis, model drift detection, daily review
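The slippage and drift monitoring mentioned above might look roughly like this. It is a sketch under assumptions: slippage is measured in basis points against the price at decision time, and drift is flagged when the recent directional hit rate falls more than a tolerance below a baseline. The function names, the hit-rate criterion, and the 0.05 tolerance are hypothetical and are not taken from the deployed system.

```python
from typing import Sequence


def slippage_bps(side: str, decision_px: float, fill_px: float) -> float:
    """Signed slippage in basis points; positive means a worse fill."""
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    # Paying more on a buy, or receiving less on a sell, is adverse.
    sign = 1.0 if side == "buy" else -1.0
    return sign * (fill_px - decision_px) / decision_px * 1e4


def drift_alert(
    recent_hits: Sequence[bool],
    baseline_hit_rate: float,
    tolerance: float = 0.05,   # assumption: illustrative tolerance
) -> bool:
    """Flag drift when the recent hit rate falls below baseline - tolerance."""
    if not recent_hits:
        return False
    hit_rate = sum(recent_hits) / len(recent_hits)
    return hit_rate < baseline_hit_rate - tolerance
```

A buy decided at 100.00 and filled at 100.05 shows about 5 bps of adverse slippage under this definition.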