The Problem
Quantitative trading is one of those domains where the gap between "I read about it" and "I built it" is enormous. There is no shortage of blog posts explaining moving average crossovers. There is a severe shortage of actual systems that ingest live market data, engineer hundreds of features, train models, validate out-of-sample, size positions with risk management, and execute trades against a real broker — even if it is paper trading.
We wanted to build the full stack: data ingestion, feature engineering, model training, backtesting, walk-forward validation, portfolio management, risk controls, and live execution. Not to get rich (this is paper trading, and we are honest about that), but because the engineering challenge is genuinely hard and the domain demands rigorous software practices.
The Approach
The system runs three strategies — RSI(2), ConsecutiveDown, and MomentumBreakout — across 8 ETFs: SPY, QQQ, IWM, DIA, XLF, XLK, XLE, XLV. Every trading day at 5 PM ET, a GitHub Actions cron job triggers the signal scanner. On Mondays, it runs a weekly calibration pass to update model parameters.
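The daily flow can be sketched in a few lines. This is an illustrative entry point, not the project's actual module API: the function names `daily_job`, `run_signal_scan`, and the task strings are assumptions; only the ticker list, the daily cadence, and the Monday calibration come from the system.

```python
# Hypothetical sketch of the cron-triggered entry point. Task names and the
# daily_job() function are illustrative assumptions, not the real API.
from datetime import date

ETFS = ["SPY", "QQQ", "IWM", "DIA", "XLF", "XLK", "XLE", "XLV"]

def daily_job(today: date) -> list[str]:
    """Return the tasks the scheduled trigger would run on a given date."""
    tasks = []
    if today.weekday() == 0:          # Monday: refresh model parameters first
        tasks.append("weekly_calibration")
    tasks.append("signal_scan")       # every trading day at 5 PM ET
    return tasks
```

Keeping the schedule logic in code (rather than in two separate cron entries) means the Monday calibration and the daily scan cannot silently drift apart.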
Feature engineering is where the real work lives. The system extracts 878 features from OHLCV (Open, High, Low, Close, Volume) data: technical indicators, statistical moments, regime detection signals, volatility measures, and cross-asset correlations. All features go through validate_ohlcv() and clean_ohlcv() before touching a model. Garbage in, garbage out is not just a cliché in financial ML — it is the primary failure mode.
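To make that concrete, here is a minimal sketch of the kind of invariants such a validation pass checks. Only the names validate_ohlcv and clean_ohlcv come from the system; the function bodies below are assumptions about what sensible OHLCV hygiene looks like, not the pipeline's actual code.

```python
# Hedged sketch: plausible OHLCV sanity checks, not the project's real logic.
def validate_ohlcv(bars: list[dict]) -> list[str]:
    """Return a list of problems found in a sequence of OHLCV bars."""
    issues = []
    for i, b in enumerate(bars):
        # the high must bound open/close from above, the low from below
        if b["high"] < max(b["open"], b["close"]) or b["low"] > min(b["open"], b["close"]):
            issues.append(f"bar {i}: high/low inconsistent with open/close")
        if b["volume"] < 0:
            issues.append(f"bar {i}: negative volume")
        if any(b[k] <= 0 for k in ("open", "high", "low", "close")):
            issues.append(f"bar {i}: non-positive price")
    return issues

def clean_ohlcv(bars: list[dict]) -> list[dict]:
    """One simple cleaning policy: drop any bar that fails validation."""
    bad = {int(msg.split()[1].rstrip(":")) for msg in validate_ohlcv(bars)}
    return [b for i, b in enumerate(bars) if i not in bad]
```

A bad bar that slips through (a high below the open, say) silently corrupts every indicator computed downstream, which is exactly the failure mode the paragraph above warns about.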
The RSI(2) strategy needs 200 bars of SMA data before it can generate a signal. That means your test fixtures need at least 200 bars of realistic data, and your backtesting engine needs to handle the warm-up period gracefully. Sounds obvious, but we wasted a full day debugging why our backtest results looked wrong before realizing we were testing with 50-bar samples.
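The warm-up issue is easy to see in code. The sketch below uses a common formulation of the RSI(2) strategy (long when RSI(2) is oversold and price is above its 200-bar SMA); the function names, the oversold threshold, and the exact rule are illustrative assumptions, not the project's implementation.

```python
# Illustrative sketch of why RSI(2) needs a 200-bar warm-up; not the real API.
def sma(closes, window):
    if len(closes) < window:
        return None                       # still inside the warm-up period
    return sum(closes[-window:]) / window

def rsi(closes, period=2):
    if len(closes) < period + 1:
        return None
    gains = losses = 0.0
    for prev, cur in zip(closes[-period - 1:-1], closes[-period:]):
        delta = cur - prev
        gains += max(delta, 0.0)
        losses += max(-delta, 0.0)
    if losses == 0:
        return 100.0
    return 100.0 - 100.0 / (1.0 + gains / losses)

def rsi2_signal(closes, sma_window=200, oversold=10.0):
    trend, strength = sma(closes, sma_window), rsi(closes, 2)
    if trend is None or strength is None:
        return None                       # warm-up: "no signal", not "no trade"
    return "long" if closes[-1] > trend and strength < oversold else None
```

With a 50-bar fixture, `sma()` returns None on every bar, so the strategy never fires; a naive backtest reports zero trades and it looks like the strategy is broken.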
Technical Decisions
Ensemble models over a single algorithm. No single model is reliably best across all market conditions. The system trains multiple models and combines their signals, weighted by recent out-of-sample performance. Walk-forward validation (using WalkForwardEngine) ensures that every metric we report comes from data the model has never seen during training. This is critical — in-sample backtesting results are essentially fiction.
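The split logic behind walk-forward validation is worth showing, since it is what makes the "never seen during training" guarantee mechanical. This is a generic illustration of the idea, not the actual WalkForwardEngine implementation: train on a trailing window, test on the next block, then roll forward.

```python
# Generic walk-forward split sketch (an assumption, not WalkForwardEngine's code).
def walk_forward_splits(n_bars, train_size, test_size):
    """Yield (train_range, test_range) index pairs that never overlap."""
    splits = []
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        splits.append((train, test))
        start += test_size                # roll forward by one test block
    return splits
```

Because every test block sits strictly after its training window, aggregating metrics across the test blocks gives a purely out-of-sample performance estimate.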
Position sizing with Kelly Criterion and ATR. The system supports three sizing modes: fixed percentage, ATR-based (scaling position size inversely with volatility), and fixed fractional (Kelly). Portfolio heat limits prevent overexposure — if total risk across open positions exceeds the threshold, new signals are suppressed. Correlation scaling reduces position sizes when multiple correlated ETFs are signaling simultaneously. These are not theoretical risk controls; they fire regularly.
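The ATR-based mode and the heat gate can be sketched briefly. The formulas, thresholds (1% risk per trade, 2x ATR stop, 6% heat cap), and function names below are illustrative assumptions, not the system's actual parameters.

```python
# Hedged sketch of ATR-based sizing plus a portfolio-heat gate; all numbers
# here are illustrative defaults, not the system's configured values.
def atr_position_size(equity, price, atr, risk_pct=0.01, atr_mult=2.0):
    """Risk a fixed fraction of equity with the stop atr_mult ATRs away:
    higher volatility (larger ATR) -> smaller position."""
    risk_dollars = equity * risk_pct
    stop_distance = atr_mult * atr
    return int(risk_dollars / stop_distance)   # whole shares

def heat_allows_new_position(open_risks, new_risk, equity, max_heat=0.06):
    """Suppress a new signal once total open risk would exceed the heat cap."""
    return (sum(open_risks) + new_risk) / equity <= max_heat
```

The inverse-volatility behavior falls out directly: doubling the ATR halves the share count, so a position in a calm ETF and one in a volatile ETF carry roughly the same dollar risk.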
Alpaca for paper trading, not a custom simulation. We could have built a paper trading simulator, but that is just simulating a simulator. Alpaca's paper trading uses real market data with simulated execution, which surfaces issues that pure backtesting cannot — order fills at different prices than expected, API latency, connection drops. The $25K paper account runs live daily.
Streamlit dashboard for monitoring. The app has 10 pages covering benchmark comparison, Monte Carlo simulation, correlation analysis, strategy optimization, and portfolio overview. Streamlit is not production-grade for customer-facing dashboards, but for an internal monitoring tool, it is ideal — fast to build, easy to iterate, and every chart is a few lines of Python.
Grid search optimization. The optimizer runs exhaustive parameter sweeps across strategy configurations, evaluating Sharpe ratio, max drawdown, and win rate for each combination. It is slow but thorough — and correctness matters more than speed for a system managing (simulated) money.
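The shape of an exhaustive sweep is simple enough to sketch. This is a toy version under assumptions: the real optimizer scores each combination on Sharpe ratio, max drawdown, and win rate, while the sketch takes a single caller-supplied scoring hook.

```python
# Toy grid search sketch; the scoring hook and grid are illustrative.
from itertools import product

def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)           # e.g. backtest -> Sharpe ratio
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Exhaustive enumeration is O(product of grid sizes), which is why the sweep is slow, but it cannot miss the best configuration inside the grid the way a sampled search can.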
What We Learned
Two things stand out. First, regime detection is the hardest unsolved problem in this space. Markets behave fundamentally differently in trending, mean-reverting, and volatile conditions. Our regime detection module helps, but it is closer to "educated guessing" than "reliable classification." Being honest about this limitation is important — anyone claiming their algorithm "works in all market conditions" is selling something.
Second, the test suite (637 tests) is not optional for financial software. A bug in position sizing or signal generation could produce wildly incorrect P&L numbers, and you would never know unless something obviously breaks. The conftest fixtures generate realistic market data with specific properties (trending, mean-reverting, volatile), and every strategy, every sizing mode, and every risk control has dedicated test coverage. When we change the feature engineering pipeline, we know immediately if something downstream breaks.
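A fixture generator with controllable regime properties might look like the following. This is a generic illustration (a seeded random walk with an optional drift for "trending" and a pull toward a level for "mean-reverting"), not the project's actual conftest code.

```python
# Sketch of conftest-style synthetic price data; coefficients are assumptions.
import random

def make_prices(n, mode="trending", start=100.0, seed=42):
    rng = random.Random(seed)             # seeded so tests stay deterministic
    prices = [start]
    for _ in range(n - 1):
        shock = rng.gauss(0.0, 1.0)
        if mode == "trending":
            step = 0.3 + 0.5 * shock      # persistent upward drift
        elif mode == "mean_reverting":
            step = 0.1 * (start - prices[-1]) + 0.5 * shock  # pull back to start
        else:                             # "volatile": large shocks, no drift
            step = 2.0 * shock
        prices.append(max(prices[-1] + step, 0.01))
    return prices
```

Seeding matters more than realism here: a fixture that changes between runs turns every strategy test into a flaky test.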
This is paper trading. We have not made money with it. But the engineering — data pipelines, ML training infrastructure, real-time execution, risk management — is the same whether the account is paper or live. The code does not know the difference.
Need ML-powered financial tools? Let's talk about your data.