US Futures (ES and NQ): Performance Benchmark 2023, 2024, and 2025
December 29, 2025
1. Executive Summary
Quantum Signals provides pre-trained intraday AI signals for US equity index futures (ES, NQ). Our models predict short-horizon mid-price trends (up / stable / down) using finance-native neural networks trained on Level II limit order book (LOB) and microstructure features.
Objective. Review the performance of pre-trained AI signals on ES and NQ futures and understand how the technology performs and generalizes.
Bigger picture. Pre-trained signals are a baseline that can be used for validation. We do not expect all our customers to trade using these signals. The goal is to help you design custom signals tailored to your symbols, horizons, entry cadence, and risk rules.
2. Review the Basics
2.1 Data
We use Level II LOB (10 levels, prices & volumes) from CME Globex with ~3 years’ worth of history for training, plus the real-time LOB feed for near-real-time predictions.
2.2 Predictions
For the two pre-trained signals reviewed in this document, we predict the following:
- Target variable: mid-price trend (Up / Down / Stable) between two averaging windows:
  - Start window: [now, now + 5 min] (average over the next 5 minutes)
  - End window: end of the day (average over 15:55–16:00 ET)
  - Stable: change within a ±2 bps band around 0
- Prediction cadence: every 1 minute on CME Trading Days for Equities, 09:30–16:00 ET
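To make the target definition concrete, here is a minimal sketch of how one label could be derived from the two window averages. It assumes the move is measured as a simple return between the start-window and end-window average mid-prices; the function name `trend_label` and its parameters are illustrative and are not taken from the production pipeline. The class encoding (0 Down, 1 Stable, 2 Up) follows the prediction files described in Section 4.

```python
DOWN, STABLE, UP = 0, 1, 2  # class encoding used in the prediction files


def trend_label(start_avg: float, end_avg: float, band_bps: float = 2.0) -> int:
    """Classify the move from the start-window average to the end-window average.

    start_avg: average mid-price over [now, now + 5 min]
    end_avg:   average mid-price over 15:55-16:00 ET
    band_bps:  half-width of the Stable band, in basis points
    """
    if start_avg <= 0:
        raise ValueError("start_avg must be positive")
    # Assumption: the move is a simple return between the two averages, in bps.
    move_bps = (end_avg / start_avg - 1.0) * 1e4
    if move_bps > band_bps:
        return UP
    if move_bps < -band_bps:
        return DOWN
    return STABLE
```

For example, a start average of 5000.0 and an end average of 5002.0 is a move of about 4 bps, which falls outside the band and is labeled Up.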
2.3 AI Model
Current model used: Pythia-v0.1.1-Sep25
We introduced our baseline model Pythia-v0.1.0 in a recent blog post. The “Sep25” suffix corresponds to the last month in the data set used to train, test, and validate the model.
We use a Temporal Fusion Transformer (TFT) architecture optimized for market microstructure. This is not an LLM; we use transformer components tailored for finance. Here are some of the key considerations the model takes into account:
- multi-scale time features (LOB events, seconds, minutes, days)
- probability calibration (post-processing of the output using class probabilities)
- cost-sensitive loss function aligned to PnL
Technical paper on TFT: “Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting”, B. Lim et al. (available at https://arxiv.org/abs/1912.09363).
3. Benchmarks
3.1 Training, Validation, and Test periods
We retrain our models every 3 months and evaluate performance out-of-sample.
- Training: Several months of historical data that we use to train the model. Every 3 months we append another 3 months of data to the end of this data set.
- Validation: A 3-month period (rolling forward every 3 months), not seen during training, that we use to pick the best performing model.
- Test: A 3-month period (rolling forward every 3 months), not seen during training or validation, that we use to report performance.
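The rolling scheme above can be sketched as a walk-forward generator over calendar quarters. This is an illustration only: it assumes the validation quarter immediately follows the training data and the test quarter immediately follows validation, which the text implies but does not state outright. The names `Split`, `next_quarter`, and `walk_forward`, and the starting quarter in the usage note, are hypothetical.

```python
from typing import Iterator, NamedTuple, Tuple

Quarter = Tuple[int, int]  # (year, quarter number 1-4)


class Split(NamedTuple):
    train_end: Quarter   # last quarter included in the (expanding) training set
    validation: Quarter  # used to pick the best-performing model
    test: Quarter        # used to report performance


def next_quarter(q: Quarter) -> Quarter:
    year, n = q
    return (year, n + 1) if n < 4 else (year + 1, 1)


def walk_forward(first_train_end: Quarter, n_splits: int) -> Iterator[Split]:
    """Yield successive splits, rolling everything forward by one quarter."""
    train_end = first_train_end
    for _ in range(n_splits):
        validation = next_quarter(train_end)
        test = next_quarter(validation)
        yield Split(train_end, validation, test)
        # Retraining step: append the next 3 months to the training set.
        train_end = next_quarter(train_end)
```

Calling `list(walk_forward((2022, 3), 2))` with an arbitrary starting quarter gives a first split that trains through Q3 2022, validates on Q4 2022, and tests on Q1 2023, then a second split shifted forward by one quarter.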
3.2 Trading Strategy Used for Benchmarking
Quick review of the trading strategy used:
- Every 5 minutes (between 09:45 and 15:30 ET) we decide whether to take a long, short, or no position, allocating 1/70 of the day's starting portfolio to each decision (there are 70 possible openings per day).
- Each long/short position is then split into 5 parts and executed on each minute for the next 5 minutes following the decision. There is no sizing adjustment.
- We exit all positions at the end of the day, again split over five minutes (15:55–16:00 ET).
- Costs: 1 bp round-turn assumption (ES: 1 bp ≈ 2 ticks = $25.00; NQ: 1 bp ≈ 8 ticks = $40.00).
- Extra exchange/clearing fees not included.
- Contract series & roll: Front-month continuous. We roll at the open 5 trading days before expiration (T–5): we stop trading the expiring contract and start trading the next one.
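The schedule and sizing rules above can be checked with a short sketch. It assumes the first execution slice happens one minute after the decision and that timestamps are naive Eastern Time values; both are assumptions, as are the function names. The tick values ($12.50 for ES, $5.00 for NQ) are the standard CME contract specifications, and the tick counts come from the cost bullet above.

```python
from datetime import datetime, timedelta

TICK_VALUE_USD = {"ES": 12.50, "NQ": 5.00}  # dollar value of one 0.25-point tick
COST_TICKS = {"ES": 2, "NQ": 8}             # round-turn cost assumption, in ticks


def decision_times(day: datetime, step_min: int = 5) -> list:
    """All decision timestamps from 09:45 to 15:30 ET inclusive."""
    t = day.replace(hour=9, minute=45, second=0, microsecond=0)
    end = day.replace(hour=15, minute=30, second=0, microsecond=0)
    times = []
    while t <= end:
        times.append(t)
        t += timedelta(minutes=step_min)
    return times


def slice_schedule(decision: datetime, side: int,
                   n_slices: int = 5, n_openings: int = 70) -> list:
    """(time, signed fraction of the day's starting portfolio) per one-minute slice.

    side: +1 for long, -1 for short, 0 for no position.
    """
    per_slice = side / (n_openings * n_slices)
    # Assumption: the first slice executes one minute after the decision.
    return [(decision + timedelta(minutes=k + 1), per_slice) for k in range(n_slices)]


def round_turn_cost_usd(symbol: str) -> float:
    """Assumed round-turn cost per contract, in dollars."""
    return COST_TICKS[symbol] * TICK_VALUE_USD[symbol]
```

The grid from 09:45 to 15:30 in 5-minute steps contains 70 timestamps, matching the 70 possible openings, and each one-minute slice is 1/350 of the day's starting portfolio.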
3.3 Performance Metrics
E-mini S&P 500 (ES):



E-mini Nasdaq-100 (NQ):



Notes:
- Annualization: daily statistics are annualized using a √252 factor (252 trading days per year).
- Sharpe uses daily returns.
- Ann. Vol: Annualized Volatility.
- MDD: Max Drawdown.
- Rebasing: curves rebased to 100 at period start.
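For readers who want to reproduce these metrics from a series of daily returns, here is a minimal sketch using only the standard library. It assumes a zero risk-free rate, the sample standard deviation, and day-to-day compounding for the rebased curve; the text above does not specify these choices, so treat them as assumptions.

```python
import math
import statistics

TRADING_DAYS = 252


def annualized_vol(daily_returns):
    """Sample standard deviation of daily returns, scaled by sqrt(252)."""
    return statistics.stdev(daily_returns) * math.sqrt(TRADING_DAYS)


def sharpe_ratio(daily_returns):
    """Annualized Sharpe from daily returns (assumes a zero risk-free rate)."""
    mean = statistics.mean(daily_returns)
    return mean / statistics.stdev(daily_returns) * math.sqrt(TRADING_DAYS)


def rebased_curve(daily_returns, base=100.0):
    """Equity curve starting at `base` (assumes daily compounding)."""
    curve = [base]
    for r in daily_returns:
        curve.append(curve[-1] * (1.0 + r))
    return curve


def max_drawdown(curve):
    """Largest peak-to-trough decline, as a negative fraction of the peak."""
    peak = curve[0]
    worst = 0.0
    for value in curve:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst
```

With daily returns of +1%, -2%, and +3%, the rebased curve dips from 101 to 98.98 before recovering, so the maximum drawdown is -2%.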
4. Historical Prediction Data
The model's predictions are available for download in the files below for anyone who wants to use them in their own test harness. There is one file per model.
- ES Backtest Predictions Q1 2023 - Q4 2025:
https://drive.google.com/file/d/1hE_2FKAmjARrwEmontQnInnxjTi_yjxY/view?usp=sharing
- NQ Backtest Predictions Q1 2023 - Q4 2025:
https://drive.google.com/file/d/1XEAi4eZsozz5XICRc-EqQTaSwqSF5yhK/view?usp=sharing
The files contain the following columns:
- date_time (datetime string): Timestamp (in Eastern Standard Time) of each observation (1-minute cadence)
- predictions (int): Model-predicted class (0 for Down, 1 for Stable, 2 for Up)
- actual_labels (int): Realized (actual) class label for the outcome (0/1/2)
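As a starting point for your own test harness, the sketch below computes accuracy and a confusion count from rows with these three columns. It assumes the downloads are CSV files with a header row, which is not stated above. The sample rows, including the timestamp format, are made up for illustration and are not taken from the actual files.

```python
import csv
import io
from collections import Counter

CLASS_NAMES = {0: "Down", 1: "Stable", 2: "Up"}


def evaluate(rows):
    """Return (accuracy, confusion) where confusion[(actual, predicted)] is a count."""
    confusion = Counter()
    for row in rows:
        confusion[(int(row["actual_labels"]), int(row["predictions"]))] += 1
    total = sum(confusion.values())
    correct = sum(n for (actual, pred), n in confusion.items() if actual == pred)
    accuracy = correct / total if total else float("nan")
    return accuracy, confusion


# Made-up rows for illustration only (not real model output).
SAMPLE = """date_time,predictions,actual_labels
2025-01-02 09:30:00,2,2
2025-01-02 09:31:00,1,2
2025-01-02 09:32:00,0,0
2025-01-02 09:33:00,2,1
"""

accuracy, confusion = evaluate(csv.DictReader(io.StringIO(SAMPLE)))
for (actual, pred), n in sorted(confusion.items()):
    print(f"actual={CLASS_NAMES[actual]:<6} predicted={CLASS_NAMES[pred]:<6} count={n}")
print(f"accuracy={accuracy:.2f}")
```

To run it against a downloaded file, replace the `io.StringIO(SAMPLE)` argument with an open file handle, adjusting the parsing if the actual format differs.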
5. Next steps
- Download past predictions from the links in Section 4 and test with your own test harness and strategies.
- Review out-of-sample performance by consuming the predictions in real time using our API. This requires signing up for our “Professional” tier, which includes a 1-month free trial.
- Go beyond pre-trained signals by tailoring a signal to your specific needs. Customize by symbol, target variable, time horizon, neutral band, entry cadence and risk rules.
6. Contact
Yianni Gamvros
Co-founder & CEO
yianni [at] quantumsignals.ai
+1 202 390 4935
Disclaimers:
- Futures trading involves substantial risk of loss and is not suitable for all investors.
- Hypothetical/simulated results do not represent actual trading, may under- or over-state market impacts (e.g., liquidity), and reflect hindsight.
- No representation that any account will achieve similar results.
- Past performance is not necessarily indicative of future results.
- Quantum Signals is not an NFA member.


