US Futures (ES and NQ): Performance Benchmark 2023, 2024, and 2025

December 29, 2025

1. Executive Summary

Quantum Signals provides pre-trained intraday AI signals for US equity index futures (ES, NQ). Our models predict short-horizon mid-price trends (up / stable / down) using finance-native neural networks trained on Level II limit order book (LOB) and microstructure features. 

Objective. Review the performance of pre-trained AI signals on ES and NQ futures and understand how the technology performs and generalizes.

Bigger picture. Pre-trained signals are a baseline that can be used for validation. We do not expect all our customers to trade using these signals. The goal is to help you design custom signals tailored to your symbols, horizons, entry cadence, and risk rules. 

2. Review the Basics 

2.1 Data

We use Level II LOB (10 levels, prices & volumes) from CME Globex with ~3 years’ worth of history for training, plus the real-time LOB feed for near-real-time predictions.

2.2 Predictions 

For the two pre-trained signals reviewed in this document, we predict the following:

  • Target variable: mid-price trend (Up / Down / Stable) between two averaging windows:
    • Start window: [now, now + 5 min] (average over the next 5 minutes)
    • End window: end of the day (average over 15:55–16:00 ET) 
  • Stable: ±2 bps band around 0 
  • Prediction cadence: every 1 minute on CME Trading Days for Equities, 09:30–16:00 ET
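
The target definition above can be illustrated with a minimal sketch. It assumes the trend is measured as the relative change from the start-window average to the end-window average, expressed in basis points, and that a move exactly on the band edge counts as Stable. Neither detail is stated in this document, and the function name `trend_label` is hypothetical.

```python
from statistics import mean

STABLE_BAND_BPS = 2.0  # Section 2.2: +/-2 bps band around 0


def trend_label(start_window_mids, end_window_mids, band_bps=STABLE_BAND_BPS):
    """Classify the move between two averaged mid-price windows.

    Assumption (not confirmed by the source): the move is the relative
    change of the end-window average over the start-window average.
    """
    start_avg = mean(start_window_mids)  # e.g. mids over [now, now + 5 min]
    end_avg = mean(end_window_mids)      # e.g. mids over 15:55-16:00 ET
    change_bps = (end_avg / start_avg - 1.0) * 1e4
    if change_bps > band_bps:
        return "Up"
    if change_bps < -band_bps:
        return "Down"
    return "Stable"  # inside the band, including the edges (assumption)
```

For example, a start-window average of 5000.0 and an end-window average of 5002.0 is a move of about 4 bps, which falls outside the band and is labeled Up.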

2.3 AI Model

Current model used: Pythia-v0.1.1-Sep25

We introduced our baseline model Pythia-v0.1.0 in a recent blog post. The “Sep25” suffix corresponds to the last month in the data set used to train, test, and validate the model.

We use a Temporal Fusion Transformer (TFT) architecture optimized for market microstructure. This is not an LLM; we use transformer components tailored for finance. Here are some of the key considerations the model takes into account:

  • multi-scale time features (LOB events, seconds, minutes, days)
  • probability calibration (post-processing of the output using class probabilities)
  • cost-sensitive loss function aligned to PnL 

Technical paper on TFT: “Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting”, B. Lim et al. (available at https://arxiv.org/abs/1912.09363).

3. Benchmarks

3.1 Training, Validation, and Test periods

We retrain our models every 3 months and evaluate performance out-of-sample.

  • Training: Several months of historical data that we use to train the model. Every 3 months we append another 3 months of data to the end of this data set. 
  • Validation: A 3-month period (rolling forward every 3 months), not seen during training, that we use to pick the best performing model.
  • Test: A 3-month period (rolling forward every 3 months), not seen during training or validation, that we use to report performance. 
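
The rolling scheme above can be sketched as a walk-forward split generator. This is an illustration only: it assumes the validation quarter immediately follows the training data and the test quarter immediately follows validation, which the text does not state explicitly. The function names and the quarters used in the example are hypothetical.

```python
def next_quarter(year, quarter):
    """Return the (year, quarter) that follows the given one."""
    return (year + 1, 1) if quarter == 4 else (year, quarter + 1)


def walk_forward_splits(first_train, last_train, n_steps):
    """Yield (train_quarters, validation_quarter, test_quarter) per retraining.

    Quarters are (year, quarter) tuples. The training set expands by one
    quarter per step; validation and test each roll forward by one quarter.
    """
    if last_train < first_train:
        raise ValueError("last_train must not precede first_train")
    train = [first_train]
    while train[-1] != last_train:
        train.append(next_quarter(*train[-1]))
    for _ in range(n_steps):
        validation = next_quarter(*train[-1])  # assumed: right after training
        test = next_quarter(*validation)       # assumed: right after validation
        yield list(train), validation, test
        train.append(validation)               # append another 3 months


# Hypothetical dates, for illustration only.
for train, validation, test in walk_forward_splits((2022, 1), (2022, 2), 2):
    print(train, validation, test)
```

Keeping the test quarter strictly after both training and validation is what makes the reported numbers out-of-sample.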

3.2 Trading Strategy Used for Benchmarking

Quick review of the trading strategy used:

  • Every 5 minutes (between 09:45 and 15:30 ET) we decide to take a long, short or no position using 1/70 of our starting portfolio for the day (there are 70 possible openings per day). 
  • Each long/short position is then split into 5 equal parts, executed one per minute over the 5 minutes following the decision. There is no sizing adjustment.
  • We exit all positions at the end of the day, again split over five minutes (15:55–16:00 ET).
  • Costs: 1 bp round-turn assumption (ES: 1bp ~ 2 ticks = $25.0, NQ: 1bp ~ 8 ticks = $40.0).
  • Extra exchange/clearing fees not included.
  • Contract series & roll: Front-month continuous. Switch at the open T–5 trading days before expiration: stop trading the expiring contract and start trading the next.
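
The schedule, sizing, and cost arithmetic above can be checked with a short sketch. The helper names are hypothetical. The index levels used in the cost check (5,000 for ES, 20,000 for NQ) are illustrative assumptions, chosen because, together with the contract multipliers of $50 (ES) and $20 (NQ) per index point, they reproduce the $25 and $40 figures quoted above.

```python
from datetime import datetime, timedelta

SLICES_PER_POSITION = 5    # each position is executed over 5 minutes
COST_BPS_ROUND_TURN = 1.0  # 1 bp round-turn cost assumption


def decision_times(start="09:45", end="15:30", step_min=5):
    """List the decision times (HH:MM, ET), both endpoints included."""
    fmt = "%H:%M"
    t = datetime.strptime(start, fmt)
    stop = datetime.strptime(end, fmt)
    times = []
    while t <= stop:
        times.append(t.strftime(fmt))
        t += timedelta(minutes=step_min)
    return times


def slice_notional(start_portfolio, n_openings, slices=SLICES_PER_POSITION):
    """Notional traded per one-minute slice: 1/n of the portfolio, in 5 parts."""
    return start_portfolio / n_openings / slices


def round_turn_cost(notional, cost_bps=COST_BPS_ROUND_TURN):
    """Dollar cost of a round turn on the given notional."""
    return notional * cost_bps / 1e4


times = decision_times()
print(len(times), times[0], times[-1])
print(round_turn_cost(5000 * 50))   # one ES contract at an assumed 5,000
print(round_turn_cost(20000 * 20))  # one NQ contract at an assumed 20,000
```

Counting both endpoints, 09:45 through 15:30 in 5-minute steps gives the 70 possible openings per day mentioned above.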

3.3 Performance Metrics

E-mini S&P 500 (ES):

Period  Sharpe  Calmar   Win %  Ann. Return %  Ann. Vol %   MDD %
2025     1.382   2.907  54.948         12.714       9.197  -4.374
25Q4     2.650   7.366  53.896         10.401       3.926  -1.412
25Q3     4.818  13.378  60.061         15.189       3.152  -1.135
25Q2     1.587   7.156  55.739         31.295      19.722  -4.374
25Q1     1.122   3.118  54.240          8.217       7.322  -2.636
2024     2.248   3.913  54.724         13.630       6.064  -3.483
24Q4     1.253   4.474  51.331          8.470       6.759  -1.893
24Q3     4.939  24.416  56.102         33.242       6.730  -1.362
24Q2     1.434   2.987  59.540          9.017       6.290  -3.018
24Q1     1.137   2.604  52.021          5.076       4.466  -1.949
2023     1.074   1.865  52.746          6.230       5.800  -3.340
23Q4     0.039   0.089  46.597          0.208       5.299  -2.343
23Q3    -0.415  -0.960  51.243         -2.066       4.978  -2.153
23Q2     3.561  12.752  60.248         16.107       4.524  -1.263
23Q1     1.320   3.679  53.257         10.533       7.977  -2.863
Figure 1: Portfolio Values and Price Changes 2025 (ES)
Figure 2: Portfolio Values and Price Changes 2024 (ES)
Figure 3: Portfolio Values and Price Changes 2023 (ES)

E-mini Nasdaq-100 (NQ):

Period  Sharpe  Calmar   Win %  Ann. Return %  Ann. Vol %   MDD %
2025     2.337   6.124  54.955         23.481      10.046  -3.834
25Q4     2.504  12.332  51.687         12.360       4.937  -1.002
25Q3     4.895  18.561  61.682         18.165       3.711  -0.979
25Q2     2.533  12.697  50.550         48.683      19.218  -3.834
25Q1     1.469   4.002  54.008         14.280       9.718  -3.568
2024     2.205   3.686  55.591         17.551       7.959  -4.762
24Q4     1.884   6.384  53.597         15.165       8.048  -2.376
24Q3     2.595   7.973  55.580         25.497       9.824  -3.198
24Q2     0.851   1.437  57.455          6.432       7.561  -4.476
24Q1     2.956  10.984  55.680         17.799       6.021  -1.621
2023     0.913   2.002  51.508          6.465       7.080  -3.229
23Q4     0.212   0.424  49.291          0.989       4.675  -2.334
23Q3     0.179   0.377  49.677          1.217       6.803  -3.229
23Q2     2.383   8.097  55.365         14.167       5.946  -1.750
23Q1     1.225   4.282  51.915         12.771      10.425  -2.983
Figure 4: Portfolio Values and Price Changes 2025 (NQ)
Figure 5: Portfolio Values and Price Changes 2024 (NQ)
Figure 6: Portfolio Values and Price Changes 2023 (NQ)

Notes:

  • Annualization: daily figures are converted to annual figures using 252 trading days.
  • Sharpe uses daily returns.
  • Ann. Vol: Annualized Volatility.
  • MDD: Max Drawdown.
  • Rebasing: curves rebased to 100 at period start.
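
For readers reproducing these metrics in their own harness, here is a minimal sketch of one way to compute them from a series of daily returns. Several choices are assumptions rather than statements from this document: arithmetic annualization of the mean daily return, sample standard deviation, and a zero risk-free rate. The ratios in the tables are at least consistent with Sharpe = Ann. Return / Ann. Vol and Calmar = Ann. Return / |MDD| (for ES 2025, 12.714 / 9.197 ≈ 1.382 and 12.714 / 4.374 ≈ 2.907). The function name is hypothetical, and Win % is omitted because its definition is not given here.

```python
import math

TRADING_DAYS = 252  # annualization factor from the notes above


def performance_metrics(daily_returns):
    """Compute annualized return/vol, Sharpe, max drawdown, and Calmar.

    daily_returns are simple returns as fractions (0.01 means +1%).
    """
    n = len(daily_returns)
    if n < 2:
        raise ValueError("need at least two daily returns")
    mean_daily = sum(daily_returns) / n
    variance = sum((r - mean_daily) ** 2 for r in daily_returns) / (n - 1)
    ann_return = mean_daily * TRADING_DAYS                # assumed arithmetic
    ann_vol = math.sqrt(variance) * math.sqrt(TRADING_DAYS)

    # Equity curve rebased to 100 at period start; track the worst
    # peak-to-trough decline.
    equity = peak = 100.0
    max_drawdown = 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        max_drawdown = min(max_drawdown, equity / peak - 1.0)

    nan = float("nan")
    return {
        "ann_return": ann_return,
        "ann_vol": ann_vol,
        "sharpe": ann_return / ann_vol if ann_vol else nan,  # zero risk-free rate
        "mdd": max_drawdown,                                 # negative or zero
        "calmar": ann_return / abs(max_drawdown) if max_drawdown else nan,
    }
```

The function returns fractions; multiply by 100 to compare with the percentage columns in the tables.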

4. Historical Prediction Data

The predictions generated by the model are available in the following files for download to anyone interested in using them in their own test harness. There is one file per model. 

The files contain the following columns:

  • date_time (datetime string): Timestamp of each observation, in Eastern Standard Time, at 1-minute cadence
  • predictions (int): Model-predicted class (0 for Down, 1 for Stable, 2 for Up)
  • actual_labels (int): Realized (actual) class label for the outcome (0 for Down, 1 for Stable, 2 for Up)
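
As a starting point for a test harness, the sketch below tallies predicted versus realized classes from rows with these columns. It assumes the files are comma-separated with a header row, which this document does not confirm. The helper names are hypothetical, and the sample rows are made up purely to show the layout.

```python
import csv
import io
from collections import Counter

CLASS_NAMES = {0: "Down", 1: "Stable", 2: "Up"}


def confusion_counts(rows):
    """Count (actual, predicted) class pairs from dict-like rows."""
    counts = Counter()
    for row in rows:
        counts[(int(row["actual_labels"]), int(row["predictions"]))] += 1
    return counts


def accuracy(counts):
    """Share of observations where the predicted class matches the actual."""
    total = sum(counts.values())
    correct = sum(n for (actual, pred), n in counts.items() if actual == pred)
    return correct / total if total else float("nan")


# Made-up sample rows; the CSV layout and timestamp format are assumptions.
SAMPLE = """date_time,predictions,actual_labels
2025-01-02 09:30:00,2,2
2025-01-02 09:31:00,2,0
2025-01-02 09:32:00,1,1
2025-01-02 09:33:00,0,0
"""

counts = confusion_counts(csv.DictReader(io.StringIO(SAMPLE)))
for (actual, pred), n in sorted(counts.items()):
    print(f"actual={CLASS_NAMES[actual]:<6} predicted={CLASS_NAMES[pred]:<6} n={n}")
print("accuracy:", accuracy(counts))
```

To run it on a downloaded file, replace the `io.StringIO(SAMPLE)` argument with an open file handle.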

5. Next steps 

  1. Download past predictions from the links in Section 4 and test with your own test harness and strategies.

  2. Review out-of-sample performance by consuming the predictions in real time using our API. This requires signing up for our “Professional” tier, which includes a 1-month free trial.

  3. Go beyond pre-trained signals by tailoring a signal to your specific needs. Customize by symbol, target variable, time horizon, neutral band, entry cadence and risk rules. 

6. Contact

Yianni Gamvros
Co-founder & CEO
yianni [at] quantumsignals.ai
+1 202 390 4935

Disclaimers: 

  • Futures trading involves substantial risk of loss and is not suitable for all investors. 
  • Hypothetical/simulated results do not represent actual trading, may under- or over-state market impacts (e.g., liquidity), and reflect hindsight. 
  • No representation that any account will achieve similar results. 
  • Past performance is not necessarily indicative of future results. 
  • Quantum Signals is not an NFA member.