Why There’s Still an AI Gap in Trading

October 22, 2025 | Product

Over the last 18 months, in conversations with quants and PMs, we kept hearing the same refrain: AI hasn't delivered reliable trading signals. Many teams told us they had "tried AI" and couldn't build a model that consistently predicts price trends well enough to run in systematic strategies. We also heard plenty of comfort with tried-and-tested classical/legacy ML, which is reliable, explainable, and good enough for most mandates. We covered the landscape of computational models currently used in practice in a previous post. With early attempts failing and little appetite to bet on anything "new" that might not clear governance, latency, or cost constraints, many teams walk away from more ambitious AI efforts altogether. One caveat up front: AI is a complement to sound market insight, not a substitute; it amplifies well-understood edges rather than creating them from scratch.

Two reasons this perception persists

1) What changed hasn’t been fully tried.
A lot of prior attempts predate the current wave of transformer architectures (and modern RL). Transformers, the architecture behind today's LLMs, are built to detect patterns across multiple time scales and to condition on context. Text isn't the only sequence domain; market data is one too.

2) It’s a pipeline problem, not a model swap.
Building an AI system that survives production constraints is harder than running momentum/imbalance screens, fitting autoregressions, or training trees. It demands specialized engineering across data, modeling, and operations. Many teams haven’t staffed or budgeted for that work.

Why now: modern transformers

Today's transformer models handle long- and short-range dependencies and conditional context far better than the setups many teams experimented with years ago. They can fuse multiple signals, keep context across scales (seconds, minutes, hours, days), and condition on metadata (e.g., earnings-day flags, month-end, futures expiry) with less hand-crafting. That does not mean "drop in a chatbot and trade"; it means we now have the right building blocks for finance-native AI models.
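As an illustration (not a production recipe), here is a minimal PyTorch sketch of what "conditioning on metadata" can look like: per-step market features pass through a small transformer encoder while static flags, such as an earnings-day indicator, are projected and added to every time step. All names, shapes, and dimensions below are hypothetical.

```python
import torch
import torch.nn as nn

class ConditionedEncoder(nn.Module):
    """Toy transformer encoder conditioned on static metadata flags."""

    def __init__(self, n_features: int, n_flags: int, d_model: int = 64):
        super().__init__()
        self.feature_proj = nn.Linear(n_features, d_model)
        self.flag_proj = nn.Linear(n_flags, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)  # e.g., a next-horizon return score

    def forward(self, x, flags):
        # x: (batch, seq_len, n_features); flags: (batch, n_flags)
        h = self.feature_proj(x) + self.flag_proj(flags).unsqueeze(1)
        h = self.encoder(h)
        return self.head(h[:, -1])         # score read off the last time step

model = ConditionedEncoder(n_features=8, n_flags=3)
scores = model(torch.randn(32, 128, 8), torch.randn(32, 3))  # (32, 1)
```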

What it takes to make it work

Making AI work in finance isn’t a single model choice. It is a sequence of correct decisions.

Pre-processing.
Decide whether to use event bars or time bars. Define labels such as directional moves or quantiles with strict point-in-time discipline to avoid leakage. Choose between computed and raw features intentionally. Specify how many series you will include and set their sampling frequency and time scale appropriately.
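As a sketch of what that discipline can look like in code, the snippet below assumes a hypothetical tick DataFrame with timestamp, price, and size columns, builds simple volume (event) bars, and labels each bar by the sign of its forward return over a fixed horizon, dropping the tail where that horizon is unavailable.

```python
import numpy as np
import pandas as pd

def volume_bars(ticks: pd.DataFrame, bar_volume: float) -> pd.DataFrame:
    """Group trades into bars that each contain roughly bar_volume traded size."""
    bar_id = (ticks["size"].cumsum() // bar_volume).astype(int)
    return ticks.groupby(bar_id).agg(
        ts_close=("timestamp", "last"),
        open=("price", "first"),
        high=("price", "max"),
        low=("price", "min"),
        close=("price", "last"),
        volume=("size", "sum"),
    )

def directional_labels(bars: pd.DataFrame, horizon: int = 5, thresh: float = 0.0005):
    """Label each bar by the sign of the forward return over `horizon` bars."""
    fwd_ret = bars["close"].shift(-horizon) / bars["close"] - 1.0
    labels = pd.Series(
        np.where(fwd_ret > thresh, 1, np.where(fwd_ret < -thresh, -1, 0)),
        index=bars.index,
    )
    # Drop the tail where the forward window extends past available data.
    return labels.iloc[:-horizon]
```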

Aggregation & normalization.
Align venues and symbols with consistent timestamps. Resample in a reproducible way and apply consistent scaling across features. Enforce point-in-time data so the model never “sees the future.”
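A minimal sketch of that idea, assuming each venue's quotes arrive as a DataFrame indexed by UTC timestamps (a hypothetical layout): resample to a shared grid using only already-observed values, then fit scaling statistics on the training window alone and apply them forward.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def to_common_grid(df: pd.DataFrame, freq: str = "1min") -> pd.DataFrame:
    """Resample one venue's quotes to a shared grid using only past values."""
    return df.sort_index().resample(freq).last().ffill()

def scale_point_in_time(features: pd.DataFrame, train_end: str):
    """Fit scaling on the training window only, then apply it to later data."""
    train = features.loc[:train_end]
    test = features.loc[features.index > pd.Timestamp(train_end)]
    scaler = StandardScaler().fit(train)   # statistics from training data only
    return scaler.transform(train), scaler.transform(test), scaler
```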

Architecture & training.
Choose a lookback window and forecast horizon that match the desk’s decision cycle. Set sequence length and attention horizon accordingly. Select a loss function that reflects trading objectives. Allocate tuning budgets, apply regularization, and validate with forward-chaining.
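For illustration, here is a small sketch of windowing a series to a chosen lookback and horizon and validating with forward-chaining via scikit-learn's TimeSeriesSplit; the window sizes and synthetic series are placeholders, not recommendations.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

def make_windows(series: np.ndarray, lookback: int = 120, horizon: int = 10):
    """Slice a 1-D price series into lookback inputs and horizon-ahead return targets."""
    X, y = [], []
    for t in range(lookback, len(series) - horizon):
        X.append(series[t - lookback:t])
        y.append(series[t + horizon] / series[t] - 1.0)  # forward return target
    return np.array(X), np.array(y)

prices = 100.0 * np.exp(np.cumsum(0.001 * np.random.randn(5000)))  # synthetic series
X, y = make_windows(prices)

# Forward-chaining: every fold trains on the past and validates on the future.
for train_idx, val_idx in TimeSeriesSplit(n_splits=5).split(X):
    X_train, y_train = X[train_idx], y[train_idx]  # fit the candidate model here
    X_val, y_val = X[val_idx], y[val_idx]          # evaluate out-of-sample here
```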

Backtesting.
Test models across multi-year periods with clear, rolling delineations for training, validation, and test sets. Define how candidate models are compared and selected. Establish promotion criteria that survive transaction costs, slippage, and capacity limits.
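One way to sketch this, with placeholder window lengths, cost assumptions, and hurdle: roll non-overlapping train/validation/test date ranges forward through the history, and only promote a candidate whose Sharpe survives a simple per-turnover cost model.

```python
import numpy as np
import pandas as pd

def walk_forward_windows(index: pd.DatetimeIndex, train_years=3, val_months=6, test_months=6):
    """Yield (train, val, test) date ranges that roll forward without overlap."""
    start = index.min()
    while True:
        train_end = start + pd.DateOffset(years=train_years)
        val_end = train_end + pd.DateOffset(months=val_months)
        test_end = val_end + pd.DateOffset(months=test_months)
        if test_end > index.max():
            break
        yield (start, train_end), (train_end, val_end), (val_end, test_end)
        start = start + pd.DateOffset(months=test_months)

def net_sharpe(gross_returns: np.ndarray, turnover: np.ndarray, cost_bps: float = 1.5):
    """Annualized Sharpe after a simple per-unit-turnover cost deduction."""
    net = gross_returns - turnover * cost_bps * 1e-4
    return np.sqrt(252) * net.mean() / (net.std() + 1e-12)

def promote(gross_returns, turnover, hurdle: float = 1.0) -> bool:
    """Promotion criterion: net-of-cost Sharpe must clear the hurdle."""
    return net_sharpe(np.asarray(gross_returns), np.asarray(turnover)) >= hurdle
```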

Post-processing.
Calibrate model outputs into actionable scores or probabilities. Define thresholds and position-sizing rules. Add risk overlays and explicit slippage and latency budgets. Roll out with canaries and kill switches, and monitor continuously for drift.
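A minimal sketch of the first two steps, calibration and thresholded position sizing; the isotonic calibrator, threshold, cap, and cost-edge numbers here are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def calibrate(raw_scores_val: np.ndarray, outcomes_val: np.ndarray) -> IsotonicRegression:
    """Fit a monotone map from raw scores to P(up) on held-out validation data."""
    iso = IsotonicRegression(out_of_bounds="clip")
    iso.fit(raw_scores_val, outcomes_val)  # outcomes in {0, 1}
    return iso

def to_position(prob_up: float, threshold: float = 0.55,
                max_pos: float = 1.0, cost_edge: float = 0.02) -> float:
    """Map a calibrated probability to a signed, capped position.

    Stay flat unless the edge over 50/50 clears both the entry threshold
    and an assumed cost budget."""
    edge = prob_up - 0.5
    if abs(edge) < (threshold - 0.5) or abs(edge) < cost_edge:
        return 0.0
    return float(np.clip(edge * 2.0 * max_pos, -max_pos, max_pos))
```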

Real-time inference.
Deploy the approved model or model set behind a low-latency service. Stream live data reliably into the pipeline and harden the system for failures, retries, and out-of-sequence messages. Validate that live outputs match backtested behavior under realistic load.
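The serving layer itself is beyond a blog snippet, but the message-handling discipline can be sketched: drop duplicates and stale ticks, and re-sync state when the feed gaps. The message format and methods below are hypothetical.

```python
import time
from dataclasses import dataclass

@dataclass
class Tick:
    seq: int      # monotonically increasing sequence number from the feed
    ts: float     # exchange timestamp, seconds since epoch
    price: float

class FeedHandler:
    def __init__(self, max_staleness_s: float = 0.5):
        self.last_seq = -1
        self.max_staleness_s = max_staleness_s

    def on_tick(self, tick: Tick):
        if tick.seq <= self.last_seq:
            return None                 # duplicate or out-of-order: ignore
        if tick.seq > self.last_seq + 1:
            self.request_snapshot()     # gap detected: rebuild state first
        self.last_seq = tick.seq
        if time.time() - tick.ts > self.max_staleness_s:
            return None                 # too stale to act on safely
        return self.score(tick)

    def request_snapshot(self):
        pass                            # e.g., re-sync state from a snapshot feed

    def score(self, tick: Tick):
        return 0.0                      # call the deployed model here
```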

Retraining & monitoring.
Decide on a retraining cadence—overnight, weekly, or monthly—based on drift and operational cost. Specify whether retraining starts from scratch or reuses portions of the model. Track data and concept drift, trigger alerts, and document changes so production behavior remains explainable over time.
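As one example of a drift check, the sketch below computes a population stability index (PSI) between a training-time reference sample of a feature and its recent live values; the 0.2 alert threshold is a common rule of thumb rather than a universal standard.

```python
import numpy as np

def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population stability index over quantile bins of the reference distribution."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    ref_frac = np.histogram(reference, edges)[0] / len(reference)
    live_frac = np.histogram(live, edges)[0] / len(live)
    ref_frac = np.clip(ref_frac, 1e-6, None)
    live_frac = np.clip(live_frac, 1e-6, None)
    return float(np.sum((live_frac - ref_frac) * np.log(live_frac / ref_frac)))

def drift_alert(reference: np.ndarray, live: np.ndarray, threshold: float = 0.2) -> bool:
    """True when drift is large enough to trigger review or retraining."""
    return psi(reference, live) > threshold
```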

There are many valid ways to set these parameters. The right choices depend on the trading decisions your model supports and the operational constraints you must satisfy. Older approaches (e.g., autocorrelation tests, decision trees) are often more forgiving and can feel “out of the box” with standard ML libraries. The hard truth is that AI requires specialized engineering across data, systems, and evaluation, and many teams lack the resources, skills, budgets, or patience to push through.

Reality check

If you build this in-house, expect a multi-quarter effort, significant spend on infrastructure and data (often mid-six figures annually), and a sizable AI, DevOps, and full-stack engineering team to maintain the pipeline over time.