On the Path to Quantum-Agentic Systems

October 10, 2025 | Science

Preparing for the shift from fixed rules to learning agents to quantum-native decision making.

Agents that rediscover known algorithms

In the paper we set up learning problems where an agent gets only a score (a “reward”) and improves through trial and error, with no hand-crafted solution provided. Under this setup, agents independently rediscovered:

  • The Quantum Fourier Transform (QFT), learning efficient, hardware-friendly circuits and achieving essentially perfect fidelity on small instances.
  • Grover’s search, reproducing the core steps (preparing the uniform superposition and applying the diffusion operator) and hitting the known optimal success probabilities for the tested sizes, all under realistic connectivity constraints; a tiny hand-checkable instance follows this list.
  • Optimal strategies in interactive quantum tasks, including CHSH and strong coin-flipping, matching the best-known quantum limits.
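
For intuition on the Grover claim, here is that hand-checkable instance, written directly against the state vector (illustrative only; the paper's agents learn this behaviour from reward rather than being handed it). With 2 qubits and one marked item out of 4, a single oracle-plus-diffusion iteration reaches the optimal success probability of 1.

```python
import numpy as np

N, marked = 4, 2                             # 2 qubits, basis state |10> marked
state = np.full(N, 1 / np.sqrt(N))           # uniform superposition over all items

oracle = np.eye(N)
oracle[marked, marked] = -1                  # phase-flip the marked basis state

s = np.full(N, 1 / np.sqrt(N))
diffusion = 2 * np.outer(s, s) - np.eye(N)   # inversion about the mean

state = diffusion @ (oracle @ state)         # one Grover iteration
print(abs(state[marked]) ** 2)               # ≈ 1.0, the optimal success probability for N = 4
```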

Across tasks, agents reached optimal or near-optimal performance without being told the answer, which is exactly what you want if the goal is automated discovery.
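
To make the reward-only setup concrete, here is a minimal sketch in the same spirit (our own illustration, not the paper's agent or ansatz): a hill-climbing "agent" that sees nothing but a scalar fidelity score rediscovers the 1-qubit QFT, which is just the Hadamard gate.

```python
# Reward-only learning, reduced to its smallest form: propose parameters,
# observe a scalar fidelity reward, keep whatever improves it.
import numpy as np

rng = np.random.default_rng(0)

# Target: the 1-qubit QFT, which is the Hadamard gate.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def rz(a):
    return np.array([[np.exp(-1j * a / 2), 0], [0, np.exp(1j * a / 2)]])

def ry(b):
    return np.array([[np.cos(b / 2), -np.sin(b / 2)],
                     [np.sin(b / 2),  np.cos(b / 2)]])

def circuit(theta):
    a, b, c = theta                      # ZYZ ansatz: any 1-qubit unitary up to global phase
    return rz(a) @ ry(b) @ rz(c)

def reward(theta):
    # Gate fidelity up to global phase: |Tr(U_target^dagger U)| / d
    return abs(np.trace(H.conj().T @ circuit(theta))) / 2

theta = rng.uniform(0, 2 * np.pi, size=3)
best = reward(theta)
for step in range(5000):
    scale = 0.5 * 0.999 ** step          # slowly shrink the perturbation size
    proposal = theta + rng.normal(0, scale, size=3)
    r = reward(proposal)
    if r > best:                         # keep the change only if the score improves
        theta, best = proposal, r

print(f"learned fidelity: {best:.6f}")   # typically ≈ 1.0: the Hadamard, rediscovered
```

A real system replaces this toy hill climb with a proper reinforcement-learning agent and a multi-qubit, hardware-aware ansatz, but the interface is the same: parameters in, a single scalar reward out.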

Why this matters

  1. Automating parts of algorithm design. If agents can recover known quantum procedures from scratch, there’s a plausible path to discovering new ones where humans don’t yet have recipes.
  2. Hardware-aware by construction. Learned routines respect constraints like shallow depth and nearest-neighbor connectivity, shortening the path from “learned” to “runnable.”
  3. A credible scale-up story. For many practically relevant problems (e.g., portfolio selection with realistic encodings), merely evaluating the reward already outgrows classical simulation, which is exactly where on-device quantum training becomes necessary; a back-of-envelope illustration follows this list.
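
That crossover is easy to see from memory alone (assuming dense state-vector simulation with complex128 amplitudes; specialised simulators can push further for structured circuits): the storage cost doubles with every added qubit.

```python
for n in (30, 40, 50):
    amplitudes = 2 ** n                  # one complex amplitude per basis state
    gib = amplitudes * 16 / 2 ** 30      # 16 bytes per complex128 amplitude
    print(f"{n} qubits -> {gib:,.0f} GiB just to store the state vector")
```

Around 50 qubits the state vector alone outgrows even large clusters, which is roughly where evaluating the reward has to move onto the device itself.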

From algo trading, to agentic trading, to quantum-agentic trading

The wider AI world has embraced agentic systems—most visibly in code assistants that plan, call tools, and iterate autonomously. We see the same pattern coming to markets:

  • Algorithmic trading is today’s baseline—fixed strategies, fixed rules.
  • Agentic trading is next—adaptive policies that plan over horizons, interact with market simulators, and learn from feedback.
  • Quantum-agentic trading is the frontier—where parts of that learning and decision process run natively on quantum hardware for state spaces that are intractable to explore classically. The paper demonstrates the core capability: agents that learn optimal quantum procedures directly from rewards.

How we’re getting ready at Quantum Signals

Our priorities are pragmatic:

  • Now: Deliver a classical, intraday product—short-horizon signals for direction, volatility, and liquidity that quants and traders can act on immediately. This classical stack is our benchmark.
  • Next: Evolve toward agentic trading inside that product—letting agents reason over strategies and adapt to regimes within well-governed risk constraints.
  • Later: When hardware and costs justify it, plug in quantum-agentic components where they can prove improvement against the benchmark (accuracy, robustness, or compute efficiency)—the same way the paper’s agents proved themselves on canonical tasks.

Reality check

These are early-stage, small-instance demonstrations designed for clarity and reproducibility. The important part is the method: a single learning loop that runs on simulators for tiny problems and transitions unchanged to quantum devices once problem sizes cross the classical limit. That’s exactly the path we expect in finance as agentic trading matures.
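
To sketch what an "unchanged loop" means in practice (our own structural reading, with hypothetical names; the backends are left as stubs rather than real simulator or device integrations): the trainer only ever calls evaluate(params), so moving from classical simulation to quantum hardware is a matter of swapping the object behind that call, not rewriting the loop.

```python
from typing import Protocol
import numpy as np

class RewardBackend(Protocol):
    def evaluate(self, params: np.ndarray) -> float: ...

class SimulatorBackend:
    """Classical state-vector simulation; adequate while instances stay small."""
    def evaluate(self, params: np.ndarray) -> float:
        raise NotImplementedError  # build the circuit, simulate it, return the reward

class HardwareBackend:
    """Runs the same circuit on a quantum device and estimates the reward from shots."""
    def evaluate(self, params: np.ndarray) -> float:
        raise NotImplementedError  # submit the job, collect measurements, return the reward

def train(backend: RewardBackend, dim: int, steps: int = 1000, seed: int = 0) -> np.ndarray:
    """The learning loop itself: identical whichever backend it is handed."""
    rng = np.random.default_rng(seed)
    params = rng.uniform(0, 2 * np.pi, size=dim)
    best = backend.evaluate(params)
    for _ in range(steps):
        proposal = params + rng.normal(0, 0.1, size=dim)
        reward = backend.evaluate(proposal)
        if reward > best:
            params, best = proposal, reward
    return params
```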