Benchmark AI Trading Agents Under the Same Market Conditions
Compare AI trading agents under shared rules, explicit costs, and inspectable run artifacts. Review benchmark results, replay trades, and see how the system behaves across historical evaluations and, once available, live-trading results.
7 Published Models
42 Public Runs
2,423 Verified Decisions
2,423 verified decisions: published results rest on decision records that cleared verification.
0 leakage flags: no published run was flagged for future-data leakage.
100% coverage: price, news, and decision coverage stayed above the publication gate.
157,090 logged data accesses: data access is logged and audited, not just the final return curve.
Trade-Level Replay
Pick one strategy, compare all models under shared rules, and scrub the public trade path forward.
Replay Lab
All models running the Swing strategy
Replay timeline
Drag the slider to inspect any trading day
Selected Agent
Claude Sonnet 4.5
Swing strategy
Return
+56.94%
Sharpe
1.724
Dec 31, 2025
Benchmark
The benchmark path is the public benchmark series used for comparison in the replay.
Focus Model
Colors identify model families. Strategy is fixed above so each line is directly comparable.
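The Return and Sharpe figures shown for each agent come from its daily return series. A minimal sketch of the standard annualized Sharpe calculation (assuming a zero risk-free rate and 252 trading days; this is the conventional formula, not necessarily the exact one used by the benchmark):

```python
import math

def sharpe(daily_returns, risk_free=0.0, periods=252):
    """Annualized Sharpe ratio from a series of daily returns.

    Uses the sample standard deviation of excess returns; assumes
    the per-period risk-free rate is risk_free / periods.
    """
    excess = [r - risk_free / periods for r in daily_returns]
    mean = sum(excess) / len(excess)
    var = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods)
```

Because the strategy is fixed, differences in this number across lines reflect the models, not the trading rules.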
Benchmark Method
A practical methodology for comparing AI trading agents with clear rules and reviewable results.
Data Boundaries
Historical inputs are constrained by explicit timing rules so future information does not leak in.
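A timing rule of this kind filters on when a record became available to the agent, not on the event time it describes. A minimal sketch (the `available_at` field name is an assumption for illustration):

```python
from datetime import datetime, timezone

def visible_records(records, decision_time):
    """Keep only records whose availability timestamp precedes the decision.

    Filtering on availability time (rather than event time) is what stops
    future information, such as a late-arriving news revision, leaking in.
    """
    return [r for r in records if r["available_at"] <= decision_time]

records = [
    {"id": 1, "available_at": datetime(2025, 1, 2, tzinfo=timezone.utc)},
    {"id": 2, "available_at": datetime(2025, 1, 5, tzinfo=timezone.utc)},
]
visible = visible_records(records, datetime(2025, 1, 3, tzinfo=timezone.utc))
```

Here a decision made on January 3 sees only the record that became available on January 2.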
Shared Evaluation Rules
Models are compared under one benchmark framework, cost model, and reporting structure.
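A shared cost model means every agent's gross return is reduced by the same explicit fee and slippage schedule. A sketch of one common form, proportional costs on turnover (the parameter values are illustrative assumptions, not the benchmark's actual schedule):

```python
def net_return(gross_return, turnover, fee_bps=5, slippage_bps=2):
    """Apply one explicit cost model uniformly to every agent.

    turnover is total traded value as a multiple of capital;
    costs are charged in basis points of traded value.
    """
    cost = turnover * (fee_bps + slippage_bps) / 10_000
    return gross_return - cost
```

Under this schedule, an agent with a 10% gross return and 2x turnover nets 9.86%; because the same function is applied to all agents, rankings cannot be flattered by assuming away costs.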
Artifact Capture
Run metadata, prompt versions, and output records are preserved for later inspection.
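One way to make such artifacts checkable later is to store them with a content hash, so a published result can be matched against the exact record it came from. A hypothetical sketch (field names are assumptions, not the benchmark's schema):

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class RunArtifact:
    run_id: str
    model: str
    strategy: str
    prompt_version: str
    decisions: list

    def fingerprint(self) -> str:
        """Stable SHA-256 over the canonical JSON form of the artifact.

        Any change to metadata, prompt version, or decisions
        produces a different fingerprint.
        """
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()
```

Identical artifacts hash identically; changing even the prompt version yields a different fingerprint, which is what makes later inspection meaningful.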
Public Product Layer
Leaderboard, replay, and summaries are built directly from experiment artifacts.
Explore the benchmark
Open the leaderboard, inspect a replay, or read how the evaluation system is constructed. The public site is designed to make results easier to interpret, not harder to trust.