How to Build & Backtest an AI Trading Agent with TradingAgents

What TradingAgents is

TradingAgents, from Tauric Research, simulates a real trading desk. Separate LLM agents act as fundamental, sentiment, news and technical analysts. Bull and bear researcher agents debate those reports, and trader and risk-management agents then turn the debate into a decision. For a builder, its value is the architecture: you can read exactly how specialised agents, tool access and a debate loop fit together, which teaches far more than a single prompt.
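To make that wiring concrete, here is a deliberately tiny, hypothetical model of the data flow in plain Python. Stub functions stand in for the LLM agents, numeric scores stand in for written reports, and a fixed number of bull/bear rounds stands in for the debate. This is not TradingAgents' code (the real project wires its agents together as a LangGraph graph). It only illustrates the analysts → debate → trader → risk-check shape, and every name in it is made up for illustration.

desk_toy.py (sketch)

```python
from dataclasses import dataclass, field
from statistics import pstdev

# Hypothetical toy of the desk's data flow. Each "agent" is a plain function
# standing in for an LLM call; only the wiring mirrors the real framework.


@dataclass
class DeskState:
    reports: dict                                   # analyst role -> score in [-1, 1]
    transcript: list = field(default_factory=list)  # (speaker, argument) debate turns
    proposal: str = "HOLD"                          # trader's call
    final: str = "HOLD"                             # call after the risk check


def bull(state: DeskState) -> float:
    # Argues only from the positive evidence in the analyst reports
    return sum(s for s in state.reports.values() if s > 0)


def bear(state: DeskState) -> float:
    # Argues only from the negative evidence (returns a score <= 0)
    return sum(s for s in state.reports.values() if s < 0)


def run_debate(state: DeskState, rounds: int = 2) -> None:
    # Alternate bull and bear turns for a fixed number of rounds
    for _ in range(rounds):
        state.transcript.append(("bull", bull(state)))
        state.transcript.append(("bear", bear(state)))


def trader(state: DeskState) -> None:
    # Net the last bull and bear arguments into a proposal
    net = state.transcript[-2][1] + state.transcript[-1][1]
    state.proposal = "BUY" if net > 0.5 else "SELL" if net < -0.5 else "HOLD"


def risk_manager(state: DeskState, max_dispersion: float = 0.6) -> None:
    # Downgrade to HOLD when the analysts disagree too much
    dispersion = pstdev(list(state.reports.values()))
    state.final = "HOLD" if dispersion > max_dispersion else state.proposal


def run_desk(reports: dict) -> DeskState:
    # Analysts' reports flow through the debate, the trader and the risk check in order
    state = DeskState(reports=reports)
    run_debate(state)
    trader(state)
    risk_manager(state)
    return state
```

For example, run_desk({"fundamental": 1.0, "sentiment": 1.0, "news": -0.9, "technical": 0.0}) gets a BUY proposal from the trader. The risk step then vetoes it to HOLD because the reports are too dispersed. That is the kind of interplay the real agents negotiate in natural language.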

1. Install it

Clone the repo, create an environment, and set the API keys it needs (an LLM key plus the data/news keys listed in its README):

setup — terminal

git clone https://github.com/TauricResearch/TradingAgents.git
cd TradingAgents
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt

# set the keys named in the repo README (LLM + data providers), e.g.:
export ANTHROPIC_API_KEY=sk-ant-...
export FINNHUB_API_KEY=...
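A missing or blank key can surface partway through a run rather than at startup, so a quick preflight check is cheap insurance. Below is a minimal sketch with a hypothetical helper. The two key names come from the setup step above. The authoritative list is the repo README, and it depends on which LLM and data providers you configure.

check_keys.py (sketch)

```python
import os
from typing import Mapping, Optional, Sequence

# Key names from the setup step above; check the README for the full list
# required by the providers you actually configure.
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "FINNHUB_API_KEY")


def missing_keys(env: Optional[Mapping[str, str]] = None,
                 required: Sequence[str] = REQUIRED_KEYS) -> list:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    print("missing keys:", ", ".join(missing) if missing else "none")
```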

2. Get one decision

The framework exposes a TradingAgentsGraph object whose propagate() method runs the whole desk for one ticker on one date. The pattern looks like this (check the current README, since the API does evolve):

decide.py

from tradingagents.graph.trading_graph import TradingAgentsGraph
from tradingagents.default_config import DEFAULT_CONFIG

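config = DEFAULT_CONFIG.copy()
config["llm_provider"] = "anthropic"
config["deep_think_llm"] = "claude-sonnet-4-6"
# Assumption: set both model tiers, since DEFAULT_CONFIG's quick-think model may belong to another provider
config["quick_think_llm"] = "claude-sonnet-4-6"

ta = TradingAgentsGraph(debug=True, config=config)
state, decision = ta.propagate("NVDA", "2024-05-10")
print(decision)                       # the desk's final call, e.g. "BUY", "SELL" or "HOLD"
print(state["final_trade_decision"])  # the full write-up behind that call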
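Before trading on the result, map it onto a fixed set of actions. As far as we can tell, recent releases return a short signal string from propagate(). Because the API evolves, a defensive mapper keeps a backtest from silently misreading a changed format. The sketch below is hypothetical. The dict branch is only a guess at a possible future shape, and anything ambiguous falls back to HOLD.

parse_decision.py (sketch)

```python
import re

VALID_ACTIONS = {"BUY", "SELL", "HOLD"}


def to_action(decision) -> str:
    """Map a propagate() result onto BUY / SELL / HOLD, falling back to HOLD when unclear."""
    if isinstance(decision, dict):  # hypothetical shape, in case a future version returns a dict
        decision = decision.get("action", "")
    # Collect whole words only, so "HOLDING" or "BUYBACK" do not count as signals
    words = re.findall(r"[A-Z]+", str(decision).upper())
    found = {w for w in words if w in VALID_ACTIONS}
    # Exactly one distinct action is a clear signal; none or several is ambiguous
    return found.pop() if len(found) == 1 else "HOLD"
```

Defaulting to HOLD means a garbled or hedging answer never opens a position, which is the safer failure mode in a backtest.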

3. Turn it into a backtest

A single decision proves nothing. To backtest, walk the agent forward one trading day at a time and record what a small, fixed-size trade would have done. The key discipline: only ever pass the agent data available on that date — feeding future information is the classic look-ahead bias that makes a strategy look brilliant until it meets real money.
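You can enforce that discipline mechanically for any data you load and pass in yourself, such as the price file or extra features. Below is a minimal sketch with two hypothetical helpers, assuming a pandas DataFrame indexed by date. It does not police the news and fundamentals TradingAgents fetches internally, so check those separately.

point_in_time.py (sketch)

```python
import pandas as pd


def as_of(frame: pd.DataFrame, date) -> pd.DataFrame:
    """Return only rows dated on or before `date`: what was knowable that day."""
    return frame.loc[frame.index <= pd.Timestamp(date)]


def assert_no_lookahead(frame: pd.DataFrame, date) -> None:
    """Raise if `frame` holds any row dated after `date`."""
    cutoff = pd.Timestamp(date)
    latest = frame.index.max()  # NaT for an empty frame, which compares as False below
    if latest > cutoff:
        raise ValueError(f"look-ahead: data runs to {latest.date()}, decision date is {cutoff.date()}")
```

Call as_of() to build whatever you hand the agent on a given day, and assert_no_lookahead() as a cheap guard right before each call.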

backtest_loop.py (sketch)

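import pandas as pd

# `ta` is the TradingAgentsGraph built in decide.py.
# Assumes NVDA.csv holds daily prices with "Date" and "Close" columns.
prices = pd.read_csv("NVDA.csv", index_col="Date", parse_dates=True).sort_index()
window = prices.loc["2024-01-01":"2024-06-01"]

cash, shares = 10_000.0, 0
equity_curve = {}  # date -> portfolio value, for Sharpe and drawdown later
for today, next_day in zip(window.index[:-1], window.index[1:]):
    # Decide using information up to `today` only
    _, decision = ta.propagate("NVDA", today.strftime("%Y-%m-%d"))
    # Recent versions return a signal string; tolerate a dict in case the API changes
    action = decision.get("action", "") if isinstance(decision, dict) else decision
    action = str(action).strip().upper()
    # Fill at the next session's close, after the information the decision used
    px = window.loc[next_day, "Close"]
    if action == "BUY" and cash >= px:
        shares += 1
        cash -= px
    elif action == "SELL" and shares > 0:
        shares -= 1
        cash += px
    equity_curve[next_day] = cash + shares * px

# Mark to market at the last close inside the test window, not the end of the file
equity = cash + shares * window["Close"].iloc[-1]
print("final equity:", round(equity, 2))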

Then judge it on risk-adjusted terms — Sharpe ratio and max drawdown — not just the headline return, and compare against simply buying and holding. Our backtesting guide covers the traps in depth.
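Sharpe ratio, max drawdown and the buy-and-hold comparison all need a daily equity series, not just the final number. Record the portfolio value on each day of the loop (for example in a dict keyed by date, then wrap it in pd.Series). Below is a minimal sketch with hypothetical helper names, assuming daily data and 252 trading days a year. The risk-free adjustment is a simple per-day approximation.

metrics.py (sketch)

```python
import math

import pandas as pd


def sharpe_ratio(equity: pd.Series, periods_per_year: int = 252,
                 risk_free_annual: float = 0.0) -> float:
    """Annualised Sharpe ratio of the period-over-period returns of an equity curve."""
    rets = (equity / equity.shift(1) - 1).dropna() - risk_free_annual / periods_per_year
    sd = rets.std()  # sample standard deviation (ddof=1)
    if len(rets) < 2 or sd == 0:
        return float("nan")  # undefined with too few points or zero volatility
    return float(rets.mean() / sd * math.sqrt(periods_per_year))


def max_drawdown(equity: pd.Series) -> float:
    """Largest peak-to-trough fall as a negative fraction (e.g. -0.25 for -25%)."""
    return float((equity / equity.cummax() - 1).min())


def buy_and_hold(close: pd.Series, capital: float) -> pd.Series:
    """Equity from putting all the capital into the asset at the first close."""
    return capital * close / close.iloc[0]
```

The loop above trades one share at a time while buy-and-hold commits all the capital, so their exposures differ by design. Compare Sharpe ratios and drawdowns rather than raw returns.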

Honest caveats

Cost & latency: every decision chains many LLM calls (analysts, debate rounds, trader, risk review). Backtesting hundreds of days gets slow and isn't free, so start with a short window.
Reproducibility: LLM outputs vary. Pin the model version, lower the temperature where the provider allows it, and repeat runs to see how much the decisions vary.
It's a research tool: treat outputs as study material, paper-trade before anything else, and never deploy capital you can't afford to lose.
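You can cut cost while you iterate on trading rules and metrics by caching each decision on disk, so a rerun over the same dates makes no new LLM calls. Below is a minimal sketch. The cached_decision helper and its JSON file layout are hypothetical, and `decide` is whatever callable wraps the agent.

decision_cache.py (sketch)

```python
import json
from pathlib import Path
from typing import Callable


def cached_decision(ticker: str, date: str,
                    decide: Callable[[str, str], object],
                    cache_path: str = "decisions.json") -> str:
    """Return the stored decision for (ticker, date); call `decide` only on a cache miss."""
    path = Path(cache_path)
    cache = json.loads(path.read_text()) if path.exists() else {}
    key = f"{ticker}|{date}"
    if key not in cache:
        # Store as text so any return type survives the JSON round trip
        cache[key] = str(decide(ticker, date))
        path.write_text(json.dumps(cache, indent=2))
    return cache[key]
```

Usage might look like cached_decision("NVDA", "2024-05-10", lambda t, d: ta.propagate(t, d)[1]). A cache freezes one sample per date. That makes reruns repeatable, but it hides run-to-run variation, so give each repeat its own cache file when you measure how much decisions vary.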

Disclaimer: Educational only — not financial advice. Code is illustrative; verify APIs against the project's current README. Automated trading is risky; backtest and paper-trade before risking real capital.