How to Build & Backtest a Reinforcement-Learning Trading Agent with FinRL

A different kind of agent

LLM agents such as TradingAgents and ai-hedge-fund reason in natural language. FinRL agents learn instead. You frame trading as a reinforcement-learning problem with three parts: a state (market features plus your current positions), actions (buy, sell, or hold, and how much), and a reward (the change in portfolio value). A deep-RL algorithm such as PPO or A2C then learns a policy by trial and error over historical data. It's more ML-heavy, but backtesting is built into the workflow.
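To make the reward concrete, here is a toy single-step sketch in plain NumPy. It is not FinRL's environment code. The function names, the action encoding (signed share counts per stock), and the flat transaction-cost rate are assumptions chosen for illustration. The reward is simply portfolio value after the step minus portfolio value before it.

```python
import numpy as np


def portfolio_value(cash, shares, prices):
    """Cash plus the market value of every holding."""
    return float(cash + np.dot(np.asarray(shares, dtype=float), np.asarray(prices, dtype=float)))


def toy_step(cash, shares, prices_now, prices_next, action, cost_pct=0.001):
    """One hypothetical trading step (illustrative, not FinRL's API).

    action: shares to trade per stock (positive = buy, negative = sell).
    Trades fill at prices_now; the reward is measured at prices_next.
    Cash limits are ignored to keep the sketch short.
    """
    shares = np.asarray(shares, dtype=float)
    prices_now = np.asarray(prices_now, dtype=float)
    # Clip sells so the agent cannot sell shares it does not hold (no shorting).
    action = np.maximum(np.asarray(action, dtype=float), -shares)

    trade_value = action * prices_now
    costs = np.abs(trade_value).sum() * cost_pct  # flat fee on traded notional

    value_before = portfolio_value(cash, shares, prices_now)
    new_cash = float(cash - trade_value.sum() - costs)
    new_shares = shares + action
    value_after = portfolio_value(new_cash, new_shares, prices_next)

    reward = value_after - value_before  # change in portfolio value
    return new_cash, new_shares, reward


if __name__ == "__main__":
    # Buy 50 shares at 10; the price moves to 11, so the reward is the gain minus the fee.
    cash, shares, reward = toy_step(1000.0, [0], [10.0], [11.0], [50])
    print(cash, shares, reward)
```

A real environment adds cash constraints, many features per stock in the state, and typically rescales the reward before handing it to the learning algorithm.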

1. Install it

FinRL is a Python library that builds on the stable-baselines3 RL library for its algorithms. Most people work from its example notebooks:

setup — terminal

python -m venv .venv && source .venv/bin/activate
pip install finrl stable-baselines3 pandas

# or clone for the full tutorial notebooks:
git clone https://github.com/AI4Finance-Foundation/FinRL.git
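Because FinRL's modules move between releases, it helps to record which versions you actually installed before picking a tutorial to follow. A small sketch using only the standard library's `importlib.metadata` (Python 3.8+); the helper name is hypothetical, and the package names are the distribution names from the `pip install` line above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(["finrl", "stable-baselines3", "pandas"]).items():
        print(f"{name}: {ver or 'not installed'}")
```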

2. The train → backtest pipeline

The flow is always the same: turn price data into a feature dataframe, split it by date into a train period and a trade (test) period, build a trading environment for each, train an agent on the train period, then run it on the unseen trade period. The representative pattern follows; check the current FinRL tutorials for exact imports, since they shift between versions:

finrl_pipeline.py (representative)

from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.agents.stablebaselines3.models import DRLAgent

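# train_df / trade_df: feature dataframes split by date (no overlap)
# env_kwargs: shared StockTradingEnv settings (stock count, state/action sizes,
# starting cash, transaction costs, indicator list); copy them from the tutorial
env_train = StockTradingEnv(df=train_df, **env_kwargs)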
agent = DRLAgent(env=env_train)

model = agent.get_model("ppo")
trained = agent.train_model(model=model, tb_log_name="ppo", total_timesteps=50_000)

# backtest on the UNSEEN trade period:
env_trade = StockTradingEnv(df=trade_df, **env_kwargs)
account_value, actions = DRLAgent.DRL_prediction(model=trained, environment=env_trade)
print(account_value.tail())
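The pipeline assumes `train_df` and `trade_df` already exist. Here is a minimal pandas sketch of producing them, assuming a long-format frame with one row per date and ticker in columns named `date` and `tic` (the naming FinRL's tutorials use). FinRL's tutorials ship their own split helper, so treat this as a stand-in: the function name and the exclusive end dates are assumptions. It refuses overlapping periods and gives every row of a trading day the same index value, mirroring the tutorial data, because the stock environment looks rows up by day index.

```python
import pandas as pd


def split_by_date(df, train_start, train_end, trade_start, trade_end,
                  date_col="date", tic_col="tic"):
    """Split a long-format feature frame into non-overlapping train/trade periods.

    Hypothetical helper. End dates are exclusive.
    """
    if pd.Timestamp(train_end) > pd.Timestamp(trade_start):
        raise ValueError("train period overlaps trade period")
    dates = pd.to_datetime(df[date_col])

    def _slice(start, end):
        mask = (dates >= pd.Timestamp(start)) & (dates < pd.Timestamp(end))
        part = df[mask].sort_values([date_col, tic_col], ignore_index=True)
        # One index value per trading day, shared by all tickers on that day.
        part.index = part[date_col].factorize()[0]
        return part

    return _slice(train_start, train_end), _slice(trade_start, trade_end)


if __name__ == "__main__":
    df = pd.DataFrame({
        "date": ["2020-01-02", "2020-01-02", "2020-01-03", "2020-01-03", "2020-01-06", "2020-01-06"],
        "tic": ["AAA", "BBB"] * 3,
        "close": [10.0, 20.0, 10.5, 19.5, 11.0, 19.0],
    })
    train_df, trade_df = split_by_date(df, "2020-01-01", "2020-01-04", "2020-01-04", "2020-01-07")
    print(train_df)
    print(trade_df)
```

Splitting strictly by date, rather than randomly by row, is what keeps future prices out of the training period.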

3. Judge it honestly

The account_value series from the trade period is your backtest. Compute its Sharpe ratio and maximum drawdown, and compare both against a buy-and-hold baseline over the same dates. RL is especially prone to overfitting the training period, so the gap between train and trade performance is the number that matters most: a model that's brilliant in-sample but mediocre out-of-sample has learned noise.
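A minimal sketch of those measurements with pandas and NumPy, assuming you pass the account value as a Series (for example the account-value column of the frame `DRL_prediction` returns; check the column name in your version). The Sharpe ratio here is annualized from daily returns with a zero risk-free rate and 252 trading days a year; both are assumptions you may want to change. The function names are hypothetical.

```python
import numpy as np
import pandas as pd


def daily_returns(values: pd.Series) -> pd.Series:
    """Simple period-over-period returns of an account-value series."""
    return (values / values.shift(1) - 1.0).dropna()


def sharpe_ratio(values: pd.Series, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    r = daily_returns(values)
    if len(r) < 2 or r.std() == 0:
        return 0.0
    return float(np.sqrt(periods_per_year) * r.mean() / r.std())


def max_drawdown(values: pd.Series) -> float:
    """Worst peak-to-trough decline as a negative fraction (-0.25 means -25%)."""
    return float((values / values.cummax() - 1.0).min())


def buy_and_hold(prices: pd.Series, initial_amount: float) -> pd.Series:
    """Value of putting everything into one asset on day one and never trading."""
    return initial_amount * prices / prices.iloc[0]


def summarize(values: pd.Series) -> dict:
    """Total return, Sharpe ratio and max drawdown for one account-value series."""
    return {
        "total_return": float(values.iloc[-1] / values.iloc[0] - 1.0),
        "sharpe": sharpe_ratio(values),
        "max_drawdown": max_drawdown(values),
    }


if __name__ == "__main__":
    # Synthetic numbers, only to show the call pattern.
    agent = pd.Series([100.0, 104.0, 98.0, 107.0, 111.0])
    baseline = buy_and_hold(pd.Series([50.0, 51.0, 50.5, 52.0, 53.0]), initial_amount=100.0)
    print("agent   ", summarize(agent))
    print("baseline", summarize(baseline))
```

Run `summarize` on both the train-period and trade-period account values as well; a large drop from one to the other is the overfitting signal described above.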

Honest caveats

Steeper learning curve: you need basic RL literacy (environments, rewards, PPO/A2C) — more than the LLM repos demand.
API churn: FinRL's modules move between releases; treat code as a map and follow the version's own notebooks.
Overfitting is the default failure mode: always test on out-of-sample data, and consider walk-forward validation before believing any result.
Going live is a separate step (FinRL-Trading / FinRL-X add broker integration) — paper-trade first, always.
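Walk-forward validation repeats the train-then-trade split over successive windows, so a result has to hold up across several unseen periods rather than one. A minimal sketch of the window generation over a sorted sequence of unique trading dates; the function name, the fixed window sizes, and the default step (one test window per fold) are assumptions. You would retrain the agent on each train window and backtest it on the matching test window.

```python
from typing import Iterator, Optional, Sequence, Tuple


def walk_forward_windows(
    dates: Sequence,
    train_size: int,
    test_size: int,
    step: Optional[int] = None,
) -> Iterator[Tuple[Sequence, Sequence]]:
    """Yield (train_dates, test_dates) pairs that roll forward through time.

    Each test window starts right after its train window, so they never overlap.
    By default the windows advance by one test window per fold.
    """
    if train_size <= 0 or test_size <= 0:
        raise ValueError("window sizes must be positive")
    step = step or test_size
    start = 0
    while start + train_size + test_size <= len(dates):
        split = start + train_size
        yield dates[start:split], dates[split:split + test_size]
        start += step


if __name__ == "__main__":
    days = [f"day{i:02d}" for i in range(10)]
    for train_dates, test_dates in walk_forward_windows(days, train_size=4, test_size=2):
        print(train_dates[0], "...", train_dates[-1], "->", test_dates[0], "...", test_dates[-1])
```

With a FinRL-style dataframe, `df["date"].drop_duplicates().sort_values().tolist()` gives a suitable sequence of dates.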

Disclaimer: Educational only — not financial advice. Code is illustrative; verify against FinRL's current tutorials. Automated trading is risky; backtest out-of-sample and paper-trade before risking real capital.