Training Episodes

CIPHER-min-3 has completed over 90,000 training episodes across BTC, ETH, SOL, and XRP. Each episode is a random 500-candle window (~21 days of 1h candles) drawn from the training split. The agent switches to the next symbol every episode.
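The sampling scheme described above can be sketched roughly as follows. This is an illustration, not the project's actual code: the function name `sample_episode`, the `train_lengths` mapping, and the strict round-robin order are all assumptions.

```python
import random

SYMBOLS = ["BTC", "ETH", "SOL", "XRP"]
WINDOW = 500  # candles per episode (500 / 24 is about 20.8 days at 1h candles)


def sample_episode(episode_idx, train_lengths, rng=random):
    """Pick the symbol for this episode and a random 500-candle window.

    train_lengths maps symbol -> number of candles in its training split
    (a hypothetical structure, used here for illustration only).
    """
    # One symbol per episode, cycling through the list in order.
    symbol = SYMBOLS[episode_idx % len(SYMBOLS)]
    n = train_lengths[symbol]
    if n < WINDOW:
        raise ValueError(f"{symbol}: training split shorter than {WINDOW} candles")
    # randint is inclusive on both ends, so the window always fits.
    start = rng.randint(0, n - WINDOW)
    return symbol, start, start + WINDOW  # half-open slice [start, end)
```

Passing a seeded `random.Random` instance as `rng` makes the window selection reproducible, which may help when comparing runs.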

Total Episodes

90,000+

Symbols Trained

4

Val Win Rate

75-84%

Cumulative PnL

+$856k

Win Rate Over Training

Validation runs every 100 episodes across all 4 symbols using deterministic predictions. Training win rate is measured with stochastic exploration enabled, so it is expected to sit below the validation win rate; this gap is normal and not a sign of a problem.

Win rate % — Training (exploration) vs Validation (deterministic)

Training

Uses exploration (deterministic=False). A lower win rate is expected because the agent samples actions and tries new strategies.

Validation

Uses best actions only (deterministic=True). Evaluated on a held-out 15% of the data across all 4 symbols.
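The difference between the two modes can be illustrated with a small sketch. The `deterministic` flag matches the keyword used by the `predict` method in libraries such as Stable-Baselines3, but the source does not say which library is used, so the code below is plain Python. The helper names and the definition of a winning trade (positive PnL) are assumptions.

```python
import random


def select_action(action_probs, deterministic, rng=random):
    """Choose an action index from a list of policy probabilities."""
    if deterministic:
        # Validation/test mode: always take the most probable action.
        return max(range(len(action_probs)), key=lambda i: action_probs[i])
    # Training mode: sample, so lower-probability actions are still tried.
    return rng.choices(range(len(action_probs)), weights=action_probs, k=1)[0]


def win_rate(trade_pnls):
    """Percentage of closed trades with positive PnL (0.0 if no trades)."""
    if not trade_pnls:
        return 0.0
    return 100.0 * sum(p > 0 for p in trade_pnls) / len(trade_pnls)
```

Because sampling occasionally picks actions the policy itself rates as worse, the training win rate would tend to lag the deterministic one even when both use the same policy weights.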

Cumulative P&L Trajectory

Cumulative P&L fell to about -$400k during early exploration (roughly the first 30k episodes), then recovered as the policy improved. The inflection point around episode 50k marks where the agent began finding profitable patterns consistently.

Cumulative P&L ($k) — $10,000 starting balance per episode
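One plausible way to build such a curve, assuming the balance is reset to $10,000 at the start of every episode and the chart plots the running sum of per-episode profit, is sketched below. The function name and the input format (a list of final balances) are hypothetical.

```python
from itertools import accumulate

START_BALANCE = 10_000.0  # per-episode starting balance, from the chart caption


def cumulative_pnl(final_balances):
    """Running sum of per-episode profit, each measured from a fresh $10,000."""
    per_episode = (b - START_BALANCE for b in final_balances)
    return list(accumulate(per_episode))


curve = cumulative_pnl([9500.0, 9800.0, 10900.0])
trough = min(curve)  # deepest drawdown of the cumulative curve
```

Under this reading, the reported +$856k over 90,000+ episodes works out to at most about $9.50 of profit per episode, or roughly 0.1% of the starting balance.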

Per-Symbol Test Results

Evaluated on the held-out 15% test split — data the model never saw during training or validation. All trades were executed with deterministic predictions.

Return % and Win Rate % per symbol — Test split evaluation

Symbol	Return	Win Rate	Trades	Sharpe
BTC/USDT	+1.1%	75%	8	2.54
ETH/USDT	+3.8%	62.5%	8	4.18
SOL/USDT	+4.9%	75%	8	5.25
XRP/USDT	+4.6%	75%	8	4.88
TOTAL	+3.6%	71.9%	32	4.26
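The TOTAL row can be partly reconstructed from the per-symbol rows. The sketch below assumes the total return is the simple mean of the four returns and the total win rate is pooled over all trades; both assumptions reproduce the published figures. The Sharpe total is not recomputed because its aggregation method is not stated.

```python
# (return %, win rate %, trades) per symbol, copied from the table above
results = {
    "BTC/USDT": (1.1, 75.0, 8),
    "ETH/USDT": (3.8, 62.5, 8),
    "SOL/USDT": (4.9, 75.0, 8),
    "XRP/USDT": (4.6, 75.0, 8),
}

total_trades = sum(t for _, _, t in results.values())
# Recover winning-trade counts from the percentages (6, 5, 6, 6).
wins = sum(round(wr / 100 * t) for _, wr, t in results.values())
mean_return = sum(r for r, _, _ in results.values()) / len(results)
pooled_win_rate = 100 * wins / total_trades

print(total_trades, wins, round(mean_return, 1), round(pooled_win_rate, 1))
# prints: 32 23 3.6 71.9
```

With only 8 trades per symbol, a single trade moves a symbol's win rate by 12.5 percentage points, so the per-symbol figures should be read with that sample size in mind.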

Training Configuration

Algorithm: PPO (Proximal Policy Optimization)

Episode Length: 500 candles (~21 days at 1h)

Window Selection: Random start within training split

Symbol Rotation: One symbol per episode, cycling BTC/ETH/SOL/XRP

Gamma: 0.95 (short-term focus for crypto)

Entropy Coefficient: 0.02 (stable exploration)

Learning Rate: 1e-4

Validation Interval: Every 100 episodes, all 4 symbols, deterministic

Auto-Save: Model version snapshot every 500 episodes
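
The settings above could be collected into a single configuration object along these lines. This is a sketch only: the class name, field names, and helper methods are illustrative and not taken from the project. The `effective_horizon` helper applies the common rule of thumb 1 / (1 - gamma) to show what a discount factor of 0.95 implies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    episode_length: int = 500     # candles per episode
    gamma: float = 0.95           # discount factor
    ent_coef: float = 0.02        # entropy coefficient
    learning_rate: float = 1e-4
    validate_every: int = 100     # episodes between validation runs
    save_every: int = 500         # episodes between model snapshots

    def effective_horizon(self) -> float:
        """Rule-of-thumb reward horizon 1 / (1 - gamma), in candles."""
        return 1.0 / (1.0 - self.gamma)

    def should_validate(self, episode: int) -> bool:
        """True on every validate_every-th episode (never at episode 0)."""
        return episode > 0 and episode % self.validate_every == 0

    def should_save(self, episode: int) -> bool:
        """True on every save_every-th episode (never at episode 0)."""
        return episode > 0 and episode % self.save_every == 0


cfg = TrainingConfig()
```

With gamma at 0.95 the horizon comes out to about 20 candles, or roughly 20 hours at 1h candles, which is consistent with the short-term focus noted in the configuration.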