Training Episodes
CIPHER-min-3 has completed over 90,000 training episodes across BTC, ETH, SOL, and XRP. Each episode is a random 500-candle window (~21 days) from the training split. The agent rotates between symbols every episode.
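The episode sampling described above could be sketched as follows. This is a minimal illustration, not the project's actual code: the strict round-robin order, the `train_lengths` values, and all function names are assumptions. A 500-candle window spanning about 21 days is consistent with hourly candles.

```python
import random
from itertools import cycle

WINDOW = 500  # candles per episode, as stated above
SYMBOLS = ["BTC", "ETH", "SOL", "XRP"]


def sample_window(n_train_candles, rng, window=WINDOW):
    """Return (start, end) indices of a random window inside the training split."""
    if n_train_candles < window:
        raise ValueError("training split is shorter than one episode window")
    start = rng.randrange(n_train_candles - window + 1)
    return start, start + window


def episode_schedule(train_lengths, n_episodes, seed=0):
    """Yield (symbol, start, end), moving to the next symbol every episode."""
    rng = random.Random(seed)
    symbols = cycle(SYMBOLS)  # assumed rotation: strict round-robin
    for _ in range(n_episodes):
        symbol = next(symbols)
        start, end = sample_window(train_lengths[symbol], rng)
        yield symbol, start, end


# Hypothetical training-split lengths (in candles), for illustration only.
train_lengths = {"BTC": 30_000, "ETH": 30_000, "SOL": 20_000, "XRP": 25_000}
schedule = list(episode_schedule(train_lengths, n_episodes=8))
```

Each entry in `schedule` names the symbol and the half-open index range of the window the agent would trade in that episode.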
| Total Episodes | Symbols Trained | Val Win Rate | Cumulative P&L |
|---|---|---|---|
| 90,000+ | 4 | 75-84% | +$856k |
Win Rate Over Training
Validation runs every 100 episodes across all 4 symbols using deterministic predictions. The training win rate is measured with exploration enabled (stochastic actions), so it sits below the validation win rate. This gap is expected and healthy.
Win rate %: Training (exploration) vs Validation (deterministic)
Training: uses exploration (deterministic=False). A lower win rate is expected because the agent tries new strategies.
Validation: uses best actions only (deterministic=True). Tested on the held-out 15% validation data across all 4 symbols.
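The difference between the two modes can be illustrated with a small sketch. It assumes a policy that outputs one probability per discrete action; the action set, the probabilities, and the `select_action` helper are hypothetical, and only the `deterministic` flag name comes from the text above.

```python
import random

ACTIONS = ["hold", "buy", "sell"]  # hypothetical action set


def select_action(probs, deterministic, rng=random):
    """Pick an action index from a list of policy probabilities."""
    if deterministic:
        # Validation/test: always take the most probable action.
        return max(range(len(probs)), key=probs.__getitem__)
    # Training: sample in proportion to probability, so that
    # less likely actions still get tried occasionally.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


probs = [0.2, 0.7, 0.1]  # hypothetical policy output for one observation
rng = random.Random(0)

greedy = select_action(probs, deterministic=True)
sampled = [select_action(probs, deterministic=False, rng=rng) for _ in range(1000)]

print(ACTIONS[greedy])
print({ACTIONS[i]: sampled.count(i) for i in range(len(ACTIONS))})
```

With `deterministic=True` the same observation always yields the same action, which is why validation numbers are repeatable. With `deterministic=False` a share of decisions go to lower-probability actions, which tends to pull the training win rate down.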
Cumulative P&L Trajectory
Cumulative P&L dropped to -$400k during early exploration (the first 30k episodes), then recovered as the policy improved. The inflection point around episode 50k marks where the agent began finding profitable patterns consistently.
Cumulative P&L ($k) — $10,000 starting balance per episode
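One way to read this curve, assuming the balance resets to $10,000 at the start of every episode and the chart plots the running sum of per-episode profit. The reset-and-sum interpretation, the function name, and the sample balances are assumptions for illustration.

```python
from itertools import accumulate

STARTING_BALANCE = 10_000.0  # per-episode starting balance, from the chart caption


def cumulative_pnl(final_balances, start=STARTING_BALANCE):
    """Running sum of per-episode profit, where profit = final balance - start."""
    return list(accumulate(b - start for b in final_balances))


# Hypothetical final balances for five episodes, for illustration only.
finals = [9_500.0, 9_800.0, 10_400.0, 10_900.0, 10_100.0]
curve = cumulative_pnl(finals)
print(curve)  # [-500.0, -700.0, -300.0, 600.0, 700.0]
```

In this toy series the curve bottoms out after two losing episodes and then climbs once profitable episodes outweigh the early losses, the same shape as the trajectory described above.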
Per-Symbol Test Results
Evaluated on the held-out 15% test split, which the model never saw during training or validation. All trades were executed with deterministic predictions.
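A split of this kind could look like the sketch below. The 15% validation and 15% test fractions come from the text. The 70% training share, the chronological ordering (oldest data for training, newest for testing), and the function name are assumptions.

```python
def chronological_split(n_candles, val_pct=15, test_pct=15):
    """Return index ranges (train, val, test), with the test split last in time."""
    n_val = n_candles * val_pct // 100
    n_test = n_candles * test_pct // 100
    n_train = n_candles - n_val - n_test  # assumed: everything else is training data
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n_candles)
    return train, val, test


# Hypothetical series length, for illustration only.
train, val, test = chronological_split(20_000)
print(len(train), len(val), len(test))  # 14000 3000 3000
```

Keeping the three ranges disjoint and ordered in time means no test candle can appear in a training or validation window.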
Return % and Win Rate % per symbol — Test split evaluation
| Symbol | Return | Win Rate | Trades | Sharpe |
|---|---|---|---|---|
| BTC/USDT | +1.1% | 75% | 8 | 2.54 |
| ETH/USDT | +3.8% | 62.5% | 8 | 4.18 |
| SOL/USDT | +4.9% | 75% | 8 | 5.25 |
| XRP/USDT | +4.6% | 75% | 8 | 4.88 |
| TOTAL | +3.6% | 71.9% | 32 | 4.26 |
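The TOTAL row's win rate and return can be reproduced from the per-symbol rows. The sketch below recovers the number of winning trades from each win rate and trade count, then pools them. The Sharpe figure is not reproduced, since it depends on per-trade returns that the table does not list.

```python
results = {
    # symbol: (return %, win rate %, trades), copied from the table above
    "BTC/USDT": (1.1, 75.0, 8),
    "ETH/USDT": (3.8, 62.5, 8),
    "SOL/USDT": (4.9, 75.0, 8),
    "XRP/USDT": (4.6, 75.0, 8),
}

# Winning trades per symbol, e.g. 75% of 8 trades = 6 wins.
wins = {s: round(wr / 100 * n) for s, (_, wr, n) in results.items()}

total_trades = sum(n for _, _, n in results.values())
total_wins = sum(wins.values())
pooled_win_rate = 100 * total_wins / total_trades
mean_return = sum(r for r, _, _ in results.values()) / len(results)

print(wins)
print(f"{total_wins}/{total_trades} = {pooled_win_rate:.1f}%")  # 23/32 = 71.9%
print(f"mean return = {mean_return:.1f}%")  # mean return = 3.6%
```

With only 8 trades per symbol, a single trade moves a symbol's win rate by 12.5 percentage points, so the per-symbol figures are coarse.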