CIPHER is currently in beta · Live testing in progress · Expect changes

Training Episodes

CIPHER-min-3 has completed over 90,000 training episodes across BTC, ETH, SOL, and XRP. Each episode is a random 500-candle window (~21 days) from the training split. The agent rotates between symbols every episode.
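The sampling scheme described above can be sketched as follows. This is a minimal illustration, not CIPHER's actual code: the function names, the seed, and the training-split sizes are hypothetical, and only the 500-candle episode length and the four-symbol rotation come from the description.

```python
import random
from itertools import cycle

EPISODE_LENGTH = 500  # candles per episode, as stated above
SYMBOLS = ["BTC", "ETH", "SOL", "XRP"]


def sample_window(n_train_candles: int, rng: random.Random) -> tuple[int, int]:
    """Return (start, end) indices of a random 500-candle slice of the training split."""
    if n_train_candles < EPISODE_LENGTH:
        raise ValueError("training split is shorter than one episode")
    # randint is inclusive on both ends, so the window always fits in the split.
    start = rng.randint(0, n_train_candles - EPISODE_LENGTH)
    return start, start + EPISODE_LENGTH


def episode_schedule(train_sizes: dict[str, int], n_episodes: int, seed: int = 0):
    """Yield (symbol, start, end) per episode, cycling through the symbols in order."""
    rng = random.Random(seed)
    symbols = cycle(SYMBOLS)
    for _ in range(n_episodes):
        symbol = next(symbols)
        start, end = sample_window(train_sizes[symbol], rng)
        yield symbol, start, end


# Hypothetical training-split sizes in 1h candles; not taken from this page.
sizes = {"BTC": 20_000, "ETH": 20_000, "SOL": 15_000, "XRP": 18_000}
schedule = list(episode_schedule(sizes, n_episodes=8))
```

Each entry in `schedule` names the symbol for that episode and the half-open index range of its window.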

Total Episodes: 90,000+

Symbols Trained: 4

Val Win Rate: 75-84%

Cumulative P&L: +$856k

Win Rate Over Training

Validation runs every 100 episodes across all 4 symbols using deterministic predictions. Training win rate is measured with stochastic exploration enabled, so it sits below the validation figure. This gap between training and validation is expected and healthy.

Win rate % — Training (exploration) vs Validation (deterministic)

Training

Uses exploration (deterministic=False). A lower win rate is expected because the agent samples actions to try new strategies.

Validation

Uses best actions only (deterministic=True). Evaluated on a held-out 15% of the data across all 4 symbols.
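The difference between the two modes can be illustrated with a toy discrete policy. The `deterministic` flag resembles the `predict` parameter of the same name in libraries such as Stable-Baselines3, though this page does not name the library. The sketch below assumes a hypothetical three-action set and a fixed probability vector: deterministic mode takes the most probable action, while exploration mode samples from the distribution.

```python
import random

ACTIONS = ["hold", "buy", "sell"]  # hypothetical action set, for illustration only


def select_action(probs, deterministic, rng):
    """Pick an action index from a vector of action probabilities."""
    if deterministic:
        # Validation mode: always take the most probable action.
        return max(range(len(probs)), key=probs.__getitem__)
    # Training mode: sample, so less likely actions still get tried.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


rng = random.Random(42)
probs = [0.2, 0.7, 0.1]  # made-up policy output for a single observation

validation_actions = [select_action(probs, True, rng) for _ in range(1000)]
training_actions = [select_action(probs, False, rng) for _ in range(1000)]
```

In this toy setup `validation_actions` contains only the index of "buy", while `training_actions` mixes all three actions. That mixing is the mechanism behind the lower training win rate.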

Cumulative P&L Trajectory

Cumulative P&L fell to -$400k during early exploration (the first 30k episodes), then recovered as the policy improved. The inflection point around episode 50k marks where the agent began finding profitable patterns consistently.

Cumulative P&L ($k) — $10,000 starting balance per episode
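One plausible reading of this chart is a running sum of per-episode profit, where each episode's profit is its final equity minus the $10,000 starting balance. The sketch below assumes that reading. The final-equity values are invented for illustration and are not CIPHER results.

```python
from itertools import accumulate

STARTING_BALANCE = 10_000.0  # per-episode starting balance, from the chart caption


def cumulative_pnl(final_equities):
    """Running sum of per-episode P&L, each measured against the starting balance."""
    episode_pnl = [equity - STARTING_BALANCE for equity in final_equities]
    return list(accumulate(episode_pnl))


# Made-up final equities for five episodes: two losing, then three winning.
final_equities = [9_400.0, 9_100.0, 10_050.0, 10_800.0, 11_200.0]
curve = cumulative_pnl(final_equities)  # [-600.0, -1500.0, -1450.0, -650.0, 550.0]
trough = min(curve)                     # -1500.0, the deepest point of the curve
```

Because the balance resets each episode, early losses do not shrink later position sizes. They only lower the running total, so the curve can dip and then recover.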

Per-Symbol Test Results

Evaluated on the held-out 15% test split — data the model never saw during training or validation. All trades were executed with deterministic predictions.
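A split of this kind might look like the sketch below. It assumes the splits are chronological and that the remaining 70% is the training split. Neither is stated on this page, and the function name and candle count are hypothetical.

```python
def split_indices(n_candles: int, val_pct: int = 15, test_pct: int = 15):
    """Return (train, val, test) index ranges, ordered oldest to newest."""
    n_val = n_candles * val_pct // 100
    n_test = n_candles * test_pct // 100
    n_train = n_candles - n_val - n_test  # assumed: whatever is left is training data
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n_candles)
    return train, val, test


# Hypothetical history of 20,000 hourly candles for one symbol.
train, val, test = split_indices(20_000)
```

Keeping the test range strictly after the other two means no candle from the test period can leak into training or validation.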

Return % and Win Rate % per symbol — Test split evaluation

Symbol      Return   Win Rate   Trades   Sharpe
BTC/USDT    +1.1%    75%        8        2.54
ETH/USDT    +3.8%    62.5%      8        4.18
SOL/USDT    +4.9%    75%        8        5.25
XRP/USDT    +4.6%    75%        8        4.88
TOTAL       +3.6%    71.9%      32       4.26
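The TOTAL row can be checked against the per-symbol rows with a little arithmetic. The sketch below reads each row as 8 trades per symbol. It then recovers the win counts from the win rates, pools them for the overall win rate, and takes the unweighted mean of the four returns. The variable names are illustrative.

```python
# Per-symbol rows as read from the test-split table (8 trades per symbol).
results = {
    "BTC/USDT": {"return_pct": 1.1, "win_rate_pct": 75.0, "trades": 8},
    "ETH/USDT": {"return_pct": 3.8, "win_rate_pct": 62.5, "trades": 8},
    "SOL/USDT": {"return_pct": 4.9, "win_rate_pct": 75.0, "trades": 8},
    "XRP/USDT": {"return_pct": 4.6, "win_rate_pct": 75.0, "trades": 8},
}

# Winning trades per symbol: 75% of 8 is 6, and 62.5% of 8 is 5.
wins = {sym: round(r["win_rate_pct"] / 100 * r["trades"]) for sym, r in results.items()}

total_trades = sum(r["trades"] for r in results.values())  # 32
total_wins = sum(wins.values())                            # 23
pooled_win_rate = 100 * total_wins / total_trades          # 71.875, shown as 71.9%

# Unweighted mean of the four per-symbol returns, shown as +3.6%.
mean_return = sum(r["return_pct"] for r in results.values()) / len(results)
```

Both figures agree with the TOTAL row. With only 32 trades in total, a single trade moves the pooled win rate by about 3 percentage points.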

Training Configuration

Algorithm: PPO (Proximal Policy Optimization)
Episode Length: 500 candles (~21 days at 1h)
Window Selection: Random start within training split
Symbol Rotation: One symbol per episode, cycling BTC/ETH/SOL/XRP
Gamma: 0.95 (short-term focus for crypto)
Entropy Coefficient: 0.02 (stable exploration)
Learning Rate: 1e-4
Validation Interval: Every 100 episodes, all 4 symbols, deterministic
Auto-Save: Model version snapshot every 500 episodes
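A few quantities follow directly from these settings, and the sketch below works them out. The dictionary keys are illustrative names rather than CIPHER's actual configuration schema, and 90,000 episodes is used as a lower bound on the training length. The 1 / (1 - gamma) "effective horizon" is a common rule of thumb for how far ahead a discounted agent looks, not an exact cutoff.

```python
# Values from the configuration table; key names are hypothetical.
config = {
    "algorithm": "PPO",
    "episode_length": 500,       # candles
    "gamma": 0.95,
    "ent_coef": 0.02,
    "learning_rate": 1e-4,
    "validation_interval": 100,  # episodes
    "autosave_interval": 500,    # episodes
}

CANDLE_HOURS = 1         # 1h candles
TOTAL_EPISODES = 90_000  # lower bound ("over 90,000")

# 500 hourly candles is about 20.8 days, which the page rounds to ~21.
episode_days = config["episode_length"] * CANDLE_HOURS / 24

# Rule-of-thumb horizon: about 20 steps, i.e. roughly 20 hours of candles.
effective_horizon = 1 / (1 - config["gamma"])

# Weight applied to a reward 24 steps (one day) ahead: about 0.29.
weight_after_one_day = config["gamma"] ** 24

# At least 900 validation runs and 180 saved snapshots over the training run.
validation_runs = TOTAL_EPISODES // config["validation_interval"]
snapshots = TOTAL_EPISODES // config["autosave_interval"]
```

With gamma at 0.95, a reward one day ahead counts for less than a third of an immediate one, which is consistent with the "short-term focus" note in the table.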