Decay or Drawdown? A Statistical Test for When to Kill Your EA

Every live EA operator eventually stares at the same red equity curve and asks the same unanswerable-feeling question: is this thing broken, or is it just having a bad month? The instinct is to treat it as a judgment call, a gut read on whether to flip the kill switch or hold the line. It isn't. Your backtest already contains the answer, and it isn't the reported max drawdown. The number that matters is the 95th-percentile Monte Carlo drawdown: the depth that a strategy with its edge intact should exceed in only about 5% of trade orderings. Cross it, and the data, not your nerves, is telling you something may have changed.

Your Backtest Already Drew the Alarm Line

The mistake almost every retail algo trader makes is treating the Strategy Tester's reported max drawdown as the worst case. It is not the worst case — it is one case, the single drawdown that happened to occur in the exact order your historical trades arrived. Reorder those same trades and you get a different, often far deeper, low-water mark.

That reordering is precisely what Monte Carlo simulation does. As ErgodicLabs frames it:

Monte Carlo simulation reshuffles the sequence of your backtest trades thousands of times to determine whether the strategy's performance depends on the specific order trades occurred.

Run enough permutations and you no longer have a single drawdown figure — you have a distribution, and from that distribution you can read confidence bands: the 50th, 90th, 95th, and 99th percentile drawdowns. ErgodicLabs computes these from up to 50,000 trade-sequence permutations, and the framing that makes them actionable comes from StrategyQuant:

At a 95% confidence level, there is only 5% probability that drawdown will be worse than the calculated percentile value.

Read that backwards and it becomes a live diagnostic. If your account blows through the 95th-percentile band, you are in the 5% tail — possible if the edge holds, but improbable enough to demand attention.
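The reshuffling described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by ErgodicLabs or StrategyQuant: the function names, the nearest-rank percentile rule, and the trade list are all assumptions made for the example.

```python
import math
import random

def max_drawdown(pnl):
    """Deepest peak-to-trough drop of the cumulative P&L curve."""
    equity = peak = worst = 0.0
    for p in pnl:
        equity += p
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst

def drawdown_percentiles(pnl, runs=5000, levels=(50, 90, 95, 99), seed=0):
    """Shuffle the trade order `runs` times and read percentile drawdowns."""
    rng = random.Random(seed)
    trades = list(pnl)
    results = []
    for _ in range(runs):
        rng.shuffle(trades)          # same trades, different order
        results.append(max_drawdown(trades))
    results.sort()
    # Nearest-rank percentile (an assumed convention; tools may interpolate).
    return {q: results[max(0, math.ceil(q * runs / 100) - 1)] for q in levels}

# Hypothetical closed-trade P&L, 200 trades, invented for illustration only.
pnl = [120, -80, 60, -150, 200, -40, 90, -110, 75, -60] * 20
reported = max_drawdown(pnl)         # the single historical path
bands = drawdown_percentiles(pnl)    # the distribution across orderings
print(reported, bands)
```

The `reported` value is what a Strategy Tester headline would show for this ordering; `bands[95]` is the figure the rest of this article treats as the alarm line.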

The Two Numbers That Are Never the Same

The gap between the reported drawdown and the percentile drawdown is not a rounding error — it is routinely a multiple. PickMyTrade's 2026 guide reports that Monte Carlo simulations regularly reveal max drawdowns 3.1x larger than the backtest's headline figure. Two documented cases make the scale concrete:

A backtest showing an 18% max drawdown carried a 95th-percentile Monte Carlo drawdown of 52% — almost three times deeper (PickMyTrade, 2026).
A backtest drawdown of $1,663.90 expanded to $5,195.17 at the 95% confidence level — over 3x — a signature of luck dependency rather than durable edge (StrategyQuant / ErgodicLabs analysis).
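The multiples in those two cases are simple ratios, and checking them is a useful habit before trusting any headline figure. A short sketch (the helper name is hypothetical):

```python
def drawdown_multiple(reported, percentile_95):
    """How many times deeper the 95th-percentile drawdown is than the reported one."""
    return percentile_95 / reported

print(round(drawdown_multiple(18.0, 52.0), 2))        # 2.89
print(round(drawdown_multiple(1663.90, 5195.17), 2))  # 3.12
```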

The practical consequence is blunt: a trader who set mental and capital limits around the 18% figure would panic-close at the first 25% drawdown, well inside the strategy's statistically normal range. A trader anchored to the 52% band knows that drawdown, however ugly, is still consistent with the system working. The percentile band converts an emotional threshold into a quantified one — and that is the entire game.

Key Risk for EA Developers: If your live limit is set to the backtest's reported max drawdown, you have built a stop that fires inside the strategy's own confidence interval. You will kill working systems during normal variance and keep broken ones that haven't yet reached an arbitrarily shallow line. Set thresholds to the percentile band, not the single historical path.

Test One: The Confidence-Band Check

The simplest of the three diagnostics needs no live computation — just a threshold set in advance. The heuristic, endorsed across PickMyTrade and BuildAlpha, is the 5th-percentile trigger: if your live equity curve falls below the 5th percentile of the Monte Carlo-simulated curves — that is, worse than 95% of randomized trade orderings of your own validated system — treat it as the practical early-warning signal that performance is no longer consistent with the historical edge.

Operationally:

Export your validated backtest's closed trades and run a permutation simulation (ErgodicLabs, StrategyQuant, or a native MT5 routine — covered below).
Record the 5th-percentile equity path and the 95th-percentile drawdown depth.
Overlay your live equity curve against that band as trades close.

While the curve stays inside the band, the rational default is to hold — you are watching expected variance, not failure. A breach is not proof of death, but it is the point at which continuing to run capital becomes a deliberate bet against your own statistics. The strength of this test is that it requires zero ongoing math; the weakness is that it is a single-point alarm and says nothing about how performance is degrading.
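As a rough sketch of the overlay step, the 5th-percentile equity path can be built trade by trade from the same permutations and compared against live results. Everything named here (`percentile_equity_path`, `first_breach`, the sample P&L) is an assumption for illustration; a production version would likely work in account currency or percent of balance.

```python
import math
import random

def percentile_equity_path(pnl, horizon, q=5, runs=5000, seed=0):
    """q-th percentile of simulated cumulative P&L after each of `horizon` trades."""
    if horizon > len(pnl):
        raise ValueError("horizon cannot exceed the number of backtest trades")
    rng = random.Random(seed)
    trades = list(pnl)
    columns = [[] for _ in range(horizon)]
    for _ in range(runs):
        rng.shuffle(trades)
        equity = 0.0
        for i in range(horizon):
            equity += trades[i]
            columns[i].append(equity)
    rank = max(0, math.ceil(q * runs / 100) - 1)   # nearest-rank convention
    return [sorted(col)[rank] for col in columns]

def first_breach(live_equity, floor_path):
    """Index of the first live trade whose cumulative P&L sits below the floor, else None."""
    for i, (live, floor) in enumerate(zip(live_equity, floor_path)):
        if live < floor:
            return i
    return None

# Hypothetical backtest trades and live cumulative P&L, invented for illustration.
pnl = [120, -80, 60, -150, 200, -40, 90, -110, 75, -60] * 20
floor = percentile_equity_path(pnl, horizon=50, runs=2000)
live_equity = [-80, -230, -340, -400, -550]
print(first_breach(live_equity, floor))
```

A return value of `None` means the live curve is still inside the band; an index marks the trade at which escalation would begin.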

Test Two: The Z-Score of Rolling Sharpe

The confidence band watches equity; the second test watches the edge itself. The academic anchor here is the 2017 Capital Fund Management paper by Rej, Seager, and Bouchaud — "You Are in a Drawdown. When Should You Start Worrying?" — which derives the expected length and depth of drawdowns for an upward-drifting strategy conditioned on its Sharpe ratio. Their central, uncomfortable finding:

Both managers and investors tend to underestimate the length and depth of drawdowns consistent with the Sharpe ratio of the underlying strategy.

In other words, the drawdown that feels like decay is, for most strategies, exactly what their Sharpe ratio predicts. The diagnostic is to compute a rolling Sharpe over a recent window and express it as a z-score against the backtest's Sharpe distribution. A rolling figure one standard deviation soft is noise; a sustained reading several deviations below the historical mean is the statistical fingerprint of a degrading edge rather than an unlucky sequence.
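One way to sketch that z-score, under assumptions worth stating: the Sharpe here is a plain per-trade ratio (mean over sample standard deviation, not annualised), and the "backtest Sharpe distribution" is approximated by overlapping rolling windows of the backtest itself. Neither choice comes from the CFM paper; both are simplifications for illustration.

```python
import random
from statistics import mean, stdev

def sharpe(returns):
    """Per-trade Sharpe: mean over sample standard deviation (not annualised)."""
    return mean(returns) / stdev(returns)

def rolling_sharpes(returns, window):
    """Sharpe of every consecutive `window`-trade slice."""
    return [sharpe(returns[i:i + window]) for i in range(len(returns) - window + 1)]

def live_sharpe_z(backtest_returns, live_returns, window):
    """Z-score of the latest live window against the backtest's rolling Sharpes."""
    reference = rolling_sharpes(backtest_returns, window)
    live = sharpe(live_returns[-window:])
    return (live - mean(reference)) / stdev(reference)

# Hypothetical per-trade returns: a healthy backtest and a weaker live sample.
rng = random.Random(1)
backtest = [rng.gauss(0.2, 1.0) for _ in range(500)]
live = [rng.gauss(-0.3, 1.0) for _ in range(50)]
print(live_sharpe_z(backtest, live, window=50))
```

Because overlapping windows are correlated, the reference spread is only approximate; a single soft reading matters less than a z-score that stays deeply negative as new trades arrive.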

This is also where an external cross-check earns its place. Genuine decay often coincides with a structural regime shift in the instruments the EA trades (a volatility regime change, a liquidity shift, a central-bank pivot) rather than with random variance. Traders can monitor these conditions in real time using TradingView, which provides the multi-timeframe and volatility context needed to judge whether a drawdown coincides with a regime break, a separate but corroborating signal of structural change. A soft rolling Sharpe together with an obvious regime shift makes a far stronger case for intervention than either alone.


Test Three: The SPRT Sequential Monitor

The third test is the most powerful and the least known among retail algo traders: Wald's Sequential Probability Ratio Test (SPRT, 1945). Where a fixed-sample test waits for a predetermined number of trades before rendering a verdict, SPRT evaluates the evidence after every trade and reaches a decision as soon as the accumulated likelihood crosses a boundary — typically with far fewer observations than a fixed test of equivalent power.

Framed for strategy monitoring, you pit two hypotheses against each other as trades arrive:

// SPRT decay monitor (pseudo-logic)
// H0: live trades drawn from the ORIGINAL Sharpe distribution (edge intact)
// H1: live trades drawn from a DEGRADED Sharpe distribution (edge gone)
//
// After each closed trade, update the log-likelihood ratio:
//   LLR += log( P(trade | H1) / P(trade | H0) )
//
// Decision boundaries from chosen error rates (alpha, beta):
//   if LLR >= log((1-beta)/alpha):  accept H1  -> flag decay
//   if LLR <= log(beta/(1-alpha)):  accept H0  -> keep running
//   else:                            keep collecting trades

The advantage for low-frequency EAs is decisive. A trader running 6–10 trades a week cannot wait 200 trades for a fixed-sample verdict — by then the account damage is done. SPRT reaches a statistically defensible decision with the smallest sample the evidence allows, which is exactly what a live kill-switch decision demands. The trade-off is complexity: you must specify the degraded-Sharpe alternative (H1) and your tolerance for false alarms (alpha) and missed decay (beta) in advance, and a poorly chosen H1 makes the monitor either trigger-happy or asleep.
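The pseudo-logic above can be made concrete with one simplifying assumption: per-trade returns are treated as normally distributed with a known standard deviation `sigma`, a mean of `mu0` under H0, and a lower mean of `mu1` under H1. Real trade returns are rarely Gaussian, so treat this as a sketch of the mechanics rather than a ready-made monitor; the function name and default error rates are hypothetical.

```python
import math

def sprt_monitor(trades, mu0, mu1, sigma, alpha=0.05, beta=0.20):
    """Wald SPRT on per-trade returns under an assumed Gaussian model.

    H0: mean return mu0 (edge intact). H1: mean return mu1 (edge degraded).
    Returns (verdict, number_of_trades_used).
    """
    upper = math.log((1 - beta) / alpha)   # at or above: accept H1
    lower = math.log(beta / (1 - alpha))   # at or below: accept H0
    llr = 0.0
    for n, x in enumerate(trades, start=1):
        # log N(x; mu1, sigma) - log N(x; mu0, sigma), simplified algebraically
        llr += (mu1 - mu0) * (2 * x - mu0 - mu1) / (2 * sigma ** 2)
        if llr >= upper:
            return "decay", n
        if llr <= lower:
            return "intact", n
    return "undecided", len(trades)

# Hypothetical live returns in units of sigma, invented for illustration.
live = [0.4, -1.1, -0.6, 0.2, -0.9, -1.3, 0.1, -0.7]
print(sprt_monitor(live, mu0=0.2, mu1=0.0, sigma=1.0))
```

The closer `mu1` sits to `mu0`, the smaller each trade's contribution to the ratio and the longer the monitor takes to decide, which is the practical meaning of a poorly chosen H1.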

The Sample-Size Gate: None of This Runs on 40 Trades

Every test above shares one precondition: enough trades to make the statistics mean anything. This is where most retail Monte Carlo work quietly fails. Two thresholds matter.

The simulation itself needs scale. ErgodicLabs documents that key statistics — median return, 95th-percentile drawdown, probability of ruin — stabilize at roughly 5,000 permutations; below 1,000 the percentile bands are unreliable and will mislead you. Treat 5,000 runs as the floor, not the target.

The trade sample feeding the simulation needs its own minimum. Drawing on Cochran's formula, the rough gates are:

109 trades for 70% confidence at a 5% margin of error — the bare entry point;
200+ trades for meaningful percentile-band stability;
1,000+ trades for strong statistical significance (StrategyQuant, QuantifiedStrategies).
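For readers who want to see where a gate like 109 comes from, Cochran's formula for estimating a proportion (a win rate, say) is n = z² · p(1 − p) / e². The sketch below assumes the most conservative p = 0.5. Note that an exact z for 70% confidence gives 108; the 109 figure is reproduced when z is rounded to 1.04, which is my inference about the rounding and not something the cited sources state.

```python
import math
from statistics import NormalDist

def cochran_n(confidence, margin, p=0.5, z=None):
    """Minimum sample size from Cochran's formula, rounded up.

    `p=0.5` is the conservative worst case; pass `z` to override the
    exact two-sided normal quantile with a rounded table value.
    """
    if z is None:
        z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z * z * p * (1 - p) / margin ** 2)

print(cochran_n(0.70, 0.05))           # exact z of about 1.036
print(cochran_n(0.70, 0.05, z=1.04))   # rounded z, matches the 109 gate
print(cochran_n(0.95, 0.05))           # the familiar 95% / 5% case
```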

A strategy with 40 live trades cannot be diagnosed by any of these methods — the confidence intervals are too wide to distinguish decay from noise, and any "signal" is an artifact of small samples. For low-frequency EAs this is a genuine constraint: you may need months of live data before the decay tests have any authority. Until then, the honest answer to "is it broken?" is "there is not yet enough evidence to say," and acting on a 30-trade hunch is its own form of overfitting.

Building the Protocol Into Your EA

The payoff of this framework is that it replaces an emotional decision with a written protocol — one you commit to before the drawdown arrives, when you can still think clearly. The validation evidence supports the discipline: strategies that pass rigorous Monte Carlo simulation (5,000+ runs) show, per the PickMyTrade 2026 guide aggregating BuildAlpha and BacktestBase data, 30–50% lower live trading failure rates.

You no longer need to leave the platform to run the simulation, either. MQL5 published a dedicated implementation — "Stress Testing Trade Sequences with Monte Carlo in MQL5" (mql5.com article 22291) — providing native EA-level code for running permutation tests directly inside the MT5 Strategy Tester, so the confidence bands can be generated and even monitored from within the same environment the EA trades in.

A workable decision protocol, layered by trade frequency and statistical comfort:

Pre-compute the bands. Before going live, run 5,000+ permutations on a sample of 200+ trades and record the 5th-percentile equity path and 95th-percentile drawdown.
Set the passive alarm. Use the confidence-band check as your always-on tripwire — a breach of the 5th-percentile path is the signal to escalate, not to panic-close.
Escalate to the edge tests. On a breach, run the rolling-Sharpe z-score and, if your trade frequency is low, the SPRT monitor, before deciding.
Corroborate with regime. Check whether the drawdown coincides with a structural shift in the traded instruments rather than pure variance.
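Written down as code, the protocol becomes harder to renegotiate mid-drawdown. The sketch below is one hypothetical encoding: the action labels, the 109-trade minimum, and the z-score alarm of -2.0 are placeholders I chose for illustration, not thresholds taken from the sources above.

```python
def decide(n_live_trades, band_breached, sharpe_z, sprt_verdict, regime_shift,
           min_trades=109, z_alarm=-2.0):
    """Layered kill-switch decision; all thresholds are illustrative assumptions."""
    if n_live_trades < min_trades:
        return "insufficient data"        # sample-size gate comes first
    if not band_breached:
        return "hold"                     # inside the band: expected variance
    # Band breached: escalate to the edge tests before acting.
    edge_degraded = sharpe_z <= z_alarm or sprt_verdict == "decay"
    if edge_degraded and regime_shift:
        return "stop"                     # statistics and regime agree
    if edge_degraded or regime_shift:
        return "reduce and review"        # one corroborating signal only
    return "hold and watch"               # breach without corroboration

print(decide(250, True, -2.6, "undecided", True))
```

The ordering is the design choice that matters: the sample-size gate and the band check run before anything else, so the more elaborate tests are only consulted once a breach has actually occurred.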

The deepest lesson from the CFM work is that the drawdown you are agonizing over is, statistically, probably normal — most operators pull the trigger inside their own confidence interval and kill working systems. The framework's value is symmetrical: it tells you when to hold through a scary-but-normal drawdown, and it tells you when an unremarkable-looking one has quietly crossed into the tail. Either way, the alarm line was in your backtest the whole time. The discipline is reading it before the equity curve forces the question.
