Someone shows you a backtest. The equity curve goes up and to the right. The annual returns look strong. The drawdowns are manageable. It looks like evidence.
It isn’t. Not yet.
A backtest is a simulation — a set of rules applied to historical price data to show what would have happened if you’d followed those rules in the past. The output is a hypothetical track record, not a real one. That distinction matters enormously, because there are at least five systematic ways a backtest can produce results that look credible and aren’t. Every serious investor should be able to recognize them.
A backtest doesn’t tell you a strategy works.
It tells you the strategy would have worked in the past, on that data, with those parameters.
That’s a much weaker claim — and the distance between those two things is where most investing decisions go wrong. The job of a well-designed backtest is not to prove the strategy works. It’s to give the strategy every honest opportunity to fail before real money is at stake.
The Five Ways a Backtest Can Lie
These aren’t exotic edge cases. They’re the most common reasons that a strategy performs brilliantly in simulation and fails in practice. Each one has its own post in the Concepts library. What follows is the version you need before reading any of them.
| Failure Mode | What It Means | The Question to Ask |
|---|---|---|
| Survivorship Bias | The backtest only included companies that still exist today. The ones that went bankrupt, got delisted, or quietly failed aren’t in the data. The universe is pre-filtered to winners before the test even begins. | Does the historical dataset include companies that no longer exist? |
| Look-Ahead Bias | The strategy used information that wouldn’t have been available at the time the trade was made — earnings figures released after market close, revised economic data, end-of-day prices used to generate intraday signals. The strategy knew things it couldn’t have known. | Could every data point used have been known at the exact moment the trade was executed? |
| Overfitting | The strategy was tuned so precisely to historical data that it’s essentially memorized the past rather than found a pattern in it. Change the parameters slightly and the results collapse. A strategy that only works in one exact configuration hasn’t found edge — it’s found a coincidence. | What happens to the results if the parameters change by 10% in either direction? |
| Ignoring Transaction Costs | Commissions, bid-ask spreads, and slippage are real costs that compound quickly for strategies that trade frequently. Many backtests assume a perfect fill at the closing price. Real markets don’t work that way, especially when size or speed is involved. | Were realistic execution costs applied to every trade in the simulation? |
| Data Snooping | The researcher tested hundreds of parameter combinations, identified the one that looked best, and presented it as if that configuration had been chosen in advance. The more combinations you test, the more likely one will look good by pure chance. This may be the most common failure mode, and the least acknowledged. | How many parameter combinations were tested before arriving at these results? |
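The data snooping row makes a claim that is easy to check for yourself: test enough configurations and one of them will look good by chance. The sketch below is a minimal illustration, not a research tool. It generates "strategies" whose daily returns are pure noise with zero true edge, then reports the best annualized Sharpe-style ratio among them. The function names, the 1% daily volatility, and the 5% false-positive rate are all assumptions chosen for the example.

```python
import random
import statistics

def best_of_random_strategies(n_strategies, n_days=252, daily_vol=0.01, seed=0):
    """Best annualized Sharpe-style ratio among strategies with zero true edge."""
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(n_strategies):
        # Daily returns are pure noise: mean zero, so there is nothing to find.
        returns = [rng.gauss(0.0, daily_vol) for _ in range(n_days)]
        ratio = statistics.fmean(returns) / statistics.stdev(returns) * 252 ** 0.5
        best = max(best, ratio)
    return best

def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent tests passes by luck alone."""
    return 1.0 - (1.0 - alpha) ** n_tests

for n in (1, 10, 100, 1000):
    print(
        f"{n:5d} configurations tested: "
        f"best ratio={best_of_random_strategies(n):.2f}, "
        f"P(at least one fluke)={prob_at_least_one_false_positive(n):.3f}"
    )
```

Because every run uses the same seed, the best ratio can only stay flat or rise as more configurations are tried. None of these strategies has any edge, yet the winner looks better the longer you search. That is why the number of configurations tested belongs next to any reported result.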
What a Trustworthy Backtest Includes
No backtest eliminates all uncertainty. But a well-constructed one makes the uncertainty visible rather than hiding it. When evaluating any strategy — including every Lab we publish here — look for three things.
Out-of-sample results. The backtest should be split: one portion used to develop the strategy, a separate portion held back and tested afterward. If the results only hold on the data the strategy was built on, that’s not validation — it’s circular reasoning.
Realistic transaction costs. Every trade should carry a friction estimate that reflects what execution actually costs — not a theoretical best-case fill. Strategies that look profitable before costs and unprofitable after them have no edge. They have overhead.
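To see how quickly friction can erase a thin edge, here is a small sketch. It assumes a flat cost charged on every period in which a trade occurs, which is a simplification: real costs vary with size, liquidity, and volatility. The 0.05% gross return per trade and the 0.10% cost per trade are made-up numbers for illustration, and the function names are hypothetical.

```python
def net_returns(gross_returns, traded, cost_per_trade=0.001):
    """Subtract a flat friction estimate from each period in which a trade occurred."""
    return [r - cost_per_trade if t else r for r, t in zip(gross_returns, traded)]

def compound(returns):
    """Compound a sequence of periodic returns into one total return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0

# Hypothetical strategy: trades every day for a year, earning 0.05% per trade before costs.
gross = [0.0005] * 252
traded = [True] * 252

print(f"before costs: {compound(gross):+.2%}")
print(f"after costs:  {compound(net_returns(gross, traded)):+.2%}")
```

With these assumed numbers the sign of the result flips once costs are applied. The more often a strategy trades, the more the cost assumption dominates the outcome.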
Parameter stability. The results should hold up if the inputs shift slightly. If a two-day difference in a moving-average window changes the outcome from profitable to catastrophic, the strategy hasn’t found a real pattern. It’s found a local artifact in one specific dataset.
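A stability check can be as simple as re-running the same rule with the parameter nudged in both directions. The sketch below uses a toy moving-average rule on synthetic random-walk prices. The rule, the price series, and the names `sma_strategy_return` and `parameter_sweep` are all assumptions for illustration, not a method taken from any particular Lab.

```python
import random

def sma_strategy_return(prices, window, start):
    """Toy rule: hold the asset for day t+1 when prices[t] is above its
    `window`-day simple moving average. Requires start >= window - 1."""
    total = 1.0
    for t in range(start, len(prices) - 1):
        sma = sum(prices[t - window + 1 : t + 1]) / window
        if prices[t] > sma:                      # signal uses data through day t only
            total *= prices[t + 1] / prices[t]   # return is earned on the following day
    return total - 1.0

def parameter_sweep(prices, chosen_window, shift=0.10):
    """Re-run the toy rule with the window moved by 10% and 20% in each direction."""
    windows = sorted(
        {max(2, round(chosen_window * (1 + k * shift))) for k in (-2, -1, 0, 1, 2)}
    )
    start = windows[-1] - 1  # evaluate every window over the same dates
    return {w: sma_strategy_return(prices, w, start) for w in windows}

# Synthetic random-walk prices, for illustration only.
rng = random.Random(42)
prices = [100.0]
for _ in range(750):
    prices.append(prices[-1] * (1.0 + rng.gauss(0.0003, 0.01)))

for window, total_return in parameter_sweep(prices, chosen_window=50).items():
    print(f"window={window:3d}  total return={total_return:+.2%}")
```

What matters is the shape of the results across neighboring windows, not any single number. A smooth plateau is reassuring. A lone spike at the chosen value is the warning sign described above.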
Consider the difference between a strategy that returns 18% annually when tested on the same data it was built from, and 9% when tested on data it’s never seen. That’s not a failure — that’s an honest result. The degradation is expected and the strategy may still be worth pursuing. Now consider a strategy that returns 22% on familiar data and loses money on new data. That’s not a strategy. That’s a historical artifact dressed up as one.
The gap between those two scenarios is what out-of-sample testing reveals. Without it, you have no way of knowing which one you’re looking at.
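The mechanics of an out-of-sample test are simple enough to sketch. The version below assumes a chronological split (earliest data for development, latest data held back) and a caller-supplied `backtest(series, parameter)` function. Both that callable and the helper names are hypothetical, and the 70/30 split is an arbitrary default.

```python
def chronological_split(series, train_fraction=0.7):
    """Split in time order: the earliest part for development, the rest held back."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

def develop_then_validate(series, candidates, backtest, train_fraction=0.7):
    """Choose the best parameter on the development data only, then score that
    single choice once on the held-back data."""
    train, held_out = chronological_split(series, train_fraction)
    best = max(candidates, key=lambda p: backtest(train, p))
    return best, backtest(train, best), backtest(held_out, best)

def retained_fraction(in_sample_return, out_of_sample_return):
    """Share of the in-sample return that survives on unseen data."""
    return out_of_sample_return / in_sample_return

# The first scenario from the text: 18% on familiar data, 9% on unseen data.
print(f"retained: {retained_fraction(0.18, 0.09):.0%}")
```

The design choice that matters is that the held-back data plays no part in choosing the parameter. If you go back and re-tune after looking at the out-of-sample result, the held-back portion has become development data and the test no longer validates anything.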
These are the minimum conditions for taking a backtest seriously. They don’t guarantee the strategy will work going forward — nothing does. But their absence is a reliable signal that the results aren’t worth trusting.