How Many Trades You Really Need to Prove Your Strategy Performance Is Reliable


It has happened to all of us traders. You have an idea for a strategy, you backtest it, and you think you have cracked the code. The performance looks great, better than anything you have had so far. You are thrilled to try your newly created strategy live in the market. You even book a test drive in a new Lambo. But then reality hits. Your strategy performs far worse than you expected, with a huge gap between backtest results and live results. Now you must throw the strategy away and start all over. So what went wrong?

Many things could have gone wrong in the strategy creation process, but one cause is especially likely: you didn't test on a big enough data set (in other words, you overfitted the data). The question is: how much data is enough? To answer it, we need to understand the problem of overfitting and how we can solve it.

What is overfitting (curve fitting)?

Overfitting (or curve fitting) is the process of constructing a mathematical function that fits one particular data series perfectly, noise included. In the trading world, it simply means creating a strategy that runs perfectly on a given asset over a specific period of time. The problem with such a "perfect" strategy is that when you feed it new time series data, it will probably perform worse, because it was created to fit a very specific scenario.
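To make this concrete, here is a minimal sketch of how a "great" backtest can appear by chance alone. It is only an illustration under stated assumptions: the function names, the 200 strategies, the 30 trades per backtest and the coin-flip market are all made up for the example and are not taken from any real data.

```python
import random

def win_rate(n_trades, rng, p_win=0.5):
    """Fraction of winning trades when each trade wins with probability p_win."""
    wins = sum(1 for _ in range(n_trades) if rng.random() < p_win)
    return wins / n_trades

def best_backtest_vs_live(n_strategies=200, n_trades=30, seed=1):
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Backtest many strategies that have no edge at all (pure coin flips)
    backtests = [win_rate(n_trades, rng) for _ in range(n_strategies)]
    best_backtest = max(backtests)
    # Trade the "winner" on fresh data: it is still just a coin flip
    live = win_rate(n_trades, rng)
    return best_backtest, live

best_backtest, live = best_backtest_vs_live()
print(f"best backtest win rate: {best_backtest:.0%}, live win rate: {live:.0%}")
```

The best of many no-edge strategies tends to look impressive in the backtest simply because it was picked after the fact, while its live result is drawn from the same 50/50 odds as every other strategy.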

How to Avoid Overfitting

Two key factors will help you avoid overfitting: smart backtesting methods, and testing on a data set that is large enough to contain enough trades that fit your rules. Let us cover the latter. By the end of this article you should be able to decide for yourself how much data you really need for high certainty about your strategy's future performance.

Overfitting often happens because there are not "sufficiently many" training examples (trades). The more examples you have, the more confident you can be that you learned approximately the "right" solution (strategy). One rule (or feature) gives a general strategy, while many rules (features) give a very specific one. Too few rules and you probably don't have an edge; too many and you will overfit the data. So the number of rules has to be taken into consideration, and indeed, when calculating the data set size, the number of features is a key factor.

The hypothesis space

One more concept is the hypothesis space, or in simple words: how many strategies exist within our feature space? This can be a complicated concept to grasp, so let's work through an example. Let's say your strategy checks:

  1. Is the price over 200 MA?
  2. Is the market cap over 700 million dollars?
  3. What is the price? Under $2 / $2 to $7 / over $7

We check three things and take the trade based on the answers. Note that (1), (2) and (3) are called features. One strategy could be to buy when the price is over the 200 MA and the market cap is under 700 million dollars, while we don't care about the price bucket (every feature can be "don't care" as well). So the total number of strategies that can be created over this feature space is 3 (yes / no / don't care) x 3 (yes / no / don't care) x 4 (under $2 / $2 to $7 / over $7 / don't care) = 36. We have a total of 36 possible strategy combinations.
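The same count can be sketched in a few lines of Python. The dictionary keys and the function name are hypothetical labels chosen for this example; the only rule taken from the text is that each feature contributes its number of possible answers plus one extra "don't care" option.

```python
from math import prod

# Number of possible answers per feature, taken from the example above
features = {
    "price_over_200ma": 2,      # yes / no
    "market_cap_over_700m": 2,  # yes / no
    "price_bucket": 3,          # under $2 / $2 to $7 / over $7
}

def hypothesis_space_size(value_counts):
    """Count strategies when every feature may also be "don't care"."""
    return prod(n + 1 for n in value_counts)

size = hypothesis_space_size(features.values())
print(size)  # 36
```

Adding one more yes/no feature would multiply the space by 3, which is why the number of rules drives the amount of data you need so strongly.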

What is the error?

One more term to know is "error", which is the difference between a strategy's performance in testing and its performance in live trading. For example, if you have a 50% success rate with 1:2 risk to reward in the testing environment and only a 40% success rate in live trading, your error is the 10% difference (10 percentage points) between testing and live trading.
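As a rough sketch, the error from this example can be computed as below. The function names are hypothetical, and the expectancy calculation is an added assumption (average result per trade in units of risk, R, with every loser costing 1R and every winner paying 2R) meant only to show what a 10-point error does to a 1:2 risk-to-reward strategy.

```python
def error(backtest_win_rate, live_win_rate):
    """Gap between the tested and the live success rate."""
    return abs(backtest_win_rate - live_win_rate)

def expectancy(win_rate, reward_per_risk=2.0):
    """Average result per trade in units of risk (R); an assumed simple model."""
    return win_rate * reward_per_risk - (1.0 - win_rate)

err = error(0.50, 0.40)
print(f"error: {err:.0%}")                                 # error: 10%
print(f"backtest expectancy: {expectancy(0.50):.2f}R")     # 0.50R
print(f"live expectancy: {expectancy(0.40):.2f}R")         # 0.20R
```

Under this simple model, a 10-point drop in success rate cuts the average result per trade from 0.5R to 0.2R, so even a seemingly small error matters.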

A concept – does the perfect strategy even exist?

Lastly, we want to know what a concept means. The concept is the end goal: a working strategy. For the calculation to be correct, we must assume that some working strategy is waiting to be found within the hypothesis space. Remember, the hypothesis space is the set of all the different strategies your set of rules can produce.

So if the hypothesis space is not large enough, or makes no sense, there is no strategy to be found, and the calculation will be wrong. For example, if your strategy is based on only two features, price and market cap, and you decide whether to buy or sell based on these two conditions alone, there is probably no working strategy (concept) to be found there. You need to build all the rules on your experience in the market and your intuition, so that a profitable strategy can be found within the space.

One of my strategies, for example, has a hypothesis space of around 4,723,920 different possibilities, and it is built on a setup that I know has a 70% chance of going in my direction. So the chance that a concept lies within that space is high, and the calculation should be correct.

Putting it all together

Now all that is left is to calculate the amount of data we need with the following formula:

m > 1 / ε x (ln |H| + ln 1/δ)

Where m is the number of trades we need, |H| is the number of strategy combinations, ε (epsilon) is the largest error we are willing to accept, and δ (delta) is the chance we accept of being wrong, so that with (1 - δ) certainty our hypothesis will have an error smaller than ε. So let's apply the formula to the strategy space above, where |H| = 36. If we want 98% certainty (δ = 0.02) of having less than 5% error* (ε = 0.05) on our strategy, we compute:

m > 1 / 0.05 x (ln 36 + ln 1/0.02) = 20 x (3.58 + 3.91) ≈ 149.9, so 150 trades

So we need at least 150 trades to have 98% certainty of less than 5% error with our strategy.
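The calculation above can be wrapped in a small helper so you can try your own numbers. This is a minimal sketch: the function and parameter names are hypothetical, and it simply returns the smallest whole number of trades that is strictly greater than the value the formula produces.

```python
import math

def trades_needed(hypothesis_space_size, epsilon, delta):
    """Smallest whole m with m > (1 / epsilon) * (ln|H| + ln(1 / delta)).

    epsilon: largest acceptable error (0.05 means 5%)
    delta:   accepted chance of being wrong (0.02 means 98% certainty)
    """
    bound = (math.log(hypothesis_space_size) + math.log(1 / delta)) / epsilon
    return math.floor(bound) + 1  # strictly greater than the bound

print(trades_needed(36, epsilon=0.05, delta=0.02))         # 150
print(trades_needed(36, epsilon=0.01, delta=0.02))         # 750
print(trades_needed(4_723_920, epsilon=0.05, delta=0.02))  # 386
```

Because |H| enters through a logarithm, growing the space from 36 to 4,723,920 strategies raises the requirement only from 150 to 386 trades, while tightening the error from 5% to 1% multiplies it by five.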


To avoid a gap between testing and live trading you need, among other things, a large enough sample size. To calculate how much is enough, you need to know how many strategies exist in your strategy space. You also need a high probability that a concept can be found within that hypothesis space, so that the result really means something. For that you need some intuition and screen time, as well as looking at the strategy results of other traders. The number m you get after applying the formula is the number of trades you need in your test sample to know that the strategy results are reliable.
