After seeing a lot of Twitter traders’ PNL from shorting piggies on the 1st day, I knew there was some edge there, and I was determined to find it. But this time, I wanted to do it the right way.

In November 2018 I started tracking first day runners, and saw that approximately 74% of them were closing red. The problem was: how should I trade them? Sure, a lot of them were fading all day, but those that didn’t could squeeze 100% or more. I needed to find a way to increase certainty (74% is great, but I needed more to cover the intraday losses that sometimes happen even when you are right on the direction), and I needed to develop an intraday strategy to exploit that bigger picture edge.

I started with just watching the market, every day and all trading day. I wanted to see how the plays behave, what patterns I could recognize and what I should explore deeper. I spent around a month watching charts of every stock that gapped up, learned how to read filings and tried to understand what fundamentals caused a stock to fade all day. After building some intuition, I knew what I wanted to track based on the clues I found.

Now I was ready for the hard work – manually tracking all the plays. Other than the obvious features like open, close, high and low, I tracked things like pre-market volume, EOD volume, cash, net loss, short float, etc. Basically everything I could think of. It was manual, exhausting work – but worth every second of it.

After I got to 240 instances, I analyzed all the data. I tried to find correlations between the features and the end result, and to answer questions like “does SSR affect the end result?” and “what’s the average pop in the morning, and is it correlated with any other feature in the sheet?” (Hint: no, yes – but test it yourself for your strategy). The manual tracking and analysis allowed me to find a lot of interesting correlations in the data – things I could trade off of.

Now I knew exactly which parameters matter for the strategy and which don’t. One thing I will say here is that I was able to disprove a lot of twitter theories from the biggest “Gurus” out there. That emphasises the fact that you should trust no one but yourself. Taking inspiration from others is ok, but you must prove what they say is right with data, as a lot of them will say things based on irrational biases and wrong intuition, not cold data.

This is where things got really interesting. I downloaded all first day gap ups from the past 5 years via API and concentrated on the companies with bad filings and dilutions. I had more than 1,000 old plays with the features I knew meant something based on the manual tracking.

With all this data I could develop models (formulas) for pop size in the morning, pre-market trading, volume forecast, time of high of day and a lot more. From the baseline 74% chance of guessing the direction correctly in the general case, I could get to around 90% in some specific cases after optimisation of the features.

Tip: sort the data according to one of your chosen columns and try to find a correlation between that column and the end result of the day (did the stock close red or green?). For example, you can sort according to pre market volume, and check if it affected whether or not the stock closed red. Do it for every combination you can think of, and soon enough you will know a lot more than anyone else about your strategy.

After having all the statistics, I was ready to develop the intraday strategy. I based it on the formulas I got from the tracking, as well as the different correlations I found useful. The real game changer for me, though, was a strategy tester I built to back test the intraday strategies I came up with.

The actual intraday strategy is an ongoing process and I keep improving it every few weeks based on results. For example, although the strategy was supposed to be for shorting piggies, I found a way to avoid shorts in some special cases, like a huge volume forecast. The beauty is that every time you think you see something while trading that inspires a new thesis, you can check your stats and back test it to prove or disprove that thesis (or those of other traders you follow).

A few important things when creating a strategy:

- Avoid over fitting – over fitting is the process of matching a strategy to your data set way too much. The strategy will look great on your data set, but it won’t be robust. Once you trade it in real time, the results will be bad. To avoid over fitting, use fewer rules, keep the strategy simple and build your model on just 50% of your data while testing on the other 50%. If your results are the same for both chunks of data, your strategy is probably robust (assuming enough data points).
- It’s very interesting to see that all the different intraday patterns I found sit somewhere on the scale between 70% success rate (SR) with 1:1 risk:reward (RR) and 45% SR with 1:3 RR, assuming no tape reading or discretionary thinking involved. Anything dramatically better should make you suspicious of the strategy you found. Sometimes great results just mean there are mistakes in the strategy assumptions, which happened to me many times.
- Make the testing process as automatic as you can. The ability to check any idea I get in a matter of minutes with my strategy tester and the databases I built is a huge edge.
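The 50/50 split described above can be sketched in a few lines. This is a minimal illustration, not my actual tester – the instance format (a dict with a `pnl` field) is a made-up placeholder:

```python
import random

random.seed(42)

# Toy instances: each dict is one tracked play with its trade PNL
# (the "pnl" field is a hypothetical column name)
trades = [{"pnl": random.uniform(-1, 3)} for _ in range(200)]

def split_half(instances):
    """Shuffle and split 50/50 into a build set and a validation set."""
    shuffled = instances[:]
    random.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def win_rate(instances):
    return sum(1 for t in instances if t["pnl"] > 0) / len(instances)

build, validate = split_half(trades)
# If win_rate(build) and win_rate(validate) are close, the rules are
# probably robust (assuming enough data points)
```

If the two halves disagree badly, the rules are curve-fitted to one chunk of history.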

After you have the backtested results, you’ll want to measure them and make them comparable to other strategies. I use a few simple variables to measure my strategy:

Win probability:

` (number of winning trades / total number of trades) x 100`

Win : loss relation:

` avg. win / avg. loss`

Profit factor:

` total profit / |total loss|`
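The three metrics above can be computed directly from a list of per-trade PNLs; a minimal sketch:

```python
def strategy_metrics(pnls):
    """Win probability, win:loss relation and profit factor from per-trade PNLs."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    win_probability = len(wins) / len(pnls) * 100
    win_loss_relation = (sum(wins) / len(wins)) / abs(sum(losses) / len(losses))
    profit_factor = sum(wins) / abs(sum(losses))
    return win_probability, win_loss_relation, profit_factor

# Example: two $3 winners and two $1 losers
wp, wl, pf = strategy_metrics([3.0, -1.0, 3.0, -1.0])
# wp = 50.0, wl = 3.0, pf = 3.0
```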

After getting the above stats from my strategy tester, I use the EquityCurveSimulator website to see what the expected results are. For example, here is the output I get for one of my backtested strategies.

I used a 45% win probability and a 1 : 3.26 win : loss relation, which means that when I lose, I lose $1, when I win, I win $3.26 on average, and I win 45% of the time.

Now the simulator creates 20 different possible PNL curves that help me know what I should expect from the strategy. The avg. performance for 100 trades is 159%, while the avg. max drawdown is 6.5%. I should also expect a maximum of 11 consecutive losses.
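A similar Monte Carlo simulation can be sketched yourself. This is not the EquityCurveSimulator site’s algorithm, just a minimal version under one assumed sizing rule (each trade risks 1% of current equity):

```python
import random

def simulate_curves(win_prob=0.45, reward=3.26, risk_pct=0.01,
                    n_trades=100, n_curves=20, seed=1):
    """Monte Carlo equity curves: each trade risks risk_pct of current
    equity; winners pay `reward` times the amount risked."""
    random.seed(seed)
    curves = []
    for _ in range(n_curves):
        equity = 1.0
        curve = [equity]
        for _ in range(n_trades):
            risk = risk_pct * equity
            equity += risk * reward if random.random() < win_prob else -risk
            curve.append(equity)
        curves.append(curve)
    return curves

curves = simulate_curves()
avg_performance = (sum(c[-1] for c in curves) / len(curves) - 1) * 100

# Max drawdown per curve, in percent of the running peak
max_drawdowns = []
for c in curves:
    peak, dd = c[0], 0.0
    for x in c:
        peak = max(peak, x)
        dd = max(dd, (peak - x) / peak)
    max_drawdowns.append(dd * 100)
```

With a positive expectancy like 45% at 1:3.26, the average of the simulated curves should come out well above breakeven; the exact numbers depend on the sizing assumption.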

A bad simulation would be something like the following:

We can see that the average performance is 13% after 100 trades, but we could definitely finish the period red, as a lot of the PNL curves are going down.

So now, when trading in real time, I know that if I get more than a 6.5% drawdown on my account trading that strategy, or more than 11 losing trades in a row, I should start looking into the strategy and why it isn’t working as expected.

The above PNL simulator will also help you prioritise strategies, as the average performance stat will tell you which of your strategies has the best expectancy.

Watch the market every day and get ideas about the strategy you would like to attack. Track everything you can think of, manually and automatically. See if you have an edge and then trade it. You can get ideas from twitter, but trust only what you can prove with large amounts of data. For calculating how much data you need for your strategy to be reliable, check out the following post.

73% of stocks tend to squeeze less than 20% from open. We can see in the histogram that most of the stocks will squeeze only 5% from open. But despite the temptation to short into the 5% mark on the frontside, we still have a lot of instances that go higher, so this is not a strategy I use.

We can also see a long tail on the right side for all the runners – stocks that run in the morning or all day for more than 30%, and can get to hundreds of percent. Those are the instances that make us use our stops.

An interesting thing to check is how price reacts to pre market high. How many times will it go above it? What % of it will it cover?

In the following histogram, the calculation is HOD / PMH

We can see that most instances will cover 90%-95% of pre market high, and most of them will go up to 15% above it. That’s why PMH is a great reference point to watch when shorting.

Regarding low from open, most of the stocks will go -10% to -15% from open. 64% of the instances will go more than -10% from open.

In the small caps space, we know that 74% of the stocks that gap up will close red. Among those, most will squeeze 5% from open before they go down, and most will go down -10% from open. That’s why shorting those makes so much sense in terms of success rate and risk reward. Now go ahead and find a strategy that exploits these stats!
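The stats above all come from the same simple calculations over a tracking sheet. A minimal sketch, with made-up instances in an assumed (open, high, low, close) layout:

```python
# Hypothetical tracked instances: (open, high, low, close) of first day gappers
instances = [
    (10.0, 10.4, 8.9, 9.1),
    (8.0, 9.9, 7.8, 9.5),
    (5.0, 5.2, 4.4, 4.5),
    (12.0, 13.0, 10.2, 10.5),
]

n = len(instances)
closed_red = sum(c < o for o, h, l, c in instances) / n * 100
squeeze_pct = [(h / o - 1) * 100 for o, h, l, c in instances]  # squeeze from open
low_pct = [(l / o - 1) * 100 for o, h, l, c in instances]      # low from open

# Share of instances that squeezed less than 20% from open
under_20 = sum(s < 20 for s in squeeze_pct) / n * 100
```

Plot `squeeze_pct` and `low_pct` as histograms over a real data set and you get the charts discussed in this post.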

Every strategy I have today is extremely affected by time of day. Some strategies work amazingly in the morning, and some work only after 4 hours into the trading day.

So in the following post I will present to you interesting facts about different times of day.

Timing the end of a squeeze is a nerve wracking task. We’re always trying to time the top for great RR, but when we fail to do so, the upward continuation can kill us. Time of day can come to our help, as a very interesting pattern unveils itself when plotting a histogram of high of day against time of day.

We can see clearly that the majority of stocks set their highs at 9:30 – 9:45, so looking for shorts against those highs seems like the right thing to do. Statistically, shorting against the 9:45 high gives you a safe level to risk off of.

For those of you starting your shorts in pre market, here is a histogram with PM included.

We can see that a lot of the tops happen between 7:00 and 8:45, but shorting PM is risky, as the vast majority of instances still reached their HODs around the open. This was an eye opening histogram for me – I barely short PM anymore (there is a profitable edge there, but a small win rate) – and it inspired me to find a PM long strategy. Profits for the long strategy had to be taken around the market open of course, as this is where HOD most probably is.

The next important question is: when is the low of day? When should you cover your position, or at least take some profits? The following histogram shows when LOD tends to occur during the day (and also why zombie hour is not such a good rule in my opinion).

As you can see, most LODs happen around 9:30 to 9:45. The next spike in LOD instances is at the end of day. So if the stock breaks the 9:45 low, it will most probably head down until EOD. Looking at this histogram made me really patient about where I want to take profits on the stocks that work best.
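Bucketing HOD or LOD timestamps into 15-minute bins, as in these histograms, takes only a few lines. A sketch with made-up timestamps:

```python
from collections import Counter
from datetime import time

# Hypothetical HOD timestamps pulled from a tracking sheet
hod_times = [time(9, 32), time(9, 40), time(9, 44), time(10, 15), time(9, 31)]

def bucket(t, minutes=15):
    """Round a timestamp down to its 15-minute bucket, e.g. 9:40 -> '09:30'."""
    m = (t.minute // minutes) * minutes
    return f"{t.hour:02d}:{m:02d}"

histogram = Counter(bucket(t) for t in hod_times)
# histogram["09:30"] -> 4, histogram["10:15"] -> 1
```

Feed the counter into any plotting tool (or Excel) to reproduce this kind of chart on your own data.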

2 things I found affecting HOD times are a low amount of outstanding shares and recent reverse stock splits. Here is the histogram for stocks with o/s of less than 3 million:

We can see that compared with the general case, stocks with less than 3 million o/s tend to postpone their HOD time, with fewer PM HODs and more HODs at 9:15 to 9:45.

Regarding reverse stock split – here is a histogram plotting all instances that had a reverse stock split within the last 50 days:

Again we see much fewer HODs in PM, and way fewer HODs at 9:15 to 9:30, with most of them moving to 9:30 to 9:45. So in general we can say that low o/s or a recent reverse stock split means a stronger stock that will top out a bit later than the general case.

I would recommend playing with those findings and seeing if different times of day mean different results for your strategies. After watching the above, some aspects of my trading got a lot better, and yours can as well.

Many things could have gone wrong with the strategy creation process, but there’s one thing that probably caused it – you didn’t test on a big enough data set (in other words, you over-fitted the data). The question is: how much data is enough? To answer this question, we need to understand the problem of overfitting and how we can solve it.

Overfitting (or curve fitting) is the process of constructing a mathematical function that fits the data series perfectly. In the trading world, it simply means creating a strategy that runs perfectly on a given asset over a specific period of time. What’s wrong with a perfect strategy is that when you feed it new time series data, it will probably perform worse, because it was created to fit a very specific scenario.

2 key factors will help you avoid overfitting – smart back testing methods, and testing on a data set that is large enough, with enough trades that fit your rules. Let us cover the latter. By the end of this article you should be able to decide by yourself how much data you really need for high certainty in your strategy’s future performance.

Overfitting often happens because there aren’t “sufficiently many” training examples (trades). The more examples you have, the more confident you are that you learned approximately the “right” solution (strategy). One rule (or feature) gives a general strategy, while many rules (features) give a very specific one. Too general and you probably don’t have an edge; too many and you will over fit the data. Hence we understand that the number of rules is something to take into consideration. And indeed, when calculating the data set size, the number of features is a key factor.

One more thing is the hypothesis space – or in simple words, how many strategies are out there within our feature space? This can be a complicated concept to grasp, so we will understand it with an example. Let’s say your strategy checks:

- Is the price over 200 MA?
- Is the market cap over 700 million dollars?
- What is the price? Under $2 / $2 to $7 / over $7

We check 3 things and based on the answers take the trade. Note that (1), (2) and (3) are called features. One strategy can be to buy when we are over the 200MA, market cap is under 700 million dollars and we don’t care about price (every feature can be “don’t care” as well). So the total number of strategies that can be created over the above feature space is 3 (yes/no/don’t care) x 3 (yes/no/don’t care) x 4 (under $2/$2 to $7/over $7/don’t care) = 36. We have a total of 36 possible strategy combinations.
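The counting argument can be checked by enumerating the space directly; a tiny sketch:

```python
from itertools import product

# Each feature can also take a "don't care" wildcard
ma_200 = ["yes", "no", "don't care"]
mcap_over_700m = ["yes", "no", "don't care"]
price = ["under $2", "$2 to $7", "over $7", "don't care"]

# Every combination of feature values is one possible strategy
strategies = list(product(ma_200, mcap_over_700m, price))
# len(strategies) -> 3 * 3 * 4 = 36
```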

One more term to know is “error”, which is the difference between strategy performance in testing and strategy performance in live trading. For example, if you have a 50% success rate with 1:2 risk to reward in the testing environment and only 40% in live trading, your error is the 10% difference between testing and live trading.

Lastly, we want to know what a concept means. A concept is the end goal: a working strategy. For the calculation to be correct, we must assume that some working strategy is waiting to be found within the hypothesis space. Remember, the hypothesis space is all the different possibilities your set of rules can output.

So if the hypothesis space is not large enough, or makes no sense – there’s no strategy to be found, hence the calculation will be wrong. For example, if your strategy is based on only 2 features, price and market cap, and you decide whether to buy or sell based on these 2 conditions alone, there’s probably no working strategy (concept) to be found there. You need to build all the rules based on your experience in the market and your intuition, for a profitable strategy to be found within the space.

One of my strategies, for example, has a hypothesis space of around 4,723,920 different possibilities, on a setup I know has a 70% chance to go my direction. So the chance of a concept being within that space is high, hence the calculation should be correct.

Now what is left for us to do is calculate the amount of data we need with the following formula:

`m > (1/ε) × (ln|H| + ln(1/δ))`

where m is the amount of trades we need, |H| is the number of strategy combinations, and we want to ensure with (1 − δ) certainty that our hypothesis will have an error smaller than ε. So let’s apply the formula to the above strategy space, where |H| = 36. If we want 98% certainty (δ = 0.02) of having less than 5% error* (ε = 0.05) on our strategy, we will do:

```
m > 1/0.05 × (ln 36 + ln(1/0.02)) ≈ 150 trades
```

So we need more than 150 trades to have more than 98% certainty to have less than 5% error with our strategy.
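The calculation above can be wrapped in a small helper so you can rerun it for any strategy space:

```python
import math

def required_trades(h_size, epsilon, delta):
    """Sample-size bound: trades needed so that, with probability
    1 - delta, the live error stays below epsilon."""
    return math.ceil((1 / epsilon) * (math.log(h_size) + math.log(1 / delta)))

# |H| = 36 strategies, 5% max error, 98% certainty (delta = 0.02)
m = required_trades(36, 0.05, 0.02)
# m -> 150
```

Plug in your own |H| (e.g. the 4,723,920 space mentioned above) to see how quickly the requirement grows with more rules.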

To avoid a difference between testing and live trading you need, among other things, a large enough sample size. To calculate how much is enough, you need to know how many strategies exist in your strategy space. You also need a high probability for a concept to be found **within** that hypothesis space so the result will really mean something. For that you need some intuition and screen time, as well as watching strategy results from other traders. The number *m* you get after applying the formula is the number of trades you need in your test sample size to know that the strategy results are reliable.

To go over the SEC website and scrape fundamental data, I use Python as the programming language. For those of you who don’t know Python at all, I recommend going through one of the full Python courses on YouTube. Learning Python will give you a huge edge in the stock market in my opinion – anywhere from getting data, to analysing, backtesting and automating daily routines. In the following guide we will use BeautifulSoup and urllib.request. If you don’t know how to code in Python, there’s no point in reading further.

Every stock has its own page on the SEC website. You can find almost all symbols with the following url:

`https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=SYMBOL&type=&dateb=&owner=exclude&count=40`

Just replace “SYMBOL” with the symbol you want. For example, if you want AAPL, simply type in your browser:

`https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=AAPL&type=&dateb=&owner=exclude&count=40 `

You can also get the company by its CIK, but that’s a bit more complicated, as you need a database that maps CIKs to stock symbols.

By default you request 40 rows of filings, but you can replace the 40 (after count=) with up to 100 to get 100 rows. Also, by default you exclude Form 4s; you can include them by replacing “exclude” with “include”.

So a query for ACHV, 100 rows with Form 4 included, will be:

`https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=achv&type=&dateb=&owner=include&count=100`
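The URL building itself can be wrapped in a small helper so you don’t hand-edit query strings (the parameters are exactly the ones described above):

```python
def edgar_url(symbol, count=40, include_form4=False):
    """Build the EDGAR browse URL with the query parameters described above."""
    owner = "include" if include_form4 else "exclude"
    return ("https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany"
            f"&CIK={symbol}&type=&dateb=&owner={owner}&count={count}")

# The ACHV query shown above: 100 rows, Form 4s included
url = edgar_url("achv", count=100, include_form4=True)
```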

After calling the url of the desired stock, you will want to go over all recent filings and look for a 10-K / 10-Q (or any other filing you need to scrape data from). Note that you will also need to handle 20-F, which is the equivalent for foreign companies.

To get the link, you will need to write something like:

```
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

symbol = "TheSymbolYouWant"
url = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" + symbol + "&type=&dateb=&owner=exclude&start=0&count=80&output=atom"
uClient = uReq(url)
page_html = uClient.read()
uClient.close()
html = soup(page_html, 'html.parser')
entries = html.findAll("entry")
shouldContinue = True
link = ""
for entry in entries:
    if shouldContinue and entry.find("category")["term"].lower() in ("10-k", "10-q", "20-f"):
        firstUrl = entry.find("link")["href"]
```

The url can be found in the “firstUrl” variable.

Then, you will want to get the exact link of the document itself.

To get it, you will need to target the table and grab the first link, like this:

```
uClientFirstUrl = uReq(firstUrl)
page_html_firstUrl = uClientFirstUrl.read()
uClientFirstUrl.close()
htmlFirstUrl = soup(page_html_firstUrl, 'html.parser')
tds = htmlFirstUrl.findAll("table")[1].findAll("td")
foundtd = False
for td in tds:
    if foundtd:
        link = "https://www.sec.gov" + td.find("a")["href"]
        foundtd = False
    if "xbrl instance" in td.text.lower():
        foundtd = True
        shouldContinue = False
```

So the whole function, where the symbol is the input, will look like the following:

```
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

def getFilingLink(symbol):
    url = "https://www.sec.gov/cgi-bin/browse-edgar?action=getcompany&CIK=" + symbol + "&type=&dateb=&owner=exclude&start=0&count=80&output=atom"
    uClient = uReq(url)
    page_html = uClient.read()
    uClient.close()
    html = soup(page_html, 'html.parser')
    entries = html.findAll("entry")
    shouldContinue = True
    link = ""
    for entry in entries:
        if shouldContinue and entry.find("category")["term"].lower() in ("10-k", "10-q", "20-f"):
            firstUrl = entry.find("link")["href"]
            uClientFirstUrl = uReq(firstUrl)
            page_html_firstUrl = uClientFirstUrl.read()
            uClientFirstUrl.close()
            htmlFirstUrl = soup(page_html_firstUrl, 'html.parser')
            tds = htmlFirstUrl.findAll("table")[1].findAll("td")
            foundtd = False
            for td in tds:
                if foundtd:
                    link = "https://www.sec.gov" + td.find("a")["href"]
                    foundtd = False
                if "xbrl instance" in td.text.lower():
                    foundtd = True
                    shouldContinue = False
    return link
```

The link variable holds the actual link you need for the report. It should look something like the following:

`https://www.sec.gov/Archives/edgar/data/1445283/000156459016028797/pti-20160930.xml`

Now you can go ahead and start working on extracting the fundamentals you want from the filing.

The link you now have in hand represents an xml file, with XBRL in it. XBRL is a business reporting language, which means you can find all the data inside if you use the right tags. You will need to do a little digging to figure out all the tags you need. Let’s see how we can grab the company’s cash, for example. Cash is represented by one of the following tags:

`us-gaap:CashAndCashEquivalentsAtCarryingValue`

`ifrs-full:Cash`

`us-gaap:CashCashEquivalentsRestrictedCashAndRestrictedCashEquivalents `

`us-gaap:Cash `

So to get cash, you can do something like:

```
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

def getCash(url, symbol):
    uClient = uReq(url)
    page_html = uClient.read()
    uClient.close()
    xml = soup(page_html, 'xml')
    cash = xml.findAll("us-gaap:CashAndCashEquivalentsAtCarryingValue")
    if len(cash) == 0:
        cash = xml.findAll("ifrs-full:Cash")
    if len(cash) == 0:
        cash = xml.findAll("us-gaap:CashCashEquivalentsRestrictedCashAndRestrictedCashEquivalents")
    if len(cash) == 0:
        cash = xml.findAll("us-gaap:Cash")
    return cash
```

Note that the “url” variable we pass the function is the url of the xml we extracted earlier. The company’s cash is now returned as the “cash” variable. You can easily grab any other stats you need simply by checking their tag names.

For those of you who have never heard the term, volume forecast is a way to predict end of day volume. The statistical edge is not in the prediction itself, but in following the way that prediction changes during the day. Stocks that fill their prediction too quickly tend to squeeze hard and give shorts a hard time, while those that are slow to fulfill their volume prediction tend to die quickly and hard.

So how can you make your own version of the volume forecast?

**TL;DR**

- Get as many instances as you can of past runners.
- Choose a predictive model – linear regression / simple average.
- Calculate: (actual volume / predicted volume) for different parts of the day.
- Find correlation between the different ratios and end of day result (did the stock finish red or green?)

**Don’t have time to create your own model? Get mine here.**

I first bumped into volume forecast when reading AllDayFaders (twitter: @team3dstocks), then listened to Stan Gluzman (twitter: @CiocanaTrader) explaining his approach to VF on the Chat with Traders podcast.

Volume forecast is a way of predicting volume in order to predict future price of a stock. For example, trying to predict the end of day volume according to the first 30 minutes volume.

At first, I thought there was a precise formula that these great traders had found, one that allows them to predict future volume. So I did some coding and downloaded volume data via Interactive Brokers’ API. But when I looked at the results, I couldn’t have been more disappointed. I now knew how to approximately predict future volume, but what could I do with that? Later on, things clicked and I found the edge I was looking for when predicting volume.

In the following post I will share just enough to guide you in building your own model, and the advantages of having such a model.

First you need data. I took mine from Interactive Brokers via their API. Manually collected data is also an option, but you’ll probably get fewer instances, which means the model will be less accurate. I used 1200 instances for my model.

Make sure you have the following features:

- first 1 min volume
- first 5 min volume
- first 15 min volume
- first 30 min volume
- first 60 min volume
- EOD volume
- PM volume

Of course you can play with more intervals if you like, but I used the above.

To try to predict future volume based on past volume, I tried 2 simple methods – an average factor and linear regression.

The first one simply checks the factor from one data point to another by dividing the 2 data points. For example, you can divide the first 30 minutes’ volume by the first 5 minutes’ volume of a stock on a given day, for each stock in your data set, then average all the factors. So if the calculation result is 2, for example, you know that on average stocks tend to trade 2x their first 5 minutes’ volume by the 30 minute mark. So if a stock traded 100k volume in the first 5 minutes, you can predict it will trade 200k by the end of the first 30 minutes.
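The average-factor model described above fits in a few lines; the volume numbers here are toy values, not from my data set:

```python
# Toy samples: (first 5 min volume, first 30 min volume) for past runners
samples = [(100_000, 200_000), (50_000, 110_000), (200_000, 390_000)]

# Average multiplier from 5-minute volume to 30-minute volume
factors = [v30 / v5 for v5, v30 in samples]
avg_factor = sum(factors) / len(factors)
# (2.0 + 2.2 + 1.95) / 3 = 2.05

# Prediction for a new stock that traded 100k in its first 5 minutes
predicted_30min = 100_000 * avg_factor   # ~205,000
```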

The second model I tried is linear regression, which you should use when you find a linear correlation between two features. Let me show you what a linear correlation looks like. Let’s say we check whether there’s a linear connection between the number of rooms in a house and the price of the house. We can sort the data according to the number of rooms, and plot a line chart of the prices.

We can clearly see that more rooms mean higher prices, so there’s a linear connection (or a positive correlation) between the number of rooms and the price of the house. One more test you can do is to check the correlation between the two features with Excel’s analysis tool. A test on the above data points looks as follows:

It means that the correlation between the number of rooms and the price of the house is 0.92, a positive correlation. Note that the closer we are to 1, the more strongly the features are connected; the closer we are to 0, the weaker the connection.

After finding that the two features are correlated, we can run the linear regression test in Excel’s data analysis tool to find the predicting formula:

So without getting too much into all the numbers, the formula we get is:

Price = (73,411.63 x Number of Rooms) – 41,504.31

that should give us the price based on the number of rooms.

For example, given a new house with 3 rooms, we would predict its price to be:

73,411.63 x 3 – 41,504.31 = 178,730.58
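The same kind of fit can be done without Excel. A minimal one-variable least-squares sketch on toy numbers (not the data set behind the regression above):

```python
# Toy data that is exactly linear: price = 60,000 * rooms + 35,000
rooms = [1, 2, 3, 4, 5]
prices = [95_000, 155_000, 215_000, 275_000, 335_000]

n = len(rooms)
mean_x = sum(rooms) / n
mean_y = sum(prices) / n

# Ordinary least squares slope and intercept
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(rooms, prices))
     / sum((x - mean_x) ** 2 for x in rooms))
b = mean_y - a * mean_x
# a -> 60000.0, b -> 35000.0

# Predicted price for a new 3-room house
predicted = a * 3 + b   # 215000.0
```

Swap in (first 5 min volume, first 30 min volume) pairs and the same code gives a volume prediction formula.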

Of course, you can try to predict house prices (and volume) with more features, such as street, city and neighbours’ education; in that case you’ll have 4 coefficients.

So when examining my data, I found a clear linear correlation between all combinations of volume periods during the day, as can be seen below. This is an example of the 5 to 30 minute correlation, at 0.93 – a high positive correlation.

To make things more complicated (and maybe more accurate), you can add features such as float, market cap and any other feature you can think of, and check if there’s a correlation with future volume. If you get such a correlation, just add it to the linear regression formula for a more accurate result. Note that when adding more features, you need more data samples. An explanation of how much data you need relative to the number of features can be found here.

Although I was now able to predict future volume with relatively high precision, it didn’t give me much edge. Yes, I knew what the volume was probably going to be in the next 5, 10 or 60 minutes, but how could I trade off of it? How much volume is going to be traded doesn’t say much about direction (other than EOD volume, which holds a special edge I’ll let you figure out on your own).

After digging in the data, and taking some advice from ADF, I played with the relations between the predicted volume and the actual volume as the day progresses. I tried every relation I could think of until I found a way to raise the 74% success rate I had at predicting the direction of the day to 80%, and sometimes 88%. The best thing was that I could now also avoid the huge run ups we sometimes get in these small caps.

For example, if you predicted the first 30 minutes’ volume to be 8m but the actual volume was 6m, your ratio is 6 / 8 = 0.75. Now see if you can find a ratio below which the odds of the stock finishing red are higher than in the general case. Similarly, look for a ratio above which the stock tends to close green.
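The ratio check above reduces to a couple of tiny functions. The 0.8 and 1.3 thresholds here are made-up placeholders – the whole point of the post is that you calibrate them on your own data set:

```python
def forecast_ratio(actual, predicted):
    """How much of the volume forecast has been filled so far."""
    return actual / predicted

def day_bias(ratio, red_threshold=0.8, green_threshold=1.3):
    """Hypothetical thresholds; calibrate both on your own tracked instances."""
    if ratio < red_threshold:
        return "leans red"
    if ratio > green_threshold:
        return "leans green"
    return "no edge"

# The 6m actual vs 8m predicted example from the text
r = forecast_ratio(6_000_000, 8_000_000)   # 0.75
bias = day_bias(r)                          # "leans red"
```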

**Don’t have time to create your own model? Get mine here.**

Volume forecast is the real deal and can give you an edge. @team3dstocks is a genius for finding this thing, and a really good guy for sharing it. To get your own version of it, download a lot of data for the pattern you are interested in, use an average / linear regression model to predict future volumes, and look for ways of connecting the relations between forecasted and actual volumes to future stock behaviour.
