Since I started posting about volume forecast, I have received a lot of messages asking how to do it. Less data-driven traders have a hard time figuring it out, so I wanted to lay out a road map for building your own volume forecast indicator.

For those of you who have never heard the term, volume forecast is a way to predict end-of-day volume. The statistical edge is not in the prediction itself, but in following how that prediction changes during the day. Stocks that fill their prediction too quickly tend to squeeze hard and give shorts a hard time, while those that are slow to fulfill their volume prediction tend to die quickly and hard.

So how can you make your own version of the volume forecast?

**TL;DR**

- Get as many instances as you can of past runners. A great no-code option for getting all past runners is spikeet; I use it all the time to get data.
- Choose a predictive model – linear regression / simple average.
- Calculate: (actual volume / predicted volume) for different parts of the day.
- Find the correlation between the different ratios and the end-of-day result (did the stock finish red or green?).

I first came across volume forecast while reading AllDayFaders (Twitter: @team3dstocks), and then listened to Stan Gluzman (Twitter: @CiocanaTrader) explain his approach to VF on the Chat with Traders podcast.

Volume forecast means predicting volume in order to predict the future price of a stock. For example, you try to predict the end-of-day volume from the volume of the first 30 minutes.

At first, I thought these great traders had found a precise formula that lets them predict future volume. So I did some coding and downloaded volume data via the Interactive Brokers API. But when I looked at the results, I couldn’t have been more disappointed. I knew how to approximately predict future volume, but what could I do with that? Later on, things clicked and I found the edge I was looking for in predicting volume.

In this post I will share just enough to guide you in building your own model, and explain the advantages of having such a model.

#### The data you need for the prediction

First you need data. I took mine from Interactive Brokers via their API. You can also collect data manually, but you’ll probably get fewer instances, which means the model will be less accurate. I used 1200 instances for my model.

Make sure you have the following features:

- first 1 min volume
- first 5 min volume
- first 15 min volume
- first 30 min volume
- first 60 min volume
- EOD volume
- PM volume

Of course you can play with more intervals if you like, but I used the above.
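As a rough sketch of how one row of these features could be built, here is a minimal example. It assumes you already have 1-minute bars for one stock-day as `(minute_offset, volume)` pairs, where offset 0 is the first minute after the open and negative offsets are premarket. The function name, the field names, the reading of "PM" as premarket, and the choice to exclude premarket from EOD volume are all my assumptions, not something the steps above prescribe.

```python
# Hypothetical input: one stock-day as (minute_offset, volume) pairs.
# Offset 0 is the first minute after the open; negative offsets are premarket.

WINDOWS = (1, 5, 15, 30, 60)  # the intervals listed above, in minutes


def build_features(bars):
    """Turn one day's 1-minute bars into a single feature row."""
    row = {}
    for w in WINDOWS:
        # Volume traded in the first w minutes of the regular session.
        row[f"first_{w}m"] = sum(v for m, v in bars if 0 <= m < w)
    # Premarket volume (assumed meaning of "PM"): everything before the open.
    row["pm"] = sum(v for m, v in bars if m < 0)
    # End-of-day volume: the full regular session (premarket excluded here).
    row["eod"] = sum(v for m, v in bars if m >= 0)
    return row


# Made-up bars: 10 premarket minutes of 1,000 shares, then 390 session minutes of 2,000.
bars = [(m, 1_000) for m in range(-10, 0)] + [(m, 2_000) for m in range(390)]
features = build_features(bars)
print(features)
```

Run this once per stock-day and collect the rows into a table; each row is one "instance" in the sense used above.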

#### Different prediction models you can use

To try to predict future volume based on past volume, I tried 2 simple methods: averaging ratios and linear regression.

The first one simply checks the ratio (multiplier) between one data point and another, by dividing the 2 data points. For example, you can divide the first 30 minutes volume by the first 5 minutes volume of a stock on a given day, for each stock in your data set. Then you average all the ratios. So if the result is 2, for example, you know that on average stocks tend to trade 2x their first 5 minutes volume by the 30-minute mark. So if a stock traded 100k volume in the first 5 minutes, you can predict it will trade 200k by the end of the first 30 minutes.
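The averaging approach above can be sketched in a few lines. The volume pairs below are made up for illustration, and the function names are hypothetical.

```python
from statistics import mean

# Made-up (first 5 min volume, first 30 min volume) pairs for past runners.
history = [
    (100_000, 210_000),
    (250_000, 450_000),
    (80_000, 170_000),
    (400_000, 820_000),
]


def average_ratio(pairs):
    """Mean of later_volume / earlier_volume across all instances."""
    return mean(later / earlier for earlier, later in pairs)


def predict(earlier_volume, ratio):
    """Scale the volume seen so far by the historical average ratio."""
    return earlier_volume * ratio


ratio = average_ratio(history)
print(round(ratio, 3))                 # average 5 -> 30 minute multiplier
print(round(predict(100_000, ratio)))  # predicted first 30 min volume
```

Note that this averages the per-stock ratios, which is what the text describes. Dividing total 30-minute volume by total 5-minute volume is a different calculation that gives more weight to high-volume names.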

The second model I tried is linear regression, which you should use when you find a linear correlation between two features. Let me show you what a linear correlation looks like. Let’s say we check whether there’s a linear connection between the number of rooms in a house and the price of the house. We can sort the data by number of rooms and plot a line chart of the prices.

We can clearly see that more rooms mean higher prices, so there’s a linear connection (a positive correlation) between the number of rooms and the price of the house. One more test you can do is to check the correlation between the two features with Excel’s data analysis tool. A test on the above data points looks as follows:

It means that the correlation between the number of rooms and the price of the house is 0.92, a positive correlation. Note that the closer we are to 1, the more the features are connected. The closer we are to 0, the less they are connected.
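If you prefer code to a spreadsheet, the same correlation coefficient (Pearson's r) can be computed by hand. This is a minimal sketch with made-up room counts and prices, not the data set behind the 0.92 figure. The coefficient ranges from -1 to 1, and values near -1 indicate a strong negative linear relation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations.
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up example data: number of rooms and house price.
rooms = [1, 2, 2, 3, 4, 5]
prices = [60_000, 95_000, 120_000, 170_000, 260_000, 310_000]

r = pearson(rooms, prices)
print(round(r, 2))
```

The same function works unchanged on two volume columns, for example first 5 minutes volume against first 30 minutes volume.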

After finding that the two features are correlated, we can run a linear regression in Excel’s data analysis tool to find the predicting formula:

So without getting too much into all the numbers, the formula we get is:

Price = (73,211.63 x Number of Rooms) – 41,504.31

that should give us the price based on the number of rooms.

For example, given a new house with 3 rooms, we would predict its price to be:

73,211.63 x 3 – 41,504.31 = 178,130.58
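For readers who want to do this step outside Excel, here is a minimal sketch of a one-feature least-squares fit, followed by the 3-room calculation above reproduced with the same coefficients. The function names are hypothetical and the tiny fitting data set is made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Apply the fitted line to a new value of the feature."""
    return slope * x + intercept


# Made-up, perfectly linear data, so the fit recovers slope 2 and intercept 1.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)

# The 3-room example, using the coefficients from the calculation above.
price = predict(3, 73_211.63, -41_504.31)
print(round(price, 2))
```

For volume, `xs` would be an early-window volume (say, the first 5 minutes) and `ys` the later volume you want to predict.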

Of course you can try to predict house prices (and volume) with more features, such as street, city and neighbors’ education. In that case you’ll have 4 coefficients.

When examining my data, I found a clear linear correlation between all combinations of volume periods during the day, as can be seen below. This is an example of the 5 to 30 minutes correlation, with a coefficient of 0.93, which is a high positive correlation.

To make things more complicated (and maybe more accurate), you can add features such as float, market cap and anything else you can think of, and check whether there’s a correlation with future volume. If you find such a correlation, just add the feature to the linear regression formula to get a more accurate result. Note that when adding more features, you need more data samples. An explanation of how much data you need relative to the number of features can be found here.

#### Now what?

Although I was now able to predict future volume with relatively high precision, it didn’t give me much edge. Yes, I knew what the volume was probably going to be in the next 5, 10 or 60 minutes, but how could I trade off of it? How much volume is going to be traded doesn’t say much about the direction (other than EOD volume, which holds a special edge I’ll let you figure out on your own 😉).

After digging into the data, and taking some advice from ADF, I played with the relations between the predicted volume and the actual volume as the day progresses. I tried every relation I could think of until I found a way to raise the 74% success rate I had in predicting the direction of the day to 80%, and sometimes 88%. The best thing was that I could now also avoid the huge run-ups we sometimes get in these small caps.

For example, if you predicted the first 30 minutes volume to be 8m but the actual volume was 6m, your ratio is 6 / 8 = 0.75. Now see if you can find a ratio below which the odds of the stock finishing red are higher than in the general case. Similarly, look for a ratio above which the stock tends to close green.
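One way to run that search is to scan a few candidate thresholds and compare the share of red closes below each one against the overall share. This is only a sketch under my own assumptions: the history rows are made up, the function names are hypothetical, and it is not the exact relation the edge described in this post is built on.

```python
# Made-up history: (predicted first 30 min volume, actual first 30 min volume, closed_red)
history = [
    (8_000_000, 6_000_000, True),
    (5_000_000, 7_500_000, False),
    (4_000_000, 2_400_000, True),
    (6_000_000, 6_600_000, False),
    (3_000_000, 2_700_000, True),
    (9_000_000, 9_900_000, True),
]


def fulfilment_ratio(predicted, actual):
    """Actual volume divided by predicted volume, e.g. 6m / 8m = 0.75."""
    return actual / predicted


def red_rate(rows):
    """Share of rows that closed red, or None if there are no rows."""
    if not rows:
        return None
    return sum(1 for row in rows if row[2]) / len(rows)


def scan_thresholds(rows, thresholds):
    """For each threshold, count the rows whose ratio is below it and their red rate."""
    results = []
    for t in thresholds:
        below = [row for row in rows if fulfilment_ratio(row[0], row[1]) < t]
        results.append((t, len(below), red_rate(below)))
    return red_rate(rows), results


baseline, results = scan_thresholds(history, [0.8, 1.0, 1.2])
print("baseline red rate:", round(baseline, 2))
for threshold, count, rate in results:
    print(threshold, count, rate)
```

Keep an eye on the count next to each rate: a threshold that looks great on a handful of instances is not evidence of anything. The mirror-image scan (ratios above a threshold, share closing green) follows the same pattern.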

#### Conclusion

Volume forecast is the real deal and can give you an edge. @team3dstocks is a genius for finding this thing, and a really good guy for sharing it. To get your own version of it, download a lot of data for the pattern you are interested in, use averaging or linear regression to predict future volumes, and look for ways to connect the relations between forecasted and actual volumes to future stock behaviour.

## 6 comments

This is the real deal

Wow, very insightful. Thanks!

hi

thanks for explain

Where can I find them please?

average/ linear regression

Ratio of predicted volume to float size? I’m still trying to figure out how to create a ratio in regard to #vwapboulevard.

Hi Niv – I’m curious if you are still selling your vol forecast script? When I go to purchase through your page, there’s no request for email or where to send the script.

How accurate it is in terms off % ? In first 1h off traiding. I’m willing to buy it.