Since I started posting about volume forecasting, I have received a lot of messages asking how to do it. Less data-driven traders have a hard time figuring it out, so I wanted to lay out a road map for building your own volume forecast indicator.
For those of you who have never heard the term, volume forecast is a way to predict end-of-day volume. According to some respected traders on Twitter, the statistical edge is not in the prediction itself but in following how that prediction changes during the day, relative to the rate of change of the stock.
2022 update: I stopped using the indicator because, for my specific strategies, it didn’t give any extra edge. It can be different for you of course, but for me it didn’t make sense to use it anymore. Some traders DM’d me that this indicator changed their lives. Each trader probably uses it differently.
So how can you make your own version of the volume forecast?
- Get as many instances as you can of past runners. A great no-code option for getting past runners is spikeet; I use it all the time to get data.
- Choose a predictive model – linear regression / simple average.
- Calculate: (actual volume / predicted volume) for different parts of the day.
- Find the correlation between the different ratios and the end-of-day result (did the stock finish red or green?).
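The last two steps above can be sketched roughly as follows. This is only an illustration: the rows are made-up numbers, the predicted volumes are assumed to come from whichever model you chose, and `point_biserial` is a hypothetical helper name. It uses the point-biserial coefficient, which is the ordinary correlation for the case where one side is a yes/no outcome such as green or red.

```python
from math import sqrt

def point_biserial(values, flags):
    """Correlation between a numeric series and a 0/1 outcome."""
    n = len(values)
    ones = [v for v, f in zip(values, flags) if f == 1]
    zeros = [v for v, f in zip(values, flags) if f == 0]
    mean_all = sum(values) / n
    # Population standard deviation of the numeric series
    std_all = sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    mean_gap = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_gap / std_all * sqrt(len(ones) * len(zeros)) / n

# Made-up rows: (actual 30-min volume, predicted 30-min volume, 1 = green / 0 = red)
rows = [
    (300_000, 200_000, 1),
    (150_000, 200_000, 0),
    (500_000, 400_000, 1),
    (90_000, 120_000, 0),
]

# Step 3: actual volume / predicted volume
ratios = [actual / predicted for actual, predicted, _ in rows]
green = [flag for _, _, flag in rows]

# Step 4: how strongly the ratio goes together with a green finish
r = point_biserial(ratios, green)
print(ratios, round(r, 2))
```

With real data you would repeat this for each part of the day and compare the coefficients.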
I first bumped into Volume Forecast when reading AllDayFaders (twitter: @team3dstocks), then listened to Stan Gluzman (twitter: @CiocanaTrader) explaining his approach to VF on the Chat with Traders podcast.
Volume forecast is a way of predicting volume in order to predict the future price of a stock. For example, you can try to predict the end-of-day volume from the volume of the first 30 minutes.
The data you need for the prediction
First you need data. I took mine from Interactive Brokers via their API. You can also collect data manually, but you’ll probably get fewer instances, which means the model will be less accurate. I used 1200 instances for my model.
Make sure you have the following features:
- first 1 min volume
- first 5 min volume
- first 15 min volume
- first 30 min volume
- first 60 min volume
- EOD volume
- PM volume
Of course you can play with more intervals if you like, but I used the above.
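If your data arrives as per-minute bars, the features above can be derived with a few sums. The sketch below assumes a plain list of per-minute volumes starting at the open; the function name, the dictionary keys and the sample numbers are all made up for illustration, and PM volume is simply passed through as a number you already have.

```python
def volume_features(minute_volumes, pm_volume, windows=(1, 5, 15, 30, 60)):
    """Build one row of features from per-minute volumes (first item = first minute)."""
    features = {f"first_{w}_min_volume": sum(minute_volumes[:w]) for w in windows}
    features["eod_volume"] = sum(minute_volumes)
    features["pm_volume"] = pm_volume  # passed in as-is, not computed here
    return features

# Made-up example: a flat 1,000 shares per minute over a 390-minute session
example = volume_features([1_000] * 390, pm_volume=50_000)
print(example)
```

To play with other intervals, change the `windows` argument.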
Different prediction models you can use
To try to predict future volume based on past volume, I tried 2 simple methods – averaging the ratio and linear regression.
The first one simply checks the ratio between one data point and another by dividing the two. For example, you can divide the first 30 minutes’ volume by the first 5 minutes’ volume of a stock on a given day, for each stock in your data set. Then you can average all the ratios. If the result is 2, for example, you know that on average stocks tend to trade 2x their first 5 minutes’ volume by the 30-minute mark. So if a stock traded 100k in its first 5 minutes, you can predict it will trade 200k by the end of the first 30 minutes.
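Here is a minimal sketch of the averaging method. The history is made-up data chosen so the average comes out to 2, and `average_ratio` is a hypothetical helper name.

```python
def average_ratio(samples):
    """samples: (earlier_volume, later_volume) pairs, one per stock-day."""
    ratios = [later / earlier for earlier, later in samples if earlier > 0]
    return sum(ratios) / len(ratios)

# Made-up history: (first 5-min volume, first 30-min volume)
history = [
    (100_000, 250_000),
    (50_000, 75_000),
    (200_000, 400_000),
    (80_000, 160_000),
]

factor = average_ratio(history)

# A stock that traded 100k in its first 5 minutes
predicted_30_min = 100_000 * factor
print(factor, predicted_30_min)
```

Rows with zero early volume are skipped here to avoid dividing by zero; whether that is the right choice for your data is a judgment call.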
The second model I tried is linear regression, which you should use when you find a linear correlation between two features. Let me show you what a linear correlation looks like. Let’s say we check if there’s a linear connection between the number of rooms in a house and the price of the house. We can sort the data by number of rooms and plot a line chart of the prices.
We can clearly see that more rooms mean higher prices, so there’s a linear connection (or a positive correlation) between the number of rooms and the price of the house. One more test you can do is to check the correlation between the two features with Excel’s Data Analysis tool. A test on the above data points will look as follows:
It means that the correlation between the number of rooms and the price of the house is 0.92, a positive correlation. The closer the value is to 1, the more strongly the features are connected; the closer it is to 0, the less they are connected.
After finding that the two features are correlated, we can run the linear regression in Excel’s Data Analysis tool to find the predicting formula:
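If you prefer code to Excel, the same correlation coefficient (Pearson's r) can be computed in a few lines. The rooms and prices below are made-up stand-ins, not the data set from this post.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up houses: number of rooms and price
rooms = [1, 2, 3, 4, 5]
prices = [60_000, 90_000, 200_000, 230_000, 340_000]

r_rooms = pearson(rooms, prices)
print(round(r_rooms, 2))
```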
So without getting too much into all the numbers, the formula we get is:
Price = (73,211.63 x Number of Rooms) – 41,504.31
That should give us the price based on the number of rooms.
For example, given a new house with 3 rooms, we would predict its price to be:
(73,211.63 x 3) – 41,504.31 = 178,130.58
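The same kind of formula can be fitted in code with ordinary least squares, which is what a one-feature linear regression computes. The sketch below applies it to volume instead of house prices; the volumes are made up and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up stock-days: first 5-min volume and first 30-min volume
first_5 = [100_000, 50_000, 200_000, 80_000, 150_000]
first_30 = [210_000, 95_000, 430_000, 170_000, 290_000]

slope, intercept = fit_line(first_5, first_30)

# Predict the 30-minute volume of a stock that traded 120k in its first 5 minutes
predicted = slope * 120_000 + intercept
print(slope, intercept, predicted)
```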
Of course you can try to predict house prices (and volume) with more features, such as street, city and neighbor’s education; in that case you’ll have 4 coefficients.
When examining my data, I found a clear linear correlation between all combinations of volume periods during the day, as can be seen below. This is an example of the 5-to-30-minute correlation, with a coefficient of 0.93, which is a high positive correlation.
To make things more complicated (and maybe more accurate), you can add features such as float, market cap and anything else you can think of, and check if there’s a correlation with future volume. If you find such a correlation, just add the feature to the linear regression formula to get a more accurate result. Note that when adding more features, you need more data samples. An explanation of how much data you need in relation to the number of features can be found here.
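For readers who want to try more than one feature without Excel, here is a rough sketch of multiple linear regression solved through the normal equations. Everything in it is an assumption for illustration: the helper names, the choice of float as the second feature, and the made-up numbers. Volumes are in thousands and float in millions to keep the arithmetic well behaved.

```python
def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_multiple(rows, targets):
    """Least-squares fit; returns [intercept, coef_1, coef_2, ...]."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, targets)) for i in range(k)]
    return solve(xtx, xty)

# Made-up rows: (first 5-min volume in thousands, float in millions)
rows = [(100, 5), (50, 20), (200, 10), (80, 8), (150, 30)]
# Made-up targets: first 30-min volume in thousands
targets = [210, 95, 430, 170, 290]

intercept, coef_volume, coef_float = fit_multiple(rows, targets)

# Prediction for 120k first 5-min volume and a 12M float
predicted_k = intercept + coef_volume * 120 + coef_float * 12
print(intercept, coef_volume, coef_float, predicted_k)
```

With only five rows and two features this is far too little data for a real model, which is the point of the note above about needing more samples as you add features.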