Using Machine Learning to predict football matches
To describe the process briefly: I started by collecting as much data as I could get hold of, mining data about old games from every source and API I could find. Some of the more important sources were Football-data, Everysport and Betfair. I took all the data from old matches, quantified it and put it in a database. Finally, I used the data to train a Machine Learning model and used that model to predict upcoming games.
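As a rough illustration of the "quantifying" step, here is a minimal sketch of turning one match record into a numeric feature row. The field names (shots, corners, travel distance, rain) are illustrative stand-ins, not the actual schema used in the project.

```python
# Hedged sketch: flattening a quantified match record into a feature
# vector a model can train on. Field names are hypothetical.

def match_to_features(match: dict) -> list[float]:
    """Turn one match record into a numeric feature row."""
    return [
        float(match["shots_home"]),
        float(match["shots_away"]),
        float(match["corners_home"]),
        float(match["corners_away"]),
        float(match["travel_km"]),      # distance the away team travelled
        1.0 if match["rain"] else 0.0,  # simple weather indicator
    ]

row = match_to_features({
    "shots_home": 14, "shots_away": 9,
    "corners_home": 7, "corners_away": 3,
    "travel_km": 320.0, "rain": False,
})
```

Rows like this, one per historical match, are what the database would feed to the training step.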
How to measure the Machine Learning model’s performance
Now, the nature of a football game is of course that it is unpredictable. I guess that is why we love the game. But I was still obsessed with using a Machine Learning approach to predict games better than my own mind. I knew that, like most humans, I would base predictions on emotions rather than facts, and that I was therefore probably somewhat biased. I know for sure that I have quite often placed bets in the past based on a “gut feeling”.
The first question I had to answer was how to measure whether my Machine Learning model was successful or not. I quickly came to realise that the raw percentage of correctly guessed games didn’t say much unless I put it in relation to something else. And the best benchmark I could come up with was what other people were thinking. The easiest way to assess that is to look at market-regulated odds. So I started comparing how my model would perform if betting on Betfair, because their odds are set by people betting against each other. This makes the odds a reflection of what the “market” predicts.
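The comparison itself is straightforward. On an exchange, the implied probability of an outcome is simply one over its decimal odds, so the model "disagrees" with the market wherever its own probability exceeds the implied one. A sketch, with made-up odds and model numbers:

```python
# Sketch, assuming Betfair-style decimal odds. Implied probability of
# an outcome is 1 / decimal_odds. All numbers here are hypothetical.

def implied_probability(decimal_odds: float) -> float:
    return 1.0 / decimal_odds

# Hypothetical exchange odds for home win / draw / away win:
odds = {"home": 2.50, "draw": 3.40, "away": 3.10}
market = {k: implied_probability(v) for k, v in odds.items()}

# The model disagrees with the market where its probability for an
# outcome exceeds the implied one; only there could a value bet exist.
model = {"home": 0.48, "draw": 0.28, "away": 0.24}
value_bets = [k for k in odds if model[k] > market[k]]
```

This is the whole idea behind using the market as a benchmark: beating it means systematically finding outcomes where the model's probability is higher than the implied one.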
So now, two years have passed. Has the model made me rich? No, not at all. Quite soon I realised that the predictions my model made were for the most part aligned with the market. Since I use a regression-based model, I’m able to estimate how strong the probability of a certain outcome is. At the strongest grades of probability my model gives, it predicts roughly 70% of games correctly. The problem is that the market performs more or less just as well, making it hard to actually make money from my model. But, to be honest, I never really thought that I would create a money machine. Instead I have come to several insights about the possibilities and limitations of Big Data and Machine Learning.
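The "grades of probability" idea can be sketched as bucketing predictions by the model's confidence and measuring the hit rate at each threshold. The data below is invented purely to show the mechanics:

```python
# Sketch of grading predictions by confidence, assuming the model emits
# a probability for its favoured outcome. All numbers are illustrative.

def accuracy_by_grade(predictions, thresholds=(0.5, 0.6, 0.7)):
    """Hit rate among predictions at or above each confidence threshold."""
    out = {}
    for t in thresholds:
        graded = [(p, hit) for p, hit in predictions if p >= t]
        if graded:
            out[t] = sum(hit for _, hit in graded) / len(graded)
    return out

# (model_probability, prediction_was_correct) pairs — made-up data:
preds = [(0.72, True), (0.75, True), (0.71, False), (0.55, True),
         (0.52, False), (0.65, True), (0.63, False), (0.80, True)]
grades = accuracy_by_grade(preds)
```

In this toy data the highest-confidence bucket scores best, which is the pattern described above: accuracy rises with the model's stated probability, but so does the market's.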
How much does a Machine Learning model learn over time?
One of the first things I started looking at was whether the predictions would improve over time as the amount of data the model learns from grows. This is something I haven’t seen at all. Two years ago I started with about 2,000 games in my database, with quite a limited data set attached to them. Now I have almost 30,000 games, each comprising data on everything from weather and the distance between the teams’ home grounds to shots and corners. Despite all this data, and the fact that the model has been able to “learn” over time, the predictions haven’t improved. This has taught me that Machine Learning only takes you so far in trying to predict the unpredictable.
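Checking this amounts to plotting a learning curve: train on the first n games for growing n and see whether the score moves. A minimal sketch, where the scorer is a dummy stand-in for the real train-and-evaluate pipeline (here it returns a flat 70%, mirroring the observation that more data didn't help):

```python
# Learning-curve sketch. train_and_score is a stand-in for the real
# pipeline: it should train on the given games and return an accuracy.

def learning_curve(games, sizes, train_and_score):
    """Score a model trained on the first n games, for each n in sizes."""
    return {n: train_and_score(games[:n]) for n in sizes if n <= len(games)}

games = list(range(30_000))          # placeholder for 30,000 match rows
curve = learning_curve(games, [2_000, 10_000, 30_000],
                       lambda g: 0.70)  # dummy scorer: flat accuracy
```

A flat curve like this one is exactly the "no improvement" result described above; a genuinely learning model would show the scores climbing with n.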
I also learned that the power of Machine Learning in many ways lies in its ability to make unbiased generalisations. Over the past two years, I was very curious to see if my model could predict when winning or losing streaks were about to be broken. Could it, for instance, predict when Barcelona would finally lose after winning 10 straight games? Could it find small signs indicating some kind of anomaly? Well, it has turned out not to be very good at that.
What I found instead was that, over time, it was really good at betting against overvalued teams. For instance, last season I saw my model quite often predict results against Borussia Dortmund while the market predicted otherwise. Dortmund ended up having a bad season, making my model really successful here in relation to the market. This season I have seen the same with teams like Liverpool and Chelsea. So the lesson learned is that people tend to make decisions based on emotions. Liverpool and Dortmund are teams liked by lots of people, and at times you make predictions with your heart instead of your brain. My Machine Learning model does not.
Last but not least, I learned that making better predictions than the market is hard. Still, when I looked at what I had actually achieved, I realised some quite amazing things. With a simple Python program of less than 10,000 lines of code, I had made something that performed just as well as the market. My model was also able to pick out interesting bets on a weekly basis, just as any newspaper or expert does, the difference being that I didn’t need a lot of manual labour to achieve the same results. So the main insight is that generalisation might not find you the one bet that makes you rich, but it can save a lot of time when placed in the correct context.
Applying Machine Learning to Business Innovation
With these insights I started to look at another project I’ve been involved in for the last five years: the innovation platform Wide Ideas. What I wanted to do was apply some Artificial Intelligence to the ideas gathered from a company’s employees and try to predict whether an idea would be implemented or not.
We started by looking at the ideas just as if they were football matches. We quantified the data, but instead of shots and weather we looked at how many people had interacted with an idea and in what way. For reasons of discretion I won’t go into details, but the outcome was quite similar to the football model described above. We can now make quite good predictions on whether an idea will be implemented or not, given the data the idea contains. This is a way of generalising the ideas, answering the question and understanding the factors behind a good idea. Artificial Intelligence can therefore play a key role in business innovation.
However – can we find a good idea that doesn’t follow the general patterns of a successful idea? No, not really – not yet at least. Still, in an organisation that creates, say, 10,000 ideas per year, finding any good idea is really hard and time-consuming. So just reducing 10,000 ideas down to 100 probably good ones, and visualising the results, can save an incredible amount of time. This is where Artificial Intelligence with a Machine Learning model has given us the most gain.
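The 10,000-to-100 reduction is, mechanically, just a top-k selection on the model's scores. A sketch, assuming the model has already assigned each idea an implementation probability (the scores below are synthetic):

```python
# Sketch: shrink a large pool of ideas to the k most promising,
# assuming each idea already carries a model score. Data is synthetic.
import heapq

def shortlist(scored_ideas, k=100):
    """Keep the k ideas the model scores highest, best first."""
    return heapq.nlargest(k, scored_ideas, key=lambda pair: pair[1])

# 10,000 hypothetical (idea, implementation_probability) pairs:
ideas = [(f"idea-{i}", (i * 37 % 1000) / 1000) for i in range(10_000)]
top = shortlist(ideas, k=100)
```

`heapq.nlargest` avoids sorting all 10,000 entries just to keep 100, which is a reasonable choice at this scale; the point is that humans then only review the shortlist.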
Predicting the unpredictable
We see companies gathering lots of data, promising that they might be able to predict anything from finding cancer to making self-driving cars. And they might, especially where generalisation saves time. Medical applications are a good example of this. Looking at pictures of birthmarks, a Machine Learning model can pick out the ones most likely to be cancerous from a large set of pictures, saving doctors time and money.
But a lot of the things companies may try to predict have an unpredictable nature. Human behaviour is one. How far can we get in predicting human behaviour if it is essentially unpredictable? We will be able to generalise, placing people into different categories based on what they like to eat, watch or do, but honestly, who likes to be generalised?
What the past two years have taught me is that we may right now be seeing a Big-Data bubble. Will Big Data really find the anomalies, or will it just be really good at making generalisations?
Many of the promises made by companies tend to be that they will be able to find the needle in the haystack. However, most of these results are based on generalisations. They may do this because their value right now is often based on the amount of data they possess rather than what they do with it. And if they were honest about the fact that they make generalisations, good ones but still generalisations, their value would decrease. I hope that we can see a future where company valuations are based on what companies do with their data rather than how much data they have. That will require transparency and honesty, just as I’ve practised with my football model.
So, until someone proves me wrong, I’m not convinced of the power of Big Data in general. I only believe in it where the use cases are clear, and one of the clearest is healthcare. The risk otherwise is that you end up with so much data that the sheer amount suffocates every possibility of making sense of it in any way other than vast generalisations.
Ola Lidmark Eriksson