As I argued in an earlier post, football is an inherently
probabilistic game. Here I’d like to expand a bit on what I mean, and look at
some (preliminary) evidence.
Clearly, everyone realizes that chance and luck play some role in soccer (indeed, in any sport).
I’d like to argue that they play a rather larger, and more specific, role than we
might think. In particular my hypothesis is that we can predict the
distribution of goals in a soccer match and over a number of matches with a
fixed-probability model. Imagine a soccer game is like a series of coin tosses
of a very, very unfair coin. In each minute of a soccer match, we toss a coin
that has about a 1/35 probability of landing on heads. How many times will it
land on heads?
My hunch is that the number of heads you get in this
experiment is the same as the number of goals you get in any given soccer
match, which (if true) means that in every minute
of a soccer match there’s more or less a fixed probability that one or the
other team will score.
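If you'd rather let a computer toss the coin, here is a minimal sketch of the experiment in Python. The 90 tosses and the 1/35 chance of heads come straight from the description above; the function name, the seed, and the 10,000 repetitions are just my own choices for illustration.

```python
import random

MINUTES = 90      # one toss per minute of the match
P_GOAL = 1 / 35   # chance of "heads" (a goal) on each toss


def simulate_match(rng, minutes=MINUTES, p_goal=P_GOAL):
    """Toss the unfair coin once per minute and count the heads."""
    return sum(1 for _ in range(minutes) if rng.random() < p_goal)


rng = random.Random(2008)  # arbitrary seed so the run is repeatable
goals = [simulate_match(rng) for _ in range(10_000)]
average_goals = sum(goals) / len(goals)

print("first ten simulated matches:", goals[:10])
print("average goals per match:", round(average_goals, 2))
print("expected average, 90 * 1/35:", round(MINUTES * P_GOAL, 2))
```

The long-run average should land near 90/35, or about 2.57 heads per match, even though any single match can come out 0 or 5 or more.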
This isn’t what we expect, or what
conventional wisdom would predict. We like to think that in a 0-0 soccer game,
the teams just didn’t attack very well, or defended very well, or both, and
that in a 4-3 game, the opposite is true. Those teams really came out with
attack-minded tactics and didn’t play defensively at all! And they did
brilliantly, too! Right?
Well, if we can accurately predict the number of goals in a
match with a weighted coin experiment, then clearly we’ve formed our opinion about the
nature and flow of the match, somewhat falsely, based on the final score. We
wouldn’t congratulate a coin tosser for getting 10 heads in 90 minutes with the
weighted coin, would we?
Now I’m not trying to say that this is completely true, that
soccer is purely a game of chance, nor does what I looked at (so far) say
anything about the relative share of the goals for the two separate teams.
Nevertheless, the results are suggestive of the “over-attribution” effect I
pointed out in an earlier post: people think that teams who score goals played
well and vice versa, even in the face of lots of conflicting evidence.
Now for some basic probability. (Skip if you’ve ever taken
statistics, or are bored.) In any single run of our little experiment, the number of heads (or goals scored in a real match) is of course one fixed number. Toss a coin 90 times (once for each
minute of the match), and you get heads once, or twice, or zero times, or ten times. If you
perform this experiment a billion times, you won’t get a single result, but a
billion results, which together form what we call a distribution of
the number of heads: maybe 10 million zeros, 20 million ones, etc. The
fraction of the time the experiment resulted in zero, one, or two heads, etc.,
is a good estimate of the probability that you would get that number of heads in any single experiment.
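The "repeat it a huge number of times and tally the results" idea can be sketched like this. I use 20,000 repetitions rather than a billion so it runs in a second or two; that number, the seed, and the helper names are assumptions for the sake of the example.

```python
import random
from collections import Counter


def heads_in_match(rng, tosses=90, p_heads=1 / 35):
    """One experiment: 90 tosses of the unfair coin, return the heads count."""
    return sum(rng.random() < p_heads for _ in range(tosses))


def estimate_distribution(n_experiments, seed=1):
    """Repeat the experiment and return the share of runs per heads count."""
    rng = random.Random(seed)
    tally = Counter(heads_in_match(rng) for _ in range(n_experiments))
    return {heads: tally[heads] / n_experiments for heads in sorted(tally)}


distribution = estimate_distribution(20_000)
for heads, share in distribution.items():
    print(f"{heads:2d} heads: {share:.3f}")
```

Each printed share is the estimate of the probability of getting that many heads in a single experiment; more repetitions make the estimates steadier.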
Luckily we don’t have to really do all that work tossing coins, ‘cause
someone worked out the math for us! If the coin has a fixed probability of
landing on heads on each toss, we can calculate the probability that we will
get zero, one, two (etc.) heads in an entire match using a common probability
distribution for rare events (the Poisson distribution).
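For the curious, here is roughly what that math looks like in code. Strictly speaking, the exact answer for a fixed number of tosses is the binomial distribution, and the Poisson distribution is the standard approximation to it when the event is rare. The sketch below prints both side by side, assuming the same 90 tosses and 1/35 chance as above; the function names are mine.

```python
import math

TOSSES = 90
P_HEADS = 1 / 35
RATE = TOSSES * P_HEADS  # expected heads (goals) per match, about 2.57


def poisson_pmf(k, rate=RATE):
    """Poisson probability of exactly k events at the given average rate."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def binomial_pmf(k, n=TOSSES, p=P_HEADS):
    """Exact probability of exactly k heads in n tosses of the coin."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


for k in range(8):
    print(f"{k} goals: Poisson {poisson_pmf(k):.3f}, exact coin {binomial_pmf(k):.3f}")
```

With these numbers the two columns differ by well under one percentage point for every goal count, which is why the Poisson shortcut is good enough here.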
We can test, then, whether the observed distribution of goals in
soccer matches fits the Poisson distribution. Long story short, it does.
Very well, in fact! As samples, I used each of the last six seasons of the
English Premier League, individually and compiled into one sample (that's 2280 matches), and
the last three World Cups and European Championships (taken collectively, for a representative sample). Each sample fits the Poisson distribution (the same one that we
would get with a weighted coin, remember) extremely well!
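One common way to run this kind of comparison is a chi-square goodness-of-fit test, and the sketch below assumes that approach; it is not necessarily the exact procedure behind the results above. To be very clear, the "matches" in it are simulated stand-ins from the weighted coin, not the real league or tournament data, and the binning (0 through 6 goals, plus a "7 or more" bin) is my own assumption.

```python
import math
import random


def poisson_pmf(k, rate):
    return math.exp(-rate) * rate**k / math.factorial(k)


def chi_square_vs_poisson(goal_totals, max_bin=7):
    """Compare observed goals-per-match counts with a fitted Poisson.

    Matches with max_bin or more goals share the last bin.
    """
    n = len(goal_totals)
    rate = sum(goal_totals) / n  # fitted rate = average goals per match
    observed = [0] * (max_bin + 1)
    for g in goal_totals:
        observed[min(g, max_bin)] += 1
    expected = [n * poisson_pmf(k, rate) for k in range(max_bin)]
    expected.append(n - sum(expected))  # tail bin: max_bin or more goals
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return observed, expected, statistic


# Stand-in data only: 2280 simulated matches, NOT real results.
rng = random.Random(7)
fake_totals = [sum(rng.random() < 1 / 35 for _ in range(90)) for _ in range(2280)]

observed, expected, statistic = chi_square_vs_poisson(fake_totals)
for k, (o, e) in enumerate(zip(observed, expected)):
    label = f"{k}+" if k == len(observed) - 1 else f"{k} "
    print(f"{label} goals: observed {o:4d}, Poisson expects {e:7.1f}")
print("chi-square statistic:", round(statistic, 2))
```

With eight bins and one fitted parameter there are 6 degrees of freedom, so a statistic above roughly 12.59 would reject the Poisson fit at the 5% level. Swapping in real goals-per-match totals for `fake_totals` is all it takes to try it on actual data.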
There are other intriguing and unexpected results, however.
For the English Premier League seasons, one fact jumps out from the data: the
number of 1-0 games is massively over-predicted
by the probability model, meaning that a first goal in a game actually makes a second
goal much more likely. This result is
true for each season individually and is also statistically significant.
As you might guess (and as I hypothesized), this is not true for international tournaments
like the World Cup and European Championship, where one-goal games are slightly
but not significantly under-predicted.
The larger point remains, though, that soccer is a
probabilistic game. I wish that it
had more chances and more goals. An important effect of the probabilistic
nature of soccer, which I’ll have to expand on another time, is that the
randomness in goal-scoring also affects the randomness of the outcome, or
result, of the match.
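As a rough illustration of that last point, here is a sketch that treats each team's goals as an independent Poisson count and adds up the chances of a win, draw, or loss. Both the independence and the two scoring rates (1.6 and 1.0 goals per match) are hypothetical assumptions of mine; nothing in the data I looked at says how goals are shared between the two teams.

```python
import math


def poisson_pmf(k, rate):
    return math.exp(-rate) * rate**k / math.factorial(k)


def outcome_probabilities(rate_a, rate_b, max_goals=20):
    """Win/draw/loss chances for team A if both teams score independently.

    Scorelines above max_goals per team are ignored; at these rates they
    are vanishingly unlikely.
    """
    win = draw = loss = 0.0
    for a in range(max_goals + 1):
        for b in range(max_goals + 1):
            p = poisson_pmf(a, rate_a) * poisson_pmf(b, rate_b)
            if a > b:
                win += p
            elif a == b:
                draw += p
            else:
                loss += p
    return win, draw, loss


# Hypothetical rates: a "better" team averaging 1.6 goals, a weaker one 1.0.
win, draw, loss = outcome_probabilities(1.6, 1.0)
print(f"better team wins {win:.1%}, draws {draw:.1%}, loses {loss:.1%}")
```

Under these made-up numbers the better team fails to win a large share of the time, purely because so few goals are scored.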
Now there’s a lot of other stuff to do with this statistical
modeling. Here are some other ideas. First, is this effect unique to soccer? I
would guess that hockey follows the same distribution, but baseball does not.
The next thing to look at, though, might be the effect that
the first, second, or third (etc.) goal has on the result of the match. ESPN is fond of showing
us idiotic statistics like the fact that the team that scores first has a great
record in Euros, or World Cups, or whatever. Of course! In a game with so few
goals, every goal matters. It would be much more interesting, in an ongoing 1-0 game, to show us the record of the team that scores second, since it is also extremely good. 1-1 game? Then show us the record for teams scoring third! My next question would be whether the first/second/third goal matters more than the simple probability model
would predict. These are more interesting
questions, I think, and also harder to gather and analyze data for, unfortunately. We'll see what I can do.
It's part of soccer's strange appeal that it has this large random element to it. Everyone "hates" it -- when the second-best team on the pitch wins, for instance, which happens a lot in soccer -- but in reality we are, for some reason, drawn to it.
So couldn't you use this model to predict how often the "best" team (the one with a higher probability of scoring during any minute) loses or draws?