Wednesday, June 27, 2012

soccer, chance, and attribution, continued


As I argued in an earlier post, football is an inherently probabilistic game. Here I’d like to expand a bit on what I mean, and look at some (preliminary) evidence.

Clearly, everyone realizes that chance and luck play some role in soccer (indeed, in any sport). I’d like to argue that they play a rather larger, and more specific, role than we might think. In particular, my hypothesis is that we can predict the distribution of goals in a soccer match, and over a number of matches, with a fixed-probability model. Imagine that a soccer game is like a series of tosses of a very, very unfair coin. In each minute of a soccer match, we toss a coin that has about a 1/35 probability of landing on heads. How many times will it land on heads?

My hunch is that the number of heads you get in this experiment is distributed the same way as the number of goals you get in any given soccer match, which (if true) means that in every minute of a soccer match there’s more or less a fixed probability that one or the other team will score.
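If you'd like to try the weighted-coin experiment yourself, here is a minimal sketch in Python. The function names, the random seed, and the choice of 380 matches (one Premier League season's worth) are my own assumptions for illustration; only the 1/35 chance per minute and the 90 minutes come from the description above.

```python
import random

def simulate_match(p_goal=1/35, minutes=90, rng=random):
    """Toss the weighted coin once per minute and count the heads (goals)."""
    return sum(1 for _ in range(minutes) if rng.random() < p_goal)

def simulate_season(n_matches=380, p_goal=1/35, minutes=90, seed=0):
    """Return a list holding the goal total of each simulated match."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [simulate_match(p_goal, minutes, rng) for _ in range(n_matches)]

goals = simulate_season()
print("average goals per simulated match:", sum(goals) / len(goals))
print("highest-scoring simulated match:", max(goals))
```

The average should land somewhere near 90/35, or roughly 2.6 goals per match, though any single simulated season will wobble around that figure.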

This isn’t what we expect, or what conventional wisdom would predict. We like to think that in a 0-0 soccer game, the teams just didn’t attack very well, or defended very well, or both, and that in a 4-3 game, the opposite is true. Those teams really came out with attack-minded tactics and didn’t play defensively at all! And they did brilliantly, too! Right?

Well, if we can accurately predict the number of goals in a match with a weighted coin experiment, then clearly we’ve formed our opinion about the nature and flow of the match, somewhat falsely, based on the final score. We wouldn’t congratulate a coin tosser for getting 10 heads in 90 minutes with the weighted coin, would we?

Now I’m not trying to say that this is completely true, that soccer is purely a game of chance, nor does what I looked at (so far) say anything about the relative share of the goals for the two separate teams. Nevertheless, the results are suggestive of the “over-attribution” effect I pointed out in an earlier post: people think that teams who score goals played well and vice versa, even in the face of lots of conflicting evidence.

Now for some basic probability. (Skip if you’ve ever taken statistics, or are bored.) The number of heads in any one run of our little experiment (or goals scored in a real match) is of course a single number. Toss a coin 90 times (once for each minute of the match), and you get heads once, or twice, or zero times, or ten times. If you perform this experiment a billion times, you won’t get a single result, but a billion results that together form what we call the distribution of the number of heads: maybe 10 million zeros, 20 million ones, etc. The fraction of the time the experiment resulted in zero, one, or two heads, etc. is a good estimate of the probability that you would get that number of heads in any single experiment.

Luckily we don’t have to really do all that work tossing coins, ’cause someone worked out the math for us! If the coin has a fixed probability of landing on heads on each toss, we can calculate the probability that we will get zero, one, two (etc.) heads in an entire match. Strictly speaking the exact answer is the binomial distribution, but when heads are rare and tosses are many it is very well approximated by a common probability distribution for rare events (the Poisson distribution).
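To see how little the shortcut costs, here is a small sketch that puts the exact coin-toss probabilities (binomial, 90 tosses at 1/35 each) next to the Poisson probabilities with the same average, 90/35. The constant and function names are mine; the formulas are the standard textbook ones.

```python
from math import comb, exp, factorial

MINUTES = 90
P_GOAL = 1 / 35
LAM = MINUTES * P_GOAL  # expected goals per match, about 2.57

def binomial_pmf(k, n=MINUTES, p=P_GOAL):
    """Exact probability of k heads in n tosses of the weighted coin."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam=LAM):
    """Poisson probability of k events when the average is lam."""
    return lam**k * exp(-lam) / factorial(k)

print("goals  binomial  poisson")
for k in range(7):
    print(f"{k:5d}  {binomial_pmf(k):8.4f}  {poisson_pmf(k):7.4f}")
```

With these numbers the two columns stay within a fraction of a percentage point of each other for every scoreline, which is why the Poisson distribution is a reasonable stand-in for the coin.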

We can test, then, whether the distribution of goals in real soccer matches fits the Poisson distribution. Long story short, it does. Very well, in fact! As samples, I used each of the last six seasons of the English Premier League, individually and compiled into one sample (that's 2280 matches), and the last 3 World Cups and European Championships (pooled together, to get a representative sample). Each sample matches the Poisson distribution (the same one that we would get with a weighted coin, remember) extremely well!
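For anyone who wants to run this kind of comparison on their own data, here is a rough sketch of one common way to do it: a chi-square goodness-of-fit statistic against a Poisson fit. I'm not claiming this is exactly the test I ran. The function name, the pooling of 6-or-more-goal matches into one bin, and the input sample are all assumptions for illustration, and the sample below is simulated from the weighted coin, not real league data.

```python
import random
from collections import Counter
from math import exp, factorial

def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

def chi_square_vs_poisson(goals_per_match, max_bin=6):
    """Compare observed goal totals with a Poisson fit.

    Matches with max_bin or more goals are pooled into one final bin.
    Returns (chi-square statistic, degrees of freedom, table rows),
    where each row is (goals, observed matches, expected matches).
    """
    n = len(goals_per_match)
    lam = sum(goals_per_match) / n  # Poisson mean estimated from the sample
    counts = Counter(min(g, max_bin) for g in goals_per_match)

    probs = [poisson_pmf(k, lam) for k in range(max_bin)]
    probs.append(1.0 - sum(probs))  # probability of max_bin or more goals

    rows = [(k, counts.get(k, 0), n * p) for k, p in enumerate(probs)]
    stat = sum((obs - expected) ** 2 / expected for _, obs, expected in rows)
    dof = len(rows) - 2  # one lost to the fixed total, one to estimating lam
    return stat, dof, rows

# Simulated stand-in for 2280 matches (NOT real data): weighted coin, 1/35 per minute.
rng = random.Random(1)
sample = [sum(rng.random() < 1 / 35 for _ in range(90)) for _ in range(2280)]

stat, dof, rows = chi_square_vs_poisson(sample)
for k, obs, expected in rows:
    print(f"{k} goals: observed {obs}, expected {expected:.1f}")
print(f"chi-square = {stat:.2f} on {dof} degrees of freedom")
```

A small statistic relative to the chi-square table value for those degrees of freedom means the sample is consistent with the Poisson model. The row-by-row table is also where something like a shortage of one-goal games would show up.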

There are other intriguing and unexpected results, however. For the English Premier League seasons, one fact jumps out from the data: the number of 1-0 games is massively over-predicted by the probability model (that is, far fewer games end 1-0 than the model expects), which suggests that a first goal in a game actually makes a second goal much more likely. This result holds for each season individually and is also statistically significant.

As you might guess (and as I hypothesized), this is not true for international tournaments like the World Cup and European Championship, where one-goal games are slightly, but not significantly, under-predicted.

The larger point remains, though, that soccer is a probabilistic game. I wish that it had more chances and more goals. An important effect of the probabilistic nature of soccer, which I’ll have to expand on another time, is that the randomness in goal-scoring also affects the randomness of the outcome, or result, of the match.
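As a teaser for that later post, here is a hedged sketch of how the randomness in scoring could carry over to the result. It leans on assumptions I have not tested: that each team scores independently with its own fixed Poisson average, and a purely hypothetical split of the roughly 2.57 goals per match into 1.5 for the stronger team and 1.07 for the weaker one.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

def outcome_probabilities(lam_a, lam_b, max_goals=15):
    """Return (P(A wins), P(draw), P(B wins)) for two independent Poisson scorers.

    Scorelines above max_goals per team are ignored; for soccer-sized
    averages their combined probability is negligible.
    """
    pa = [poisson_pmf(k, lam_a) for k in range(max_goals + 1)]
    pb = [poisson_pmf(k, lam_b) for k in range(max_goals + 1)]
    win = draw = loss = 0.0
    for i, p_i in enumerate(pa):
        for j, p_j in enumerate(pb):
            joint = p_i * p_j  # independence assumption
            if i > j:
                win += joint
            elif i == j:
                draw += joint
            else:
                loss += joint
    return win, draw, loss

# Hypothetical averages, chosen only so that they add up to about 90/35 goals.
win, draw, loss = outcome_probabilities(1.5, 1.07)
print(f"stronger team wins {win:.1%}, draws {draw:.1%}, loses {loss:.1%}")
```

Under these made-up numbers the stronger team is still the likelier winner, but a sizeable share of matches end in a draw or an upset, which is the kind of effect I mean.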

Now there’s a lot of other stuff to do with this statistical modeling. Here are some other ideas. First, is this effect unique to soccer? I would guess that hockey follows the same distribution, but baseball does not.

The next thing to look at, though, might be the effect that the first, second, or third (etc.) goal has on the result of the match. ESPN is fond of showing us idiotic statistics like the fact that the team that scores first has a great record in Euros, or World Cups, or whatever. Of course! In a game with so few goals, every goal matters. It would be much more interesting, in an ongoing 1-0 game, to show us the record of the team that scores second, since it is also extremely good. 1-1 game? Then show us the record for teams scoring third! My next question would be whether the first/second/third goal matters more than the simple probability model would predict. These are more interesting questions, I think... and also harder to gather and analyze data for, unfortunately. We'll see what I can do.

2 comments:

  1. it's part of soccer's strange appeal: that it has this large random element to it. Everyone "hates" it -- when the 2d best team on the pitch wins, for instance, which happens a lot in soccer -- but in reality we are, for some reason, drawn to it.
    So couldn't you use this model to predict how often the "best" team (the one with a higher probability of scoring during any minute) loses or draws?

  2. Hello! What's your opinion on who is your blog's average reader?
