Calculating Expected Goals 2.0

I wrote a post similar to this a while back, outlining the process for calculating our first version of Expected Goals. This is going to be harder. Get out your TI-89 calculators, please. (Or you can just use my Expected Goals Cheatsheet.) Expected Goals is founded on the idea that each shot has a certain probability of going in based on some important details about that shot. If we add up the probabilities of all of a team's shots, that gives us its Expected Goals. Our goal is for this metric to convey the quality of opportunities a team earns for itself. For shooters and goalkeepers, the details about the shot change a little bit, so pay attention.

The formulas are all based on a logistic regression, which allows us to sort out the influence of each shot's many details all at once. The formula changes slightly each week because we base the regression on all the data we have, including each week's new data, but it won't change by much.

Expected Goals for a Team

  • Start with -0.19
  • Subtract 0.95 if the shot was headed (0.0 if it was kicked or othered).
  • Subtract 0.74 if the shot was taken from a corner kick (by Opta definition).
  • Subtract one of the following amounts for the shot's location:
    Zone 1: 0.0
    Zone 2: 0.93
    Zone 3: 2.37
    Zone 4: 2.68
    Zone 5: 3.55
    Zone 6: 3.06

Now you have what are called the log odds of that shot going in. To find the odds of that shot going in, raise the number "e" to the power of the log odds.

Finally, to find the estimated probability of that shot going in, take the odds and divide by 1 + odds.

Example: Shot from zone 3, header, taken off a corner kick:

-0.19 - 0.95 - 0.74 - 2.37 = -4.25

e^(-4.25) = .0143

.0143 / (1 + .0143) = 0.014 or a 1.4% chance of going in.

A team that took one of these shots would earn 0.014 expected goals.
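
If you would rather not reach for the TI-89, the team calculation can be sketched in a few lines of Python. This is only an illustration: the function and variable names are made up for this example, and the coefficients are the ones listed above, which will shift a little each week.

```python
import math

# Coefficients copied from the team model above; they change slightly week to week.
TEAM_INTERCEPT = -0.19
TEAM_HEADER = -0.95
TEAM_CORNER = -0.74
TEAM_ZONE = {1: 0.0, 2: -0.93, 3: -2.37, 4: -2.68, 5: -3.55, 6: -3.06}


def team_shot_probability(zone, headed=False, from_corner=False):
    """Estimated probability that a single shot goes in (team model)."""
    log_odds = TEAM_INTERCEPT + TEAM_ZONE[zone]
    if headed:
        log_odds += TEAM_HEADER
    if from_corner:
        log_odds += TEAM_CORNER
    odds = math.exp(log_odds)   # log odds -> odds
    return odds / (1 + odds)    # odds -> probability


def team_expected_goals(shots):
    """Add up shot probabilities; each shot is a (zone, headed, from_corner) tuple."""
    return sum(team_shot_probability(*shot) for shot in shots)


# The worked example: zone 3, header, off a corner kick.
print(round(team_shot_probability(zone=3, headed=True, from_corner=True), 3))  # 0.014
```

Passing a list of a team's shots to team_expected_goals gives the team's Expected Goals, which is just the sum of the individual shot probabilities.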

Expected Goals for Shooter

  • Start with -0.28
  • Subtract 0.83 if the shot was headed (0.0 if it was kicked or othered).
  • Subtract 0.65 if the shot was taken from a corner kick (by Opta definition).
  • Add 2.54 if the shot was a penalty kick.
  • Add 0.71 if the shot was taken on a fastbreak (by Opta definition).
  • Add 0.16 if the shot was taken from a set piece (by Opta definition).
  • Subtract one of the following amounts for the shot's location:
    Zone 1: 0.0
    Zone 2: 1.06
    Zone 3: 2.32
    Zone 4: 2.61
    Zone 5: 3.48
    Zone 6: 2.99

Now you have what are called the log odds of that shot going in. To find the odds of that shot going in, raise the number "e" to the power of the log odds.

Finally, to find the estimated probability of that shot going in, take the odds and divide by 1 + odds.

Example: A penalty kick (taken from zone 2)

-0.28 + 2.54 - 1.06 = 1.2
e^(1.2) = 3.320
3.320 / (1 + 3.320) = 0.769 or a 76.9% chance of going in.
A player that took a penalty would gain an additional 0.769 Expected Goals. If he missed, then he would be underperforming his Expected Goals by 0.769.
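
The shooter version can be sketched the same way in Python. The flag names ("headed", "penalty", and so on) and the function names are hypothetical labels chosen for this example, and the coefficients are simply the ones listed above. The second function shows one way to compare a player's actual goals with his Expected Goals.

```python
import math

# Coefficients copied from the shooter model above; expect small weekly changes.
INTERCEPT = -0.28
ZONE = {1: 0.0, 2: -1.06, 3: -2.32, 4: -2.61, 5: -3.48, 6: -2.99}
# Hypothetical flag names for the shot details the shooter model uses.
ADJUSTMENT = {
    "headed": -0.83,
    "corner": -0.65,
    "penalty": 2.54,
    "fastbreak": 0.71,
    "set_piece": 0.16,
}


def shooter_shot_probability(zone, *details):
    """Estimated probability that a single shot goes in (shooter model)."""
    log_odds = INTERCEPT + ZONE[zone] + sum(ADJUSTMENT[d] for d in details)
    odds = math.exp(log_odds)
    return odds / (1 + odds)


def goals_minus_expected(shots):
    """shots: list of (scored, zone, details), where details is a tuple of flag names."""
    goals = sum(1 for scored, _, _ in shots if scored)
    expected = sum(shooter_shot_probability(zone, *details) for _, zone, details in shots)
    return goals - expected


# The worked example: a penalty kick, taken from zone 2.
print(round(shooter_shot_probability(2, "penalty"), 3))  # 0.769
# A missed penalty leaves the shooter 0.769 below his Expected Goals.
print(round(goals_minus_expected([(False, 2, ("penalty",))]), 3))  # -0.769
```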

Expected Goals for Goalkeeper

*These are calculated only from shots on target.

  • Start with 1.61
  • Subtract 0.72 if the shot was headed (0.0 if it was kicked or othered).
  • Add 1.58 if the shot was a penalty kick.
  • Add 0.42 if the shot was taken from a set piece (by Opta definition).
  • Subtract one of the following amounts for the shot's location:
    Zone 1: 0.0
    Zone 2: 1.10
    Zone 3: 2.57
    Zone 4: 2.58
    Zone 5: 3.33
    Zone 6: 3.21
  • Subtract 1.37 if the shot was taken toward the middle third of the goal (horizontally).
  • Subtract 0.29 if the shot was taken at the lower half of the goal (vertically).
  • Add 0.35 if the shot was taken from outside the width of the six-yard box and was directed toward the far post.

Now you have what are called the log odds of that shot going in. To find the odds of that shot going in, raise the number "e" to the power of the log odds.

Finally, to find the estimated probability of that shot going in, take the odds and divide by 1 + odds.

Example: Shot from zone 2, kicked toward lower corner, from the run of play.

1.61 - 1.10 - 0.29 = 0.22
e^(0.22) = 1.246
1.246 / (1 + 1.246) = 0.555 or a 55.5% chance of going in.
A keeper that faced one of these shots would gain an additional 0.555 Expected Goals against. If he saved it, then he would be outperforming his Expected Goals by 0.555.
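
The keeper calculation follows the same pattern, with the placement details added. Here is a rough Python sketch, again with made-up flag and function names and the coefficients listed above. Remember that it only applies to shots on target.

```python
import math

# Coefficients copied from the goalkeeper model above (shots on target only).
INTERCEPT = 1.61
ZONE = {1: 0.0, 2: -1.10, 3: -2.57, 4: -2.58, 5: -3.33, 6: -3.21}
# Hypothetical flag names for the shot details the keeper model uses.
ADJUSTMENT = {
    "headed": -0.72,
    "penalty": 1.58,
    "set_piece": 0.42,
    "middle_third": -1.37,   # aimed at the middle third of the goal, horizontally
    "lower_half": -0.29,     # aimed at the lower half of the goal, vertically
    "wide_far_post": 0.35,   # from outside the six-yard box width, toward the far post
}


def keeper_shot_probability(zone, *details):
    """Estimated probability that an on-target shot goes in (keeper model)."""
    log_odds = INTERCEPT + ZONE[zone] + sum(ADJUSTMENT[d] for d in details)
    odds = math.exp(log_odds)
    return odds / (1 + odds)


# The worked example: zone 2, kicked toward a lower corner, from the run of play.
example = keeper_shot_probability(2, "lower_half")
print(round(example, 3))  # 0.555
```

A keeper's Expected Goals against is the sum of these probabilities over the on-target shots he faced, and comparing it with the goals he actually allowed shows how far above or below expectation he is.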

Frequently Asked Questions

1. Why a regression model? Why not just subset each shot in a pivot table by its type across all variables?
I think a lot of information--degrees of freedom we call it--would be lost if I were to partition each shot into a specific type by location, pattern of play, body part, and for keepers, placement. The regression gets more information about, say, headed shots in general, rather than "headed shots from zone 2 off corner kicks," of which there are far fewer data points.
2. Why don't you include info about penalty kicks in the team model?
Penalty kicks are not earned in a stable manner. Teams that get lots of PK's early in the season are no more likely to get additional PK's later in the season. Since we want this metric to be predictive at the team level, including penalty kicks would cloud that prediction for teams that have received an extreme number of PK's thus far.
3. The formula looks quite a bit different for shooters versus for keepers. How is that possible since one is just taking a shot on the other?
There are a few reasons for this. The first is that the regression model for keepers is based only on shots on target. It is meant only to assess their ability to produce quality saves. A different data set leads to different regression results. Also, we are now accounting for the shooter's placement. It is very possible that corner kicks are finished less often than shots from other patterns of play because they are harder to place. By including shot placement information in the keeper model, the information about whether the shot came off a corner is now no longer needed for assessing the keeper's ability.
4. Why don't you include placement for shooters, then?
We wish to assess a shooter's ability to create goals beyond what's expected. Part of that skill is placement. When a shooter has recorded more goals than his expected goals, it indicates that he is outperforming his expectation. It could be because he places his shots well, or that he is deceptive, or that he is good at getting opportunities that are better than what the model thinks. In any case, we want the expected goals to reflect the opportunities earned, and thus the actual goals should help us to measure finishing ability to some extent.