Adjusting team xGoals
By Matthias Kullowatz (@mattyanselmo)
When we produced the game-by-game expected goals results last week, we were surprised to see that Seattle had outpaced Portland 4.0 to 1.7. That didn't feel right, but it didn't take long before we noticed that Seattle recorded five shots inside the six-yard box leading up to its first goal. Those shots added up to more than 2.0 expected goals, despite the fact that soccer's rules limit scoring to one goal at a time.
When analyzing expected goals at the team level, it seems that limiting such sequences of shots to a maximum of 1.0 xG would be more representative of what we want from the statistic--that is, the expected number of goals scored over the course of a game or a season. So going forward we'll use a probability adjustment to cap any sequence of shots at 1.0 xG, if they are part of the same possession. The mathyness of that adjustment is at the bottom for especially interested readers, and it has already been implemented in the new interactive tables.
But first, what is a possession? The easiest way to categorize possessions algorithmically is to look at the time gap between shots. We decided that any shot within five seconds of the previous shot would be counted as part of the same possession. This could lead to strings of shots being part of the same possession even if the first and last shots are more than five seconds apart. For instance, shots at times 12:05, 12:09, and 12:13 would all fall under the same possession because each shot after the first is within five seconds of the previous one.
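For readers who want to see the grouping rule concretely, here is a minimal sketch in Python. It is not the code behind the tables. The function names (`to_seconds`, `group_possessions`) are made up for illustration, it assumes all the shots belong to one team in one game, and it assumes "within five seconds" includes a gap of exactly five seconds.

```python
def to_seconds(clock):
    """Convert a 'MM:SS' game clock string to total seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


def group_possessions(shot_times, max_gap=5):
    """Group shot times (in seconds) into possessions.

    A shot joins the current possession when it comes no more than
    max_gap seconds after the previous shot (inclusive: an assumption).
    Otherwise it starts a new possession.
    """
    groups = []
    for t in sorted(shot_times):
        if groups and t - groups[-1][-1] <= max_gap:
            groups[-1].append(t)   # chain onto the previous shot
        else:
            groups.append([t])     # gap too large: new possession
    return groups


# The three shots from the example chain together even though the first
# and last are eight seconds apart; the 30:40 shot stands alone.
shots = [to_seconds(c) for c in ["12:05", "12:09", "12:13", "30:40"]]
print(group_possessions(shots))  # [[725, 729, 733], [1840]]
```

Because each shot is compared with the one just before it rather than with the first shot of the sequence, a long scramble in the box stays together as one possession.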
Why five seconds? Because of the plot below. I tested every gap between 1 and 30 seconds to see how much the sum total expected goals changed. Expected goals changed a lot between 0 and 5 seconds, and then settled down after that. The five-second gap captures the bulk of those shot sequences that we want to trim down to more logical xG totals.
The probability method that we use was suggested at one point by Michael Caley (@MC_of_A), though I could not find a specific citation. When multiple independent binary events occur, the probability of all of them happening is the product of their probabilities. Example: three shots are taken in sequence, each with a 50% chance to go in. The probability that a team would miss all of them is 0.5 x 0.5 x 0.5 = 0.125. Flip that probability around to 0.875, and those are the chances that at least one of those shots would score (ending the sequence in this hypothetical). So we'd cap the total expected goals for that sequence of shots at 0.875 by dividing each shot's xG by the sequence's original total (1.5) and multiplying by 0.875.
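The arithmetic above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production code. The function name `adjust_sequence` is hypothetical, the shots are treated as independent as in the example, and scaling every shot in the sequence by the same factor is one reading of "dividing by their original value and multiplying by 0.875."

```python
import math


def adjust_sequence(xg_values):
    """Rescale the xG of shots in one possession so they sum to the
    probability that at least one of them scores.

    Assumes the shots are independent binary events.
    """
    total = sum(xg_values)
    if total == 0:
        return list(xg_values)  # nothing to rescale
    # P(at least one goal) = 1 - P(every shot misses)
    p_at_least_one = 1 - math.prod(1 - x for x in xg_values)
    scale = p_at_least_one / total
    return [x * scale for x in xg_values]


# Three 50% shots: raw total 1.5, adjusted total about 0.875.
adjusted = adjust_sequence([0.5, 0.5, 0.5])
print(adjusted, sum(adjusted))
```

A possession with a single shot comes out unchanged, since the chance of at least one goal is just that shot's own xG. The adjusted total can also never exceed 1.0, which is the cap we were after.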