Tiny Differences: How Changing Small Things Can Have Big Consequences
By Cheuk Hei Ho (@Tacticsplatform)
Short passes dominate every soccer game. They are the most abundant on-the-ball action. But the variation in short pass accuracy is small; the difference in short pass success rates between the best and the worst team in MLS is 13%. For a typical game with about 400 short passes, the difference represents 52 more successful attempts, or one extra pass every two minutes. How much impact can these extra passes have?
Atlanta United is especially dependent on short passes that lead to shots. What would a few more short passes mean for their offense? Yankee Stadium is a tough place for any visiting team. Critics say that it is too small, and only New York City FC play well there. How exactly do they take advantage of the home turf?
Short of watching thousands of clips, the best way to approach these questions is to build a model from data and use it to examine, or even predict, where a team excels or struggles. There isn’t one... yet. Can we make one?
Incremental Success Makes a Big Impact
Atlanta’s short pass accuracy has a significant impact on their offensive performance; the correlation coefficient between their short pass success rate and the number of shots they create, excluding shots that come from corners, is close to 0.71. A correlation coefficient (or Pearson correlation coefficient, R, to be precise) measures the strength and the direction of the linear relationship between two variables (in our case, the short pass success rate and the number of shots created). It ranges from -1 to +1. +1 means that the two variables have a perfect positive correlation (one value goes up, and the other goes up). -1 means that they have a perfect negative correlation (one value goes up, and the other goes down). Zero means that there is no linear correlation between the two variables. Bear in mind that even a perfect correlation coefficient doesn’t imply a cause-and-effect relationship between two variables, and even if there is one, its direction isn’t clear. I am assuming that the short pass success rate affects the number of shots created because a shot is the end result of a possession. After the shot, possession changes hands (unless the shot results in a corner, but we don’t consider corners in our analysis), so most passes should precede most shots (a pass can occasionally happen after a blocked shot).
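For readers who want to see how R is computed, here is a minimal sketch in plain Python. The function name `pearson_r` and the sample numbers are illustrative assumptions, not the data behind the 0.71 figure.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-game values, for illustration only (not real MLS data).
pass_rates = [0.80, 0.82, 0.84, 0.86, 0.88]
shots = [8, 11, 10, 14, 15]
print(round(pearson_r(pass_rates, shots), 2))
```

Feeding in one short pass success rate and one shot count per game would give a team-level coefficient of the kind quoted above.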
With such a high correlation between them, we can use linear regression to describe the linear relationship between the short pass success rate and the number of shots created:
Number of shots = 107.4795 x (Short pass success rate) – 78.05535
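As a quick sketch, the fitted line can be wrapped in a small function. The name `predicted_shots` is hypothetical, and the success rate is assumed to be entered as a fraction (0.84 rather than 84), which is the only reading that yields a plausible shot count.

```python
# Coefficients copied from the regression equation above.
SLOPE = 107.4795
INTERCEPT = -78.05535

def predicted_shots(short_pass_success_rate):
    """Predicted non-corner shots per game; the rate is a fraction such as 0.84."""
    return SLOPE * short_pass_success_rate + INTERCEPT

# Atlanta's average short pass success rate of 84%.
print(round(predicted_shots(0.84), 1))  # about 12.2
```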
The linear regression model is predictive for Atlanta because of the high correlation between their short pass accuracy and chance creation. It doesn’t work as well for other teams because all of them have a correlation coefficient below 0.6.
This model has an interesting implication: given Atlanta United’s average short pass success rate of 84%, an increase of as little as 2 percentage points will help them generate 2.4 more shots. How can such a tiny difference create so many extra shots? Atlanta United play about 422 short passes per game, so a 2-point increase means they complete about nine extra short passes, or one more short pass every five minutes (assuming they hold 50% of the possession). How can this small change impact shot creation at all?
The answer lies in the way we approach the data. A team doesn’t try to make 400 individual passes. It tries to string together several consecutive passes to advance the ball. We should not treat a pass as an isolated action. Passes are part of a group of actions (the possession, or a pass sequence/chain) that allows a team to reach a shooting position. A failure in any individual action of the possession leads to a breakdown. For example, Atlanta United average 3.4 short passes per possession, meaning that to complete a possession, all of those passes need to succeed together. An 84% short pass success rate becomes a 0.84^3.4 ≈ 55% possession completion rate if we ignore long passes.
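The compounding is easy to check with a couple of lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes every pass in a possession succeeds independently with the same probability.

```python
def possession_completion_rate(pass_success_rate, passes_per_possession):
    """Chance that every pass in a possession succeeds, assuming independence."""
    return pass_success_rate ** passes_per_possession

# Atlanta United: 84% short pass success, 3.4 short passes per possession.
print(round(possession_completion_rate(0.84, 3.4), 2))  # 0.55
```

Because the success rate is raised to a power, a small change in the per-pass rate is magnified at the possession level.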
We can design a model to test how these changes in pass success or possession completion affect the generation of shots. Consider this simple model with three components:
Number of shots = Number of possessions x Possession completion rate x Possession-to-shot generation rate
I am making three assumptions: 1) The successful outcome of any possession is that the ball reaches the shooting position. 2) The possession consists of only passes. 3) A team attempts to create a shot from every successful possession.
We know that, on average, Atlanta United have 130 possessions per game, average 3.4 short passes and 0.1 long passes per possession, complete 55% of their long passes, and convert 22% of the possessions that end in the final third into shots. If we only manipulate the short pass success rate, our model becomes:
Number of shots = 130 (possessions per game) x 0.22 (final third possessions ending in shots) x (Long pass success rate)^(Number of long passes) x (Short pass success rate)^(Number of short passes)
Now let’s examine how many more shots they can generate when the short pass success rate increases from 84% to 86%:
Increase in number of shots = 130 x 0.22 x 0.55^0.1 x [(0.86)^3.4 - (0.84)^3.4] = 1.24 shots
According to our model, Atlanta United will make 1.2 more shots when their short pass success rate increases by 2% from their average.
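The same arithmetic can be reproduced in a few lines of Python. This is a sketch of the model as described above; the function and parameter names are my own labels, and the default values are the Atlanta United averages quoted in the text.

```python
def expected_shots(short_rate, possessions=130, shot_rate=0.22,
                   long_rate=0.55, long_passes=0.1, short_passes=3.4):
    """Shots per game = possessions x possession completion rate x shot rate."""
    # A possession is completed only if all of its passes succeed.
    completion = (long_rate ** long_passes) * (short_rate ** short_passes)
    return possessions * completion * shot_rate

# Effect of raising the short pass success rate from 84% to 86%.
extra = expected_shots(0.86) - expected_shots(0.84)
print(round(extra, 2))  # 1.24
```

Changing any other default, such as `short_passes` or `possessions`, shows how sensitive the shot count is to each component.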
Compared with the result from the linear regression model (2.4 shots), our model explains about 50% of the effect of increased short pass accuracy on Atlanta’s offensive performance. In a way, we are breaking down half of Atlanta’s offensive game into six variables (number of short passes, short pass success rate, number of long passes, long pass success rate, number of possessions, and the possession-to-shot conversion rate).
The remaining 50% of the effect can be explained by a myriad of factors. For example, a possession in our model consists only of passes, but making more passes may also help a team create more dribbling opportunities.
Atlanta’s high correlation between short pass accuracy and chance creation may reflect their reliance on the short pass not just to advance the ball but to penetrate the defense. The same linear regression model, with short pass success as the critical variable, won’t work as well for other teams. They may rely on other actions, such as the dribble or the transition, to advance the ball. I also assume that all short passes are homogeneous when I measure their success rate, meaning a pass in the initial third is treated the same as one in the middle third. That assumption doesn’t hold for a lot of passes. But the most important point of the model is that it explains how incremental changes in a team’s behavior make a significant impact on its shot creation. Actions need to be grouped into possessions, and small individual differences add up to affect the outcome of the possession.
Tiny Field Makes a Big Difference
From the start of a possession, we can summarize the offensive phase into a model:
Number of goals = Number of possessions x Possession completion rate x Possession-to-shot generation rate x Shot-to-goal conversion rate
The model helps us break down the offensive phase into multiple components that we can measure individually. Two teams that score the same number of goals can do so with different styles: one team may amass a considerable number of possessions but convert them into shots with low efficiency, while the other may not have as many possessions but turn a large share of them into goals. We can isolate the critical weakness or strength of any team.
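To make the point about styles concrete, here is a small sketch of the four-component model. The two teams and all of their numbers are hypothetical, chosen only so that both end up with the same expected goals.

```python
import math

def expected_goals(possessions, completion_rate, shot_rate, conversion_rate):
    """Goals = possessions x completion rate x shot rate x conversion rate."""
    return possessions * completion_rate * shot_rate * conversion_rate

# Hypothetical teams (not real MLS data): volume versus efficiency.
volume_team = expected_goals(150, 0.25, 0.30, 0.10)
efficient_team = expected_goals(100, 0.30, 0.25, 0.15)

print(round(volume_team, 3), round(efficient_team, 3))
print(math.isclose(volume_team, efficient_team))
```

The totals match, but the factors behind them differ, which is what lets us pinpoint where each team gains or loses.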
Let’s take one example to see how such a model can be useful: New York City FC. The tiny Yankee Stadium not only affects how a visiting team like Atlanta plays its offensive game, but it also boosts New York City’s attacking performance; New York City FC create 1.1 more xG at home (2.3 xG and 1.2 xG at home and away, respectively). The xG difference is the second highest in MLS. With a similar xG/shot ratio (0.126 at home vs. 0.11 away), New York City raise their firepower by 6.6 more shots at home, a 67% increase over what they create in other stadiums (excluding corners) and the highest in MLS.
With a smaller field than other stadiums, New York City’s players play more aggressive defense at home than away. Charles Boehm has written that Yankee Stadium’s small size makes the game frenetic and chaotic. He is right: New York City FC make 1.4 more tackles per 100 opponent passes at home, the second-largest increase between home and away games in MLS. But the increased defensive intensity doesn’t help them create better chances at home; the correlation coefficient between their tackling intensity and xG is only 0.15, meaning that the two variables have little or no influence on each other. Increased defensive pressure cannot explain how New York City increase their offensive power at home.
So what factors are increasing New York City’s shot creation in the tiny Yankee Stadium? Let’s go back to our shot creation model:
Number of shots = Number of possessions x Possession completion rate x Possession-to-shot generation rate
If we can identify the critical changes in the above variables between home and away games, we may be able to rationalize how the small field size impacts them. In the table below, I list all three variables, along with the number of shots, for New York City’s home and away games.
Variable | Home | Away | Difference (Home minus Away) | % change (Difference divided by Away)
---|---|---|---|---
Number of possessions | 151 | 137 | 14 | 10.2
Possession completion rate | 0.34 | 0.24 | 0.10 | 41.7
Possession-to-shot generation rate | 0.30 | 0.29 | 0.01 | 3.4
Number of shots | 16 | 9.4 | 6.6 | 70.2
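The table's arithmetic can be checked with a short sketch. It assumes the four rows correspond, in order, to the number of possessions, the possession completion rate, the possession-to-shot generation rate, and the number of shots; the dictionary keys and function names are my own labels.

```python
# (home, away) pairs copied from the table above.
rows = {
    "possessions": (151, 137),
    "completion_rate": (0.34, 0.24),
    "shot_generation_rate": (0.30, 0.29),
    "shots": (16, 9.4),
}

def home_away_change(home, away):
    """Return (home minus away, percentage change relative to away)."""
    diff = home - away
    return diff, 100 * diff / away

for name, (home, away) in rows.items():
    diff, pct = home_away_change(home, away)
    print(f"{name}: difference {diff:.2f}, change {pct:.1f}%")

def modeled_shots(possessions, completion_rate, shot_generation_rate):
    """Shot creation model: the product of the three variables."""
    return possessions * completion_rate * shot_generation_rate

# Multiply the three variables to compare against the observed shot counts.
print(round(modeled_shots(151, 0.34, 0.30), 1))  # 15.4
print(round(modeled_shots(137, 0.24, 0.29), 1))  # 9.5
```

The products land close to the observed 16 and 9.4 shots, and the possession completion rate shows by far the largest relative change between home and away.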