Building a System for Assessing Player Value
By Dave Laidig (@davelaidig)
For years I’ve been interested in how players contribute to team results. I’ve sought a measure of player contributions to a win that covered all aspects of a game. While many valuable and informative soccer metrics have been created, common stats are not entirely on point with this issue.
For example, xG stats apply only to scoring attempts, and perhaps goalkeepers. Adding xAssists and key passes broadens the scope of included players. But the contribution of defensive-oriented players would not be expected to show up on these metrics. And offensive-oriented players would still rely on teammates to threaten the net before their effort can be measured.
The xGChain metric is useful for identifying players that participate in the most productive attacks, and includes players that play further away from the goal. But this metric does not include non-offensive actions. And each player’s contribution is given equal weight, whether it’s the initial square pass to a CB in the defensive half, or delivering a cross into the penalty area. Experienced analysts consider the dashboard of key performance indicators and piece together insights from the elements. But I’m looking to consolidate all game elements with a common perspective.
My goal is not new, nor necessarily unique. There have been many attempts to create a comprehensive performance index, from Sarah Rudd using Markov Chains, Dan Altman’s Shapley Values, and Goalimpact ratings to corporate-sponsored efforts like the Castrol Index and the MLS-partnered Audi Index. Recently, Nils MacKay has advanced his own model that also evaluates the xG added by game actions via a different approach. Even ASA contributor Mark Goodman has tried his own ranking system. In addition, I understand many teams have their own version of a performance index. Unfortunately, these metrics were created at private expense and are proprietary, which makes it difficult to evaluate the data and their utility (to those without subscriptions at least).
As a result, I set about to create a metric that assigns each player their contribution to the team’s result. The fundamental calculation is the difference between the chances of scoring when a player gets the ball, and the chances of scoring when a player is done. To facilitate this comparison, I used the 2017 season as a basis for determining the chances of scoring (at the end of the possession) from any area on the field.
Average xG per Possession
Using the American Soccer Analysis data set covering all 374 matches of the 2017 MLS season, I placed all shots, dribbles, passes, and defensive actions into chronological order, and applied my possession definition.
Possession:
- Starts with a shot, dribble, completed pass, or incomplete free kick, corner, or throw-in
- Ends with a shot, opponent offensive action (pass/dribble/shot), or end of half/game
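The possession definition above can be sketched in code. This is only a rough illustration, not the actual processing applied to the ASA data: the `Event` fields, the event labels, and the choice to treat any opponent pass, dribble, or shot attempt (completed or not) as ending the chain are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Event:
    team: str
    kind: str        # hypothetical labels: "shot", "dribble", "pass", "free_kick", "corner", "throw_in", "defense"
    completed: bool
    period: int      # 1 = first half, 2 = second half

SET_PIECES = {"free_kick", "corner", "throw_in"}
OFFENSIVE = {"shot", "dribble", "pass"} | SET_PIECES

def starts_possession(e):
    # Shot, dribble, completed pass, or a set-piece restart (even if incomplete)
    if e.kind in {"shot", "dribble"}:
        return True
    if e.kind == "pass":
        return e.completed
    return e.kind in SET_PIECES

def split_possessions(events):
    """Group chronologically ordered events into possession chains."""
    possessions, current = [], []
    for e in events:
        if current:
            prev = current[-1]
            new_period = e.period != prev.period
            opponent_attack = e.team != current[0].team and e.kind in OFFENSIVE
            after_shot = prev.kind == "shot"
            if new_period or opponent_attack or after_shot:
                possessions.append(current)
                current = []
        if current:
            # Opponent defensive actions do not end the chain and are skipped
            if e.team == current[0].team:
                current.append(e)
        elif starts_possession(e):
            current = [e]
    if current:
        possessions.append(current)
    return possessions

# Made-up event sequence for illustration
events = [
    Event("A", "pass", True, 1),
    Event("A", "pass", True, 1),
    Event("A", "shot", True, 1),
    Event("B", "pass", False, 1),   # incomplete pass: does not start a possession
    Event("B", "pass", True, 1),
    Event("B", "dribble", True, 1),
    Event("A", "defense", True, 1), # defensive action: does not end B's chain
    Event("A", "pass", True, 1),    # opponent offensive action ends B's chain
    Event("A", "pass", True, 2),    # new half ends the previous chain
]
chains = split_possessions(events)
chain_lengths = [len(c) for c in chains]
```

In this made-up sequence the events split into four chains, with the shot, the opponent pass, and the change of half each closing one.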
The result was over 65,000 possessions in the 2017 MLS season. I broke the pitch into over 100 zones, and tracked which zones show up in each possession, and the possession result. This data provides the average expected possession result (in xG) for each zone. For more detail, an earlier version of the chances of scoring from different zones can be found here.
For this analysis, I improved the earlier results by separating free kicks, corner kicks and penalties from the regular run of play touches. I removed the possession start condition that relied on a defensive action, because it typically was an immediate turnover and did not reflect a real “possession” in my opinion. I also added every zone where a completed pass was received, and then removed duplicates so that each zone I could possibly capture was represented only once in a possession chain. This I felt was a truer indication of the average xG per possession (when possessing the ball in any particular zone).
Features of Average xG per Possession field grid:
- The average xG per result data are intuitive and based simply on observations,
- Possession results use xG instead of goals scored which increases the number of non-zero observations and reduces some of the randomness,
- Values are based on the entire possession chain, and are not limited to the last couple touches or an arbitrary period of time before a shot (note: some possessions can exceed 30 passes and over a minute of game time),
- Possession during the run of play is separated from free kicks, corner kicks, and penalties,
- Areas where passes are received are also included in possession chain calibration,
- Each zone only counted once per possession, and
- Zones approximate field markings and are smaller in the final third as small changes in location start to be more meaningful.
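As a rough illustration of the grid calculation, the sketch below averages the possession result (in xG) over every possession that touched a zone, counting each zone once per possession. The zone labels and sample numbers are made up, and treating a possession with no shot as 0.0 xG is an assumption of this example.

```python
from collections import defaultdict

def zone_values(possessions):
    """possessions: iterable of (zones_visited, possession_xg) pairs."""
    total_xg = defaultdict(float)
    count = defaultdict(int)
    for zones, xg in possessions:
        for zone in set(zones):  # each zone counted only once per possession
            total_xg[zone] += xg
            count[zone] += 1
    return {zone: total_xg[zone] / count[zone] for zone in count}

# Hypothetical possessions: zones touched (with repeats) and the resulting xG
sample = [
    (["D1", "M2", "M2", "A3"], 0.10),
    (["M2", "A3"], 0.30),
    (["D1", "M2"], 0.00),  # no shot, so the possession result is assumed to be 0.0
]
values = zone_values(sample)
```

Here the repeated visit to zone `M2` in the first possession contributes only one observation, which keeps long passing sequences in one area from dominating that zone's average.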
Overall Player Contribution Rating
Again, we look to the ASA data to examine all possession chains in the 2017 and 2018 MLS seasons. Knowing the value of the various pitch locations (in terms of the average xG result from the possession, also shortened to “zone value”) means we can evaluate each touch based on the difference between the start value and the end value.
The start value is the value for the player’s first recorded zone for his touch. And the end value can either be (1) the zone value of a completed pass, (2) zero for a turnover, (3) the shot xG, or (4) the probability of scoring a shot on target. A 100% probability of scoring a goal is the same as 1.0 xG, and lesser probabilities of scoring equal a proportionate equivalent of xG. Thus, player value is measured in xG equivalents (also called non-shot xG).
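The start and end values can be put into a small sketch. The outcome labels and function names here are hypothetical, and the zone values are invented; the only thing taken from the description above is that a touch is worth its end value minus its start value, with the four end-value cases handled separately.

```python
def end_value(outcome, *, zone_value=0.0, shot_xg=0.0, on_target_prob=0.0):
    """Return the end value of a touch for one of the four cases in the text."""
    if outcome == "completed_pass":
        return zone_value       # (1) value of the zone where the pass is received
    if outcome == "turnover":
        return 0.0              # (2) possession lost
    if outcome == "shot":
        return shot_xg          # (3) the shot's xG
    if outcome == "shot_on_target":
        return on_target_prob   # (4) probability of scoring the on-target shot
    raise ValueError(f"unknown outcome: {outcome}")

def touch_value(start_value, outcome, **kwargs):
    """Value added by a touch, in xG equivalents: end value minus start value."""
    return end_value(outcome, **kwargs) - start_value

# Invented zone values for illustration
pass_gain = touch_value(0.02, "completed_pass", zone_value=0.08)  # moves the ball to a better zone
turnover_loss = touch_value(0.05, "turnover")                     # gives away the zone's value
shot_gain = touch_value(0.09, "shot", shot_xg=0.12)               # shot worth more than the zone
```

With these invented numbers, the pass adds about 0.06, the turnover costs 0.05, and the shot adds about 0.03 xG equivalents.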
While the details of how this method applies to various scenarios will be discussed in greater detail in the subcategory discussions, there are a few noteworthy aspects of the overall player value rating to highlight at the start.
Overall Player Value
- Represents value added by player in terms of added/decreased xG expected
- Includes GK actions for opponent shots on target (see GK Value below)
- Includes red card penalty (see F-Up Value below)
- Includes assessment for PK won (+0.20) and PK conceded (-0.55) (see F-Up Value below)
- Includes losses of possession not otherwise captured in game data (see TO/LOP Value below)
- Does not value off the ball plays
- Does not value incomplete passes where team keeps possession (see Pass Value below)
- Does not value incomplete passes where team never had possession (e.g., clearance)
- Does not value defense actions that do not immediately lead to own team’s possession (see Defense – Turnover Value below)
In 2017, the average player contribution per 90 minutes was 0.107 xG equivalents. There were 169 players with above-average values and at least 1000 minutes in 2017, and 166 players below average. The highest 30% were at 0.14 xG per game and higher, the middle 40% fell between 0.14 and 0.07, and the bottom 30% were at 0.07 xG per game and lower.
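To make the per-90 scale concrete, here is a minimal sketch that converts a season total into a per-90 rate and places it against the 2017 cut-offs quoted above (0.14 and 0.07). The player names and totals are hypothetical, and the function names are just for illustration.

```python
def per_90(total_value, minutes):
    """Season value scaled to a 90-minute rate."""
    return total_value * 90 / minutes

def tier(value_per_90, top_cut=0.14, bottom_cut=0.07):
    """Place a per-90 value against the 2017 cut-offs from the text."""
    if value_per_90 >= top_cut:
        return "top 30%"
    if value_per_90 <= bottom_cut:
        return "bottom 30%"
    return "middle 40%"

# Hypothetical players: (season total in xG equivalents, minutes played)
players = {
    "Player A": (8.0, 2400),
    "Player B": (2.0, 1800),
    "Player C": (1.0, 1800),
}
tiers = {name: tier(per_90(total, mins)) for name, (total, mins) in players.items()}
```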
Validity as a Performance Measurement
If getting a higher score is “good,” then we should see higher scores reflect “good” results. Otherwise, you’re not measuring what you think you’re measuring (in technical terms, the measure is not “valid”). Fortunately, we do see the player ratings reflect actual success.
We can start with the purpose of this measure: breaking down each player’s contribution to the team winning a game. The team with the higher overall rating was more likely to actually win the game. The difference between Team A’s value and Team B’s value after a game is highly correlated with the actual goal differential. For the 2017 MLS season, the correlation was 0.90, and so far in 2018 (308 games), the correlation is 0.85. For comparison, the correlation between xGD and actual goal difference is 0.44 in 2017 and 0.50 for 2018.
And I’m not saying the player value ratings are a better stat than xG stats per se, especially since xG metrics have demonstrated utility for all sorts of applications. I’m only reporting that adding additional information via player ratings gets closer to mirroring actual results, which makes intuitive sense. And if we want to look at the Audi Index, a statistic with goals similar to those of the player value rating, the match-level correlation between Team A minus Team B index results and the actual goal difference was 0.71 in 2016. In sum, the player value rating appears to meet its goal of reflecting team results at the game level, and reflects a stronger relationship than the Audi Index.
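The match-level check can be sketched as follows. The article does not state which correlation coefficient was used, so the Pearson correlation here is an assumption, and the match rows are invented purely to show the mechanics.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Invented matches: (Team A value, Team B value, Team A goals, Team B goals)
matches = [
    (1.8, 0.6, 3, 1),
    (0.9, 1.1, 0, 1),
    (1.5, 0.7, 2, 0),
    (1.0, 1.0, 1, 1),
    (0.5, 1.6, 0, 2),
]
value_diff = [a_val - b_val for a_val, b_val, _, _ in matches]
goal_diff = [a_goals - b_goals for _, _, a_goals, b_goals in matches]
r = pearson(value_diff, goal_diff)
```

The same function applied to team season totals and points per game would give the season-level comparison.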
Although breaking out the players’ contributions to individual game results is the primary goal, we can also examine how the player ratings reflect other important indicators of success. Turning to the season table, the correlation between a team’s season total and their points per game is 0.76 for 2017 and 0.70 for 2018. In contrast, the Audi Index was correlated to season points at 0.44 in 2016. Further, other known performance effects show up. Home field advantage is reflected as well; the home team rating averages 1.51 xG equivalents, and the away team 0.69 xG equivalents. Across perspectives, better teams seem to have higher value ratings.
And for better or worse, one of the most persuasive measures of validity is whether a metric produces generally expected results. In essence, how do the good players rate?
As of September 3rd, the top 25 season contributions in 2018 by total value and per 90 minutes:
Position | Minutes | 2018 Season Value
---|---|---
Position | Minutes | 2018 Value (p90min)
---|---|---
In 2017, the top 25 season contributions in MLS by total value and per 90 minutes:
Minutes | 2017 Season Value
---|---
2417 | 11.35
2935 | 10.04
3069 | 9.87
2555 | 9.52
2523 | 8.90
3231 | 8.48
2574 | 8.40
2723 | 8.14
2250 | 8.09
2684 | 7.93
2742 | 7.77
3106 | 7.57
2656 | 7.28
2200 | 6.95
2827 | 6.89
2244 | 6.72
2473 | 6.71
2162 | 6.61
2685 | 6.53
2872 | 6.52
3004 | 6.16
3002 | 5.89
1737 | 5.75
2407 | 5.64
2529 | 5.60