Goals Added (g+) MLS Analysis: Age Impact
By Zach Beery
American Soccer Analysis (ASA) has built an expected possession value metric called goals added (g+). You can see a more in-depth explanation of this model here. The g+ model measures every action by how much it changes a team’s chances of scoring and conceding across the next two possessions. Compared to traditional stats (goals, assists, tackles, etc.), g+ is more representative of a player’s true impact on their team’s results over the course of a season.
This article explores the relationship between age and g+. This analysis could eventually lead to an age curve model that controls for position, league, professional minutes, and historical performance, among other factors. Isolating the impact that age has on g+ matters because decision-makers at soccer clubs should think critically about the age profile of the players they target during the transfer window. For example, teams should probably be reluctant to give players past their prime a 3+ year contract at a high wage, because the player’s future production is unlikely to justify their future wages.
Compared to other sports, it may prove tougher to produce a predictive age curve for soccer because of the plethora of leagues, each with its own level of skill. This is even harder in MLS because of the notoriously complicated and confusing roster rules. The most impactful of these is the Designated Player (DP) rule, which allows clubs to acquire up to three players whose total compensation and acquisition costs exceed the Maximum Salary Budget Charge. The DP rule was created to bring stars like Beckham, Henry, Pirlo, or Ibrahimovic to the league. Typically, DPs have been older and more productive than non-DPs, which biases the age curve.
Let’s take a look at the standardized g+. Age is calculated as of February 1st of the year of the matching season.
19 represents players aged 19 and below, and 34 represents players aged 34 and above, because these age groups have small sample sizes.
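The age definition and the two end buckets described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and counting age in completed years is an assumption, since the article does not say how fractional ages are handled.

```python
from datetime import date

MIN_AGE_BUCKET = 19  # stands in for players aged 19 and below
MAX_AGE_BUCKET = 34  # stands in for players aged 34 and above


def age_on_feb_1(birth_date: date, season: int) -> int:
    """Completed years of age on February 1st of the season's year (assumed convention)."""
    reference = date(season, 2, 1)
    # A player born on February 1st counts as having had their birthday.
    had_birthday = (birth_date.month, birth_date.day) <= (reference.month, reference.day)
    return reference.year - birth_date.year - (0 if had_birthday else 1)


def age_bucket(age: int) -> int:
    """Clamp an age into the 19-and-below and 34-and-above end buckets."""
    return max(MIN_AGE_BUCKET, min(MAX_AGE_BUCKET, age))


# Hypothetical player born 15 June 1995, evaluated for the 2021 season.
example_age = age_bucket(age_on_feb_1(date(1995, 6, 15), 2021))
```

Pooling the tails this way trades resolution at the extremes for more stable averages in the youngest and oldest groups.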
Instead of the expected parabolic curve with performance peaking between 25 and 28, the chart looks similar to an inverse exponential distribution. Players improve their performance significantly between 19 and 26, but performance then seems to plateau rather than decline. This is unexpected. Casual sports fans may expect players’ impact to decrease on average as they age. There are three things that could be affecting this outcome:
i) DPs are older on average than non-DPs and tend to perform better.
ii) Older players who perform better during a season are less likely to retire before the next one. Because weaker performers leave the pool, the results are skewed toward the players who remain. This is an example of selection bias.
iii) This chart measures g+ per 96 minutes, so older players may perform better in shorter stints but not be able to play as many minutes overall.
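To make point iii concrete, here is a minimal sketch of a per-96 rate, assuming it is simply total g+ scaled to 96 minutes of playing time. The function name and the two example players are hypothetical and are not taken from ASA’s data.

```python
def g_plus_per_96(total_g_plus: float, minutes: float) -> float:
    """Scale a season's total g+ to a per-96-minute rate (assumed definition)."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total_g_plus * 96.0 / minutes


# Two hypothetical players with the same rate but very different workloads.
starter_rate = g_plus_per_96(total_g_plus=1.5, minutes=2880)    # about 30 full matches
substitute_rate = g_plus_per_96(total_g_plus=0.3, minutes=576)  # about 6 full matches
```

Both players come out at roughly 0.05 g+ per 96, even though the starter contributed five times as much total g+. A rate chart cannot distinguish them, which is why minutes played would be worth tracking alongside the rate.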
Now, let’s compare age charts for DP and non-DP players.
Some interesting things jump out immediately:
On average, DPs significantly outperform non-DP players.
The results of the DP chart are more volatile due to a smaller sample size (a team can only have up to three designated players).
The smoothed curve for non-DPs more closely matches the expected age curve.
The range of g+ values by age stays between -0.02 and 0 g+ per 96 minutes, but there does seem to be a peak at ages 26 and 27.
The DP age curve has peaks at age 24 and 29.
There is significant growth between the ages of 23 and 25, while there is a drop-off in production at age 25.
The average g+ for DPs starts to increase again at age 30 and fluctuates after but seems to increase slightly as players age.
Excluding the ages of 19-21, it seems that DPs aged between 25 and 27 have the lowest g+ on average. This is unexpected because the perceived peak of players is between 25 and 27 years old. My hypothesis is that younger DPs transfer to bigger leagues before they turn 25, and teams buy DPs that are closer to being past their perceived peak.
Certain positions and playing styles rely heavily on physical attributes (speed and acceleration, for example). These positions should theoretically peak at an earlier age, as these attributes typically decline with age. Using position classifications from fbref.com, Figure 2 displays the smoothed relationship between g+ and age. Figure 2 only includes non-DP seasons. Because most DPs are forwards or midfielders, the DP data for defenders is a little messy, so I believe it is more helpful for the purposes of this report to include only non-DPs.
Of note:
As expected, forwards tend to produce at a higher quality between 24 and 27 years old, then drop off around 28. There seems to be another peak at 33+, but I believe this is due to the aforementioned selection bias.
Midfielders seem to peak between the ages of 26 and 29 and plateau after that.
Defenders peak a little later in their career, between the ages of 29 and 32.
There does not seem to be a strong relationship between g+ and age for defenders.
Luckily for older players, soccer is not just a physical game, but also a game that can be won with skill or savvy. ASA’s g+ is broken down into six action categories: shooting, receiving, passing, dribbling, interrupting, and fouling.
Passing seems to age like a fine wine. The more experience a player has, the better they become at picking out the most advantageous pass. While speed and quickness may deteriorate with age, recognition and vision on the pitch improve. No other action category has as strong a trend as passing. Dribbling peaks between the ages of 21 and 24, and receiving peaks a little later in a player’s career, between the ages of 25 and 29. Dribbling requires quick bursts and agility, which are usually stronger attributes for younger players. Another hypothesis for dribbling peaking earlier is that younger players are more willing to dribble, then rely more on passing as that skill develops. Receiving ability comes from making smart runs (which takes experience) and running into space with speed (which may favor youth). This would explain why receiving ability peaks between the ages of 25 and 29.
In soccer, players transfer in and out of the league constantly, which complicates the age curve. The player pool changes every year, so the base skill level changes each year as well. In an ideal world (from a data perspective), all players would stay with the same team and play the same number of minutes as they age each year. Since MLS teams do not care about ideal statistical situations, an adjustment needs to be made. Thus, I filtered the dataset to only include players who had played in consecutive seasons. This decreases the number of observations from 3,532 to 2,041, but it allows us to better isolate the impact age has on g+.
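One way to read the consecutive-season filter is as a pairing step: keep a player-season only if the same player also has a row for the following season, then average the change in g+ per 96 by age. The sketch below uses made-up rows and hypothetical function names. Keying each change on the player’s age in the earlier season is an assumption, not necessarily how the figures in this article were built.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (player, season, age, g+ per 96)
rows = [
    ("A", 2018, 21, -0.03),
    ("A", 2019, 22, 0.01),
    ("A", 2021, 24, 0.02),  # 2020 is missing, so 2019 has no following season
    ("B", 2019, 30, 0.04),
    ("B", 2020, 31, 0.02),
    ("C", 2020, 27, 0.00),  # single season, dropped by the filter
]


def consecutive_pairs(rows):
    """Return (player, age in season n, change in g+ per 96 from n to n+1)."""
    by_key = {(player, season): (age, g) for player, season, age, g in rows}
    pairs = []
    for (player, season), (age, g) in by_key.items():
        following = by_key.get((player, season + 1))
        if following is not None:
            pairs.append((player, age, following[1] - g))
    return pairs


def mean_change_by_age(pairs):
    """Average season-to-season change, grouped by age in the earlier season."""
    grouped = defaultdict(list)
    for _, age, delta in pairs:
        grouped[age].append(delta)
    return {age: mean(deltas) for age, deltas in sorted(grouped.items())}


pairs = consecutive_pairs(rows)
changes = mean_change_by_age(pairs)
```

Only two of the six made-up rows start a consecutive pair, which mirrors how the filter shrinks the real dataset.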
The largest improvements of g+ by age occur before the age of 22.
The largest season-to-season decrease is for players aged 22, which is unexpected.
Between the ages of 23 and 29, players’ performance seems to fluctuate year to year. These players may have hit their peak performance. Other factors may affect their play like team situation, fitness, or luck.
There is a large increase in g+ at age 30, then a steady decline in g+ after that. Earlier in the article, it looked like players were improving after the age of 30 but this confirms that the prior improvement was most likely due to selection bias.
How do the various action categories change by age? Figure 5 breaks down the impact of each action category.
Dribbling and passing are the two action categories that are driving the early development of players between the ages of 19 and 21.
Once players reach the age of 32, it seems that the steepest declines are in the passing and interrupting categories.
Finally, it is important to discuss regression to the mean. In Michael Lewis’ “The Undoing Project”, there is a story about Daniel Kahneman’s time consulting for the Israeli Air Force. When student pilots had a bad flight, the officer would yell at them, and the pilots would improve on average on their next flight. When pilots had a good flight and the officer praised them, the pilots would perform worse on average on their next flight. Thus, the instructors assumed that yelling at the students improved their performance. Kahneman realized that natural variation in performance, not the yelling, was the cause of the change. Clearly understanding the concept of regression to the mean will allow us to better separate correlation from causation.
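The pilot story can be reproduced with a small simulation in which nobody is yelled at or praised. Every number here is made up: each simulated pilot has a fixed skill level, and each flight score is that skill plus independent random noise.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the illustration is repeatable

N = 10_000
skill = [random.gauss(0, 1) for _ in range(N)]
flight_1 = [s + random.gauss(0, 1) for s in skill]  # skill plus luck
flight_2 = [s + random.gauss(0, 1) for s in skill]  # same skill, new luck

# Rank pilots by their first flight and take the bottom and top 10%.
order = sorted(range(N), key=lambda i: flight_1[i])
worst = order[: N // 10]
best = order[-(N // 10):]

worst_change = mean(flight_2[i] - flight_1[i] for i in worst)
best_change = mean(flight_2[i] - flight_1[i] for i in best)
```

In this setup the worst first-flight group improves on average and the best group gets worse, purely because extreme first scores contain more luck than a typical score does.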
Soccer is a high-variance game due to its low-scoring nature. Compared to the relatively small number of goals and assists, g+ may be more consistent year to year because of the large number of actions it incorporates. But do players regress to the mean over time like the fighter pilots?
The x-axis of Figure 6 is the percentile rank of g+ per 96 for season n and the y-axis is the percentile rank of g+ per 96 for season n+1. Overall, Figure 6 shows that there does seem to be an element of regression to the mean. While players who performed worse the year before still perform worse on average the next season, they move closer to the mean. A similar effect occurs for players who perform above the median. The biggest takeaway from this figure is the group at the 75th percentile and above, where the slope of the smoothed fitted model increases significantly. This tells us that the top 25% of performers in the league tend to repeat their top performances. g+ seems to be stickier for players who perform at the highest level, while the performance of everyone else tends to have more variance.
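For readers who want to build a similar chart, the axes can be sketched as below. The g+ values are made up, the function name is hypothetical, and giving tied values the midpoint rank is an assumption, since the article does not state how ties were handled.

```python
from bisect import bisect_left, bisect_right


def percentile_ranks(values):
    """Percentile rank (0 to 100) of each value; ties share the midpoint rank."""
    ordered = sorted(values)
    n = len(ordered)
    ranks = []
    for v in values:
        below = bisect_left(ordered, v)          # values strictly lower than v
        equal = bisect_right(ordered, v) - below  # values tied with v, itself included
        ranks.append(100.0 * (below + 0.5 * equal) / n)
    return ranks


# Hypothetical g+ per 96 for the same four players in season n and season n+1.
season_n = [-0.02, 0.00, 0.03, 0.05]
season_n_plus_1 = [0.00, -0.01, 0.02, 0.06]

# Each point is (x, y) = (rank in season n, rank in season n+1).
points = list(zip(percentile_ranks(season_n), percentile_ranks(season_n_plus_1)))
```

Ranking each season separately puts both axes on the same 0 to 100 scale, so a league-wide shift in g+ between seasons does not move the points.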
This was just an exploratory data analysis of the relationship between age and g+. To create a proper age curve, a more robust model would need to be built that can take all these factors into account.