Where Goals Come From: Using past goals to create future goals
By Jamon Moore and Carl Carpenter
This is the second article of Season Two, and the ninth article overall, in the Where Goals Come From series of articles and videos from Jamon Moore and Carl Carpenter.
In our last article, we discussed how an effective season-long team strategy needs to come from a positive difference of three independent factors against your opponents: Goal Conversion Rate, total shots, and total possessions. And elite teams are always positive in at least two of them. In this article, we are going to look at the ways we can evaluate Goal Conversion Rate, including discussion of Expected Goals (xG).
You may have very strong opinions one way or the other about Expected Goals, but we are going to use them in a unique way here that will allow us to make xG actionable for front offices, coaching staffs, and even players on the pitch in future articles. We will also introduce other measurements that can be used for audiences that are not yet educated on the xG topic.
This article is intended for those with little or no familiarity with xG, or who may see it as a negative thing that is encroaching on the sport. We aren’t here to present it as a panacea, but we do want to objectively discuss its utility for a club.
The outline for this article is going to be:
If you’ve heard about or looked at xG in the past but either 1) didn't see its utility or 2) didn't know how to make it useful, we want to help with these scenarios in this article and upcoming articles.
xG is always improving, so regardless of what you saw or read about a few years ago, it is much better now at evaluating individual shots because of better and more data.
Not all xG values from various sources are equal because there is not equal access to the data points and data volume, and because data providers, clubs, and analysts have varying ideas on how to value shots and optimize their models.
There are other stats and metrics that are not talked about as much as xG but can also be very useful in addition to or along with xG. Some may be better suited to your audience.
xG helps us answer the quality question about a shot, and we'll be talking about improving shot quality utilizing xG and other tools throughout this season. Without xG, shot quality becomes highly subjective and experiential.
It’s time for “The Talk”
(When a Mommy Goal and a Daddy Goal love each other very much, they start Expecting a Goal. It’s “where goals come from”....)
You can read 100 articles about why xG is important, and they say similar things: “Teams should use xG because after about 10 games it can better predict future goals than past goals can...blah blah blah bluh-blah.” This statement is true (with a decent xG model), but it’s not actionable for clubs and coaches; it just helps prove xG has some utility.
What we have not done an adequate job of as analysts is help coaches and players by connecting the math to the pitch. In Season Two of this series, we hope to help that happen, by making it digestible and actionable.
While there are easily 100 articles about xG, there are probably 1,000 xG models out there that will spit out a decimal value between 0 and 1 if you tell them some details about a shot. Multiply the decimal value by 100, add a percent sign, and, voilà!, you get a percentage. The percentage is supposed to tell you the chance that a goal could have been scored in that situation.
For more information on what an xG model is, see our explainer.
But the xG value of a shot can vary by xG model, sometimes significantly. What one xG model calls a 0.24 xG (24% chance) shot another might call 0.56 xG (56% chance). This is usually because the models have differences in the inputs they take and the way they were calibrated. Over the course of a full season, these differences should even out if the model is built properly (the technical term is “fitted”), and the xG values generated by the model should be a better indicator than past goals for predicting future goals.
The analysts are able to filter through these differences, but it’s difficult for coaches to take a single xG value or a set of xG values from a game, set of games, season, or several seasons, and understand the context behind them, much less use them to instruct players on how to behave differently. Some coaches incorrectly use xG from a game to say how they played better than the opponent despite the scoreline. (Pro tip: don’t do this.)
A New Day is Dawning for xG
If the data is accurate enough, xG models can provide a pretty realistic value for evaluating the conversion rate of a shot if it was taken 100 times under the same or very similar conditions. As I mentioned at the beginning, xG models are constantly improving. This helps us evaluate shots, and the situations that create them, better today than ever before, and tomorrow will be better than today.
Let’s compare the shot data incorporated into xG model A from 2017 and xG model B from 2021:
| Typical xG Model B (2021) |
|---|
| Distance to goal |
| Angle to goal |
| Key pass is Cross? |
| Key pass is Through Ball? |
| From Corner? |
| From Set Piece/Throw-in? |
| Direct freekick? |
| Headed? |
| Speed of play (meters/second) |
| Key pass is Long ball? |
| Key pass is Progressive? |
| Key pass ball height |
| Key pass start/end location |
| Shooter carry start/end location |
| 1v1 with Goalkeeper? |
| Shot with preferred foot? |
| Positioning of Goalkeeper/Defenders |
| And many more factors... |
* To overcome model deficiencies in the mid-2010s, most models used a highly subjective “Big Chance” flag whose value was determined by individual match analysts. Many data scientists determined that the Big Chance flag favored goal-scoring chances and caused xG models to overfit. ASA has never used the Big Chance flag as a result.
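The first two inputs in the table, distance and angle to goal, can be derived from the shot location alone. Here is a minimal sketch of one common way to do it, not the calculation used in the WGCF model or any vendor model. The coordinate convention (meters from the goal line, meters from the center of the goal) and the function name `distance_and_angle` are assumptions for illustration; the 7.32 m goal width is the standard one.

```python
import math

GOAL_WIDTH_M = 7.32  # standard width between the posts, in meters


def distance_and_angle(x, y):
    """Return (distance, angle) for a shot location.

    x: meters from the goal line (assumed convention)
    y: meters left/right of the center of the goal (assumed convention)

    distance is measured to the center of the goal; angle is the angle,
    in degrees, between the lines from the shot to each post.
    """
    distance = math.hypot(x, y)
    half_goal = GOAL_WIDTH_M / 2
    angle = math.atan2(y + half_goal, x) - math.atan2(y - half_goal, x)
    return distance, math.degrees(angle)


# Example: a shot from the penalty spot (11 m out, central)
penalty_distance, penalty_angle = distance_and_angle(11.0, 0.0)
```

A central shot sees a wide goal mouth, while a shot from the same distance near the touchline sees a much narrower one, which is why models take both values rather than distance alone.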
Given the amount of shot information in 2021 from the top data providers to clubs, we can now feel much better about analyzing individual shots. Models will still have differences, but the differences between the better models from top data providers should converge more and more, especially as more GPS and camera data is introduced for true 3D data views.
If you dismissed xG as a valuable tool even a couple of years back, I’d ask you to reconsider because it’s going to be very important to what we are sharing over the next few weeks.
In my own xG model--I’ll call it the “Where Goals Come From (WGCF) xG model”--I lack the positioning of defenders that exists in premium (paid) vendor models. So while I’m missing some factors that would help me feel a little bit better, all-in-all I have 42 key data points for almost 375,000 shots--enough for the analysis we want to do here. I’ve tested my model in a number of different ways to feel confident that we can use it to do analysis at a shot level that will benefit coaches and players.
Simplifying xG
xG is basically our Goal Conversion Rate based on previous shots and goals from the same location using as much data as possible about them. It takes about 5,000 shots to make an xG model. In my opinion, it takes about 25,000 to make a good one. We are working with hundreds of thousands of shots across several leagues.
We could add more shots or even build several league-specific models. We have way more than enough shots to cover the 42 data points and over 99% of the realistic shot locations on the pitch. Given the amount of data now available, the model accuracy is going to be higher than it ever has been before at the individual shot level.
Let’s show how this works. In Season One of this series, we shared the Goal Conversion Rates of various types of shots and shot locations using the 18 zones. Across all of our shots, the Goal Conversion Rate is 9.875%. The average (mean) xG across all of our shots is 0.09808 (or 9.808%)--a difference of 0.00067. Even when looking across the individual 18 zones with more than 1,000 shots, the difference between xG and Goal Conversion Rate is much less than 0.001.
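The check described above (does average xG match the actual Goal Conversion Rate, overall and zone by zone?) can be sketched in a few lines. This is an illustration only: the function name `calibration_by_zone`, the record layout, and the sample shots are made up, and the zone numbers are placeholders rather than real WGCF data. Only the two overall rates are taken from the text.

```python
from collections import defaultdict


def calibration_by_zone(shots):
    """shots: iterable of (zone, xg, is_goal) records.

    Returns {zone: (goal_conversion_rate, mean_xg, gap)}, where
    gap = goal_conversion_rate - mean_xg.
    """
    totals = defaultdict(lambda: [0, 0.0, 0])  # shot count, xG sum, goals
    for zone, xg, is_goal in shots:
        entry = totals[zone]
        entry[0] += 1
        entry[1] += xg
        entry[2] += int(is_goal)
    return {
        zone: (goals / n, xg_sum / n, goals / n - xg_sum / n)
        for zone, (n, xg_sum, goals) in totals.items()
    }


# Hypothetical shots, for illustration only (not real data)
sample_shots = [
    (14, 0.30, True), (14, 0.10, False), (14, 0.20, False), (14, 0.20, False),
    (17, 0.05, False), (17, 0.03, False),
]
by_zone = calibration_by_zone(sample_shots)

# Overall figures quoted in the text: 9.875% conversion vs. 0.09808 mean xG
overall_gap = 0.09875 - 0.09808
```

With only a handful of shots per zone, as in the made-up sample, the gaps are large. They shrink toward zero only when each zone has many shots, which is why the comparison in the text is limited to zones with more than 1,000 shots.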
So plugging in our 42 data points for each shot, we can effectively get the Expected Conversion Rate for each individual shot (xCR = xG expressed as a percentage). If an average professional player in the surveyed leagues shoots that same shot under the same circumstances 100 times the result will be a number of goals--this is quite literally the “Expected Goals” total. Players that shoot above this average will “over-perform” their xG / xCR, and players who shoot under this average will “under-perform” their xG / xCR. Good shooters can have bad stretches, and bad shooters can have good stretches--only time and a lot of shots will give us the reality.
Same thing, different way of saying it
American player Daryl Dike from MLS shocked everyone when he bagged nine goals in the English Championship in the second half of the season just after arriving at Barnsley. Let’s look at his numbers:
His Goal Conversion Rate was 26.5% on 34 shots (9 goals / 34 shots). Barnsley as a team averaged 10.5% for the season.
Infogol has an xG of 5.17 on those 34 shots (9 goals, 5 saved, 13 missed, 7 blocked).
His Expected Conversion Rate was 15.2% (5.17 xG / 34 shots).
He had seven shots blocked for 0.65 xG. Taking those blocked shots out, that means his “Fenwick” Goal Conversion Rate was 33.3% (9 goals / 27 shots), and his “Fenwick” Expected Conversion Rate was then 16.7% (4.52 xG / 27 shots).
Analytical term: “Fenwick” is a term borrowed from hockey, and used on occasion in soccer (salute, former Modern Fitba team), to indicate shots that were not blocked. We could go much deeper with Fenwick to look at game states, if we chose to.
So then:
His Goal Conversion Rate (26.5%) exceeded his Expected Conversion Rate (15.2%) by 74%.
His goals (9) exceeded his xG (5.17) by 3.83 goals and also by 74%.
He exceeded the “Fenwick” version of the same rates (33.3%) and (16.7%) by 99%.
His goals (9) then also exceeded his unblocked xG (4.52) by 99%.
His shots had a 45% better chance of being scored than his average teammate’s shots (15.2% Expected Conversion Rate vs. 10.5% team average).
His Goal Conversion Rate (26.5%) also exceeded the team’s average (10.5%) by 152%, making it roughly 2.5 times the team rate!
That’s much more useful context for Barnsley or a prospective club to evaluate Daryl Dike than the contextless statement “he beat his xG by 3.83 goals”. Not only that, we can use the various shot types from the WGCF Framework to analyze Dike’s performance in various types of situations and identify areas for improvement. That gives a lot of additional context to Expected Goals that clubs can use!
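For analysts who want to reproduce these comparisons, here is a short sketch of the arithmetic using the Dike figures quoted above. The variable names and the helper `pct_over` are ours for illustration; the inputs (9 goals, 34 shots, 5.17 xG, 7 blocked shots worth 0.65 xG, 10.5% team rate) come from the text.

```python
goals, shots, xg = 9, 34, 5.17
blocked_shots, blocked_xg = 7, 0.65
team_gcr = 0.105  # Barnsley's season Goal Conversion Rate

gcr = goals / shots  # Goal Conversion Rate
xcr = xg / shots     # Expected Conversion Rate (xG per shot)

# "Fenwick" versions: remove blocked shots and their xG
fenwick_shots = shots - blocked_shots
fenwick_xg = xg - blocked_xg
fenwick_gcr = goals / fenwick_shots
fenwick_xcr = fenwick_xg / fenwick_shots


def pct_over(actual, expected):
    """Percentage by which actual exceeds expected."""
    return (actual / expected - 1) * 100


over_xcr = pct_over(gcr, xcr)                          # about 74
over_fenwick_xcr = pct_over(fenwick_gcr, fenwick_xcr)  # about 99
shot_quality_vs_team = pct_over(xcr, team_gcr)         # about 45
finishing_vs_team = pct_over(gcr, team_gcr)            # about 152
goals_above_xg = goals - xg                            # about 3.83
```

Because both rates share the same shot count, beating Expected Conversion Rate by 74% and beating xG by 74% are the same statement. The choice between them is about which one your audience reads more easily.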
Which of these numbers and explanations is easiest for you--an executive, coach, player, or analyst--to understand for comparing players and teams? That’s up to you, but it’s important that everyone in the club is communicating with each other the same way. xG has become popular with the analyst community, but it might not be the best method for a wider audience in a club meeting or a meeting with ownership without a lot more context.
Side note: Would Dike keep up this type of torrid pace if he continued at Barnsley or with another team in the English Championship next year? Maybe not, but with more proof points, it could be an indicator he’s ready for higher level competition, or it could be an indicator that a talented teammate is helping him exceed expectations by putting him in better positions to shoot. Or it could just be that he got really lucky over a short period of time.
To really answer questions about a player’s shooting performance, I recommend looking at longer term trends, such as a comparison of Goal Conversion Rate and Expected Goals per Shot (Expected Conversion Rate when expressed as a percentage) over time. If the lines have flattened for the last 100 shots or so, the player has likely reached their peak unless a new coach or setting can create further improvements. Gyasi Zardes, for example, has continually improved in MLS over his career while Kei Kamara, Maxi Urruti, and Dom Dwyer have long stabilized. This can help clubs identify players who are improving over time and are finishing at a consistently better rate than their xG or even players who are learning to get into good shooting positions.
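One way to build the trend lines described above is a rolling window over a player's shots in chronological order. The sketch below is an assumption about how you might do it, not the exact method behind our charts. The function name `rolling_rates` and the sample shots are hypothetical, and the tiny window is used only so the example is easy to follow (the text suggests looking at about 100 shots).

```python
def rolling_rates(shots, window=100):
    """shots: chronological list of (xg, is_goal) tuples for one player.

    Returns a list of (goal_conversion_rate, xg_per_shot) pairs, one per
    full window of `window` consecutive shots.
    """
    rates = []
    goal_sum = 0
    xg_sum = 0.0
    for i, (xg, is_goal) in enumerate(shots):
        goal_sum += int(is_goal)
        xg_sum += xg
        if i >= window:
            # Drop the shot that just fell out of the window
            old_xg, old_goal = shots[i - window]
            goal_sum -= int(old_goal)
            xg_sum -= old_xg
        if i >= window - 1:
            rates.append((goal_sum / window, xg_sum / window))
    return rates


# Hypothetical shot sequence, for illustration only
example_shots = [(0.1, False), (0.3, True), (0.2, False), (0.4, True)]
trend = rolling_rates(example_shots, window=2)
```

Plotting the two values from each pair as separate lines shows whether a player's finishing (Goal Conversion Rate) and shot selection (xG per shot) are still rising or have flattened.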
Most shots are “bad” chances
Looking in our model, the median shot xG is actually 0.0526 or a 5.26% chance. Wait, what? Let’s cluster the shot xG values and see why this is the case.
What this tells us is that the biggest cluster of shots (0.02 or 2%) is one-fifth of the minimum conversion rate teams need to be successful according to our last article. 70% of shots are below the average non-penalty Goal Conversion Rate of 10%. It only stands to reason that teams need to minimize the number of shots that lead to low Goal Conversion Rates (lower xG) and increase the number of shots that lead to much better Goal Conversion Rates (higher xG).
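The clustering behind this observation can be approximated by grouping shot xG values into fixed-width bins and counting them. The sketch below assumes a bin width of 0.02 and uses made-up xG values; neither is taken from the WGCF data, and the function name `xg_profile` is hypothetical.

```python
import math
from collections import Counter
from statistics import mean, median


def xg_profile(xg_values, bin_width=0.02, threshold=0.10):
    """Summarize how shot quality is distributed.

    Bins are labeled by their lower edge, so the 0.02 bin holds shots
    with xG from 0.02 up to (but not including) 0.04.
    """
    # Rounding before floor guards against floating-point drift at bin edges
    bins = Counter(math.floor(round(xg / bin_width, 9)) for xg in xg_values)
    modal_bin, _ = bins.most_common(1)[0]
    below = sum(1 for xg in xg_values if xg < threshold)
    return {
        "median": median(xg_values),
        "mean": mean(xg_values),
        "share_below_threshold": below / len(xg_values),
        "modal_bin_start": modal_bin * bin_width,
    }


# Hypothetical xG values, for illustration only
sample_xg = [0.02, 0.03, 0.03, 0.05, 0.07, 0.08, 0.12, 0.25, 0.45, 0.76]
profile = xg_profile(sample_xg)
```

Even in this small made-up sample, a few high-quality chances pull the mean well above the median, which is the same skew we see in the real shot data.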
Revisiting our Goal Conversion Rates by shot type from the previous article, we can see which types of shots are the most valuable above, starting with the very effective Through ball shot. But not all through balls are created equal. Each one does not have a 30% chance of being scored. What this visualization lacks is the spread of the quality for each type of shot.
This is better. xG helps us understand how a 30% Goal Conversion Rate on Through ball shots happens. The quality of Through ball type shots ranges from low xG to some shots with over a 50% chance. We need to identify how to create shots on the better end of the range. The median xG and average xG here are not for shots, but instead are for goals (as we saw earlier, the median xG is lower than the average value because of where the cluster is greatest). We want to understand how to create repetitive success (goals), not just attempts (shots).
As we will get into during this season of the series, each type of shot has either “bad / good / better” versions (such as Normal and Free kick) or “good / better / best” versions (such as Through ball, Cutback, and Progressive) according to xG.
Making xG Actionable
The biggest issue with Expected Goals (xG) is that it is not clear to coaches how to make it actionable in training, other than to guide players to not take too many shots from outside the box and work toward a closer shot. In the next articles of Season Two in the Where Goals Come From series, we are going to help you make xG actionable--finally!
The focus of Season Two is to identify bad, good, better, and best patterns and situations that lead to higher Goal Conversion Rates and help players understand when and where they should look for a shot, and when they should keep looking. It can help coaches train for specific situations and outcomes to fine-tune game performance with better informed players. This information should also lead to new ideas on how to defend the most dangerous situations and force opponents to take shots that are less likely to be scored.
Conclusion
Shot xG has now improved in accuracy to the point where we can use it to say “do this, not that.” By understanding more how past goals were created, we can identify unlocking passes and patterns that are more likely to lead to better shots. We can use our database of hundreds of thousands of shots and passes leading to them to better identify what works occasionally and what works more often.
Teams need to decide how best to evaluate their performances, player expectations, and potential signings. Getting everyone speaking the same language in a club is critical. xG is one way, but there are others that may work better. However, xG can be used to tell us what works and what doesn’t (or at least what works less often).
Key points:
xG helps us define the quality of a shot or set of shots. Without xG, shot quality becomes highly subjective and experiential.
xG models are always improving and most are much better now at evaluating individual shots than they were even three to four years ago.
xG values from various sources won’t be equal. A club should either identify a reliable source of xG values for their shots, such as a premium data provider, or have their analysts build their own model to use so they know precisely what factors are used to evaluate shots.
There are other shot stats and metrics that are not talked about as much as xG but can also be very useful in addition to xG or include xG in their calculations.
If you have any questions about Expected Goals, reach out to us for a chat, but there’s a good chance we’ll answer your question in an upcoming article.
Acknowledgements
In building my own xG model for this project, it’s important to give a lot of credit to people and sources which helped me in the process, and for other content in this article.
First, I couldn’t have done an article like this without our own real data scientists Tyler Richardett and Matthias Kullowatz, who improve and maintain all models used at American Soccer Analysis. They deserve a ton of credit for giving me the resources and support to create my first GLM a few years back and the current XGBoost model that I use and verify against the ASA model. Their help has been essential for the whole Where Goals Come From project.
I found a lot of xG model inspiration from many sources, but particularly from the blogs by Lars Maurath and Nils McKay, along with this amazing article from Michael Caley on his own xG model, which demonstrated how far ahead he was of the rest of us, even incorporating some of these WGCF concepts a few years back.
This article from Modern Fitba was helpful to confirm my thoughts around Expected Conversion Rates and sent me down a rabbit hole about Fenwick-adjusted stats which are used more in hockey, but clearly could be quite useful to club analysts in soccer.
About Jamon Moore
Jamon is a high-technology industry executive overseeing business agility transformations. In addition to being a contributor for American Soccer Analysis, he is a credentialed media member covering the San Jose Earthquakes from mostly an analytical perspective. Jamon can be contacted via Twitter, and club analysts and executives can connect with Jamon on LinkedIn.
About Carlon Carpenter
Carlon is the current Tactical & Video Analyst for StatsBomb, one of the largest soccer data companies in Europe. Carlon also works as a contract employee for the U.S. Soccer youth national teams, working as a performance analyst for the U-17 men’s national team. Carlon can be contacted through his LinkedIn account or via Twitter.