Goals Added: Recap and Roundtable
ASA Roundtable: g+ Edition
Hosted & Edited by Alex Bartiromo
Alex Bartiromo
Contributing Writer/Editor
Hi everyone, welcome to our 2020 ASA Goals Added coronavirus roundtable spectacular!! In this roundtable, you will have a chance to read about the hottest new innovation in the analytics world, already spawning imitators across the world before it’s even rolled out. But what is Goals Added (g+) and what can it teach us about the beautiful game that past metrics weren’t able to? To discuss this topic, we have some of the world’s foremost experts, the people who built the dang thing. Welcome, everybody!
Let’s start from the beginning. Readers can find a full explanation of the g+ model and what it measures elsewhere on the site. What are the origins of this project, and why did you start working on it? I read somewhere that it has been in the works for five years! Has it evolved over that period? Are you asking the same questions you were then? What does g+ allow us to do that we weren’t able to before?
Matthias Kullowatz
Data Scientist, ASA Professor & Co-Founder, g+ Creator
Sarah Rudd's article on Markov chains got me thinking about value like 7 years ago. I recall Harrison distinctly bringing it up on a call or podcast, probably in 2013. We discussed the merits of Markov chains, and obviously concluded we didn't have the data to do anything. We just did shots and key passes, and then all passes, and now finally all events. This was ~2014.
The real work started about 6 months ago. It had been an itch for all that time, and I got extra help from Tyler and Rory on coding stuff, so it freed me up to start working on it.
The whole reason for doing it is that I wanted to understand who good soccer players really were, the same way that stats like WAR helped us understand who good baseball players were, or EPV with basketball.
Kieran Doyle
Contributing Writer, g+ Contributor
To be fair to others, a few other people had tried doing similar-ish things in the public sphere, and I'm fairly confident a handful of other clubs with well-established analysis departments had broached the topic.
And like Matty said with WAR, EPV in basketball has been a thing for what feels like kinda a while.
But the timing works out so well because the other "easier" avenues of looking at shots, then shot assists, have kinda stagnated, there's not a tonne [Kieran has kindly requested that his spelling not be submitted to debased American standards] more to do there.
G+ is a lot more holistic, so in theory, you can look at all these things we did for shots (who gets them, how did they get them, what did they do with them, etc.) and do the same for literally everything else. It's pretty cool.
Alex Bartiromo
Cool. I am a big fan of baseball analytics and WAR myself, so this is definitely something I've been dying for. I'm interested, how do the difficulties in creating a stat like this in soccer differ from basketball or baseball?
Matthias Kullowatz
Baseball can be broken into pitches pretty easily. Like, the most granular aspect of the game is sitting there all by itself. In soccer, what is the most granular aspect of the game? It's not well defined. So we had to define the granularity of an "observation." Even basketball has really obvious possessions.
Kieran Doyle
I think one of the weird things about comparing to basketball is it's a little bit more obvious when good things happen in basketball, just because of the number of events in a possession before a shot. It's a lot easier to say a pass or a screen directly impacted this possession when there aren't 27 events in the chain.
Alex Bartiromo
To return to the second half of the question, how have the questions you've been asking/answering changed over the last 6 months? Were there any huge surprises in what you found?
Matthias Kullowatz
Over the last 6 months, a lot of specific things changed, but I don't think the main concept changed: measure possession value, take the difference to get action value. Specific things included how to deal with defensive actions, pass receipts, shots, and penalties (all mentioned in the methodology article).
I think there were specific player surprises. I mostly wanted to see Diego Chara and Darlington Nagbe in the top 10. I would say Nagbe was a player who got me thinking about what a holistic player value might say, back when I watched soccer 2–6 years ago.
Kieran Doyle
To address what has changed in 6 months: a lot. I only popped in here like 4 months ago, and I think one of the recurring themes was how do we make decisions on certain aspects of the model (like xPass and receiving, or turnovers) that represent the game appropriately, without being arbitrary and hand-wavy.
Tiotal Football
Contributing Writer, g+ Contributor
First of all, as a reminder, I think it is perhaps brutally important to zoom out to the full player evaluation process, as it was really the impetus for Matthias's dream here. And I think early on we were trying to lay down some basic rules as to how the debates and the thought experiments would shape up. I think we largely settled on three steps:
1. Build a really good engine that can estimate or derive the value of every recorded action, and to do that we needed:
a) to decide on the unit of value an action would take, which we decided for many soccer reasons should be the probability of scoring minus the probability of conceding over a two-possession pair (one for you, and one for your opponent). My finding was that in MLS, the more possessions you considered, the less you learned about the individual actions and the more you learned about which team was the home team, since home advantage is God in MLS.
b) to decide on the unit of account to string actions together through time in order for the engine to learn what sorts of things recorded in the data influenced the probability of scoring minus the probability of conceding over a two-possession pair. We largely decided on the possession definition ASA had been using in its goal chain work.
2. Having done the hard part of building that fancy mathy engine, and somehow trying to validate it, we needed to decide on how to attribute (or "allocate") the value we just derived for all of the actions to the players themselves. And this is a step that I think a lot of people breeze past because so much hard work is accomplished in #1 above - the instinct is just to grip it and rip it at this point and check the results (which we did). But once you create a fancy, amazing (perhaps expensive) model to value the individual actions, you have to think long and hard about which players are responsible for those actions, because the model doesn't tell you that (yet). When it came to the value of passes, I think we discovered that we had a fundamental disagreement with many others out there about how soccer works, which is kind of exciting.
3. Having allocated out (#2) the values of the actions derived by the model (#1), we were left with results that were still tainted by the Valley of Meh (thanks, Thom Lawrence): the fact that the consequences of individual actions climb dramatically at opposite ends of the field compared to the middle, so that even with a perfect model, your top attacking player would always score better than your top midfielder just by how the thing works. And we decided for now, until a more elegant solution comes to us in dreams, to compare each player to his positional average or the positional replacement value.
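The two-possession unit of value (step 1a) and the positional baseline (step 3) can be sketched roughly as follows. This is a toy illustration with made-up probabilities and baselines, not the actual ASA model, which derives these values from event data:

```python
# Toy sketch of the g+ valuation steps described above.
# All numbers are invented for illustration.

def possession_value(p_score_for, p_score_against):
    """Step 1's unit of value: P(we score) minus P(opponent scores)
    over the two-possession pair."""
    return p_score_for - p_score_against

def action_value(before, after):
    """An action's value is the change in possession value it produces."""
    return possession_value(*after) - possession_value(*before)

# A pass that raises our scoring chance from 2% to 5% while the
# opponent's next-possession chance stays at 1.5%:
v = action_value(before=(0.02, 0.015), after=(0.05, 0.015))
print(round(v, 3))  # 0.03

# Step 3: compare raw totals to a positional baseline so attackers
# don't always outscore midfielders just by where they play.
positional_avg = {"ST": 0.12, "CM": 0.04}  # hypothetical g+ averages

def g_plus_above_avg(raw_total, position):
    return raw_total - positional_avg[position]
```

The positional comparison in the last step is exactly the "compare each player to his positional average" workaround for the Valley of Meh.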
Alex Bartiromo
Kieran’s article talks about the similarities and differences between g+ and some of the other possession value models, such as Opta’s PV, Jan van Haaren’s VAEP, and Bornn et al.’s EPV. How does g+ build off of other existing models, and how does it differ? Discuss some of the methodological differences between them, and how you came to some of the decisions you did. Did you have any difficulties in valuing certain types of actions over others? What about the thorny issue of off-ball actions? Are there any things that simply can’t be measured with existing data? What did we learn from the validation you did using Nicolás Lodeiro (this topic is discussed at length here)?
Kieran Doyle
Yeah I think it's really easy to look at all the different PV type models and, without looking under the hood, see them as quite similar. But I think the 3 big, big differences are shots, receipts, and turnovers.
John Muller
Contributing Writer, g+ Contributor
I think g+ moves the ball forward—or at least does things a little differently from what we know about other public-ish models—in four main ways. First, the model inputs are different: it looks at actual possessions, not some set number of actions, and it has some information about every action in that possession, not just a few recent ones. Second, it awards value for receiving a pass, which turns out to be super important. Third, it values shooters for contributions other than finishing, since finishing skill is too noisy to be useful. And finally, it helps solve a problem that some of these models have run into with turnovers.
Tiotal Football
To avoid confusion, one way to frame this is that, turnover or no turnover, the value of an individual action in the g+ model is measured as the change in the probability of a team scoring minus the probability of their opponent scoring on their next possession that the action carries from the old situation to the new situation. Because a turnover is an action(s) that bridges possessions, to calculate the value of that transition action and be consistent with the two possession framework, you need to know several different possession values - the probability of scoring that existed in the situation that is now gone because you tackled me; the probability that existed before you tackled me and is now gone that you were going to score on your next possession; the new probability that you are going to score now that you actually have the ball based on the new situation; and the probability that I am going to score when I ultimately get the ball back following your new possession (whenever that may be).
It is not a magic trick, although it may feel like it. It's just following a model that presumes there is always at least this possession and then one more to follow, a fact that is true in 98+ out of 100 possessions a game. This is something that Matty and I debated at length in the dead of night. As you can imagine, while it feels simple once you "learn it," it's easy to get lost in that whirlwind.
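The four probabilities enumerated above can be written down directly. A minimal sketch with invented numbers, illustrating the framework rather than ASA's actual code:

```python
def turnover_value(p_old_opp_now, p_old_us_next, p_new_us_now, p_new_opp_next):
    """Value of a turnover action (e.g. a tackle) from the tackler's
    perspective, under the two-possession framework:
      p_old_opp_now  - P(opponent scores) in the situation the tackle erased
      p_old_us_next  - P(we score) on our next possession, pre-tackle
      p_new_us_now   - P(we score) now that we actually have the ball
      p_new_opp_next - P(opponent scores) when they eventually get it back
    """
    old_value = p_old_us_next - p_old_opp_now
    new_value = p_new_us_now - p_new_opp_next
    return new_value - old_value

# A tackle that erases a 4% opponent chance and creates a 3% chance for us:
print(round(turnover_value(0.04, 0.02, 0.03, 0.015), 3))  # 0.035
```

Most of the tackle's credit here comes from extinguishing the opponent's existing chance, which is why consistency with the two-possession framework matters so much for transition actions.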
Kieran Doyle
To address the off-ball movement question, I think having receipts does a lot of the heavy lifting there in seeing who can find space to actually get on the ball, but to be comprehensive we will eventually need Second Spectrum to bless the rains down in Matty's Google Drive.
Like, the Friends of Tracking stuff is amazing, and you can already see the huge benefits of space creation and denial, and that's part of why I kinda push EPV from Bornn and Javi to the side—it's really hard to infer a lot of the off-ball things in event data without making subjective decisions we weren't quite comfortable with.
Matthias Kullowatz
Yeah, offensively I agree with Kieran there. Defensive off the ball is completely different, though, I think. I'm not sure we're lifting anything there.
Kieran Doyle
Yeah, I think I'm pretty all-in on defensive off-ball stuff (and acknowledging the existence of goalkeepers) being the biggest spot for growth.
Rory Pulvino
Contributing Writer
I guess I was thinking of the defensive metrics as being team-based, cause yeah, I can't really envision how to do it individually. Though maybe if there were some expectation of g+ per offensive player going into a game, you could see how close each one came to that and then credit the nominal defending player with some portion of the difference.
Alex Bartiromo
I want to take a moment to look at the visual presentation side of things (readers can find an in-depth article on the topic here). What were you going for with the wheels (not to mention the bee swarms)? What were some editorial choices you made? How did the visualizations evolve over time? How do the g+ visualizations help readers better understand the model? What are some aspects of data visualization you feel the analytics world could improve on in general (and how do you try to do that)?
Kieran Doyle
God, they're so pretty.
Eliot McKinley
Contributing Writer, Data Visualizations, g+ Contributor
We really wanted something that looked different from pretty much anything you’ve ever seen for something like this. We had a framework, stolen from baseball, that I had previously used to look at how well players passed in certain directions, and we were able to build on it here. It was really an iterative process, mostly between John and me to begin with, which is pretty well chronicled in the article. Once we had something we thought needed broader feedback, we’d come back to the larger group.
One thing I love about the wheels is that in a quick look you can get a sense of “is this player any good compared to an average player, and what aspects of the game are they good or bad at?” You can see trends of types of players, like the aging DP who has a huge passing score but a really low receiving score, probably because they don’t want to put the effort into moving around to receive dangerous passes.
Also, the wheels are different than things like radars or bar charts for a couple of reasons. They are “all-in”, meaning that all of a player’s actions are contained in the plots. And they are all in the same units of goals added. With a radar or bar chart, you inevitably make some choices about what stats are important for a certain positional profile. Like, is Box Cross% one of the most important dozen or so metrics for a winger? I really don’t know.
Drew Olsen
ASA Editor-in-Chief & Co-Founder
It is not.
Eliot McKinley
And with radars and bars you have different scales for all of the values. You can normalize with percentiles, but you still may end up giving more weight to a stat that really doesn’t mean as much. For example, fouls, which are probably not driving overall player value. You could have a bar or radial at the 99th percentile for fouls, which may visually overwhelm something more important like xA or progressive passing. With the wheels you get that fouling is not the driver, because it hardly ever deviates from zero, whereas things like passing, receiving, and shooting will.
Also, they look fucking cool because we spent a ton of time making them that way.
Kieran Doyle
I also think there were a lot of aesthetic choices made that were both A) beautiful and B) really effective at conveying their message (thickening the middle bar, cross-hatching negative values).
John Muller
Last year we tweeted a lot of percentile bar charts, which were fun and pretty easy to grasp but misleading for the reason Eliot said. If you didn't already have a good frame of reference for, say, what the distribution looks like for defensive midfielder dribbles, you might think it was really important that a guy was at the 75th percentile instead of the 50th, and then it turns out the difference is like 0.1 dribbles per game or something. But even if you know what the numbers look like, does that get you to importance? How much does a dribble matter and why? How does that change depending on when and where and under what circumstances it happens? The wheel answers all those questions by translating everything into goals added, even if you don't realize that it's doing it.
Alex Bartiromo
Talk about the beeswarms. They’re a subtle touch, but they help you understand the context of a given player's numbers.
Eliot McKinley
The swarms were the last thing we added because we wanted to have a visual cue for “is this value good amongst the player’s peers?” Like if you see 0.10 g+ per 96—is that good? And you look at the swarm and see that yeah, this player is in like the 95%ile with a g+avg like that.
I also like the subtlety of them. We don’t label them or anything, but if you know what they are then they really tie the whole viz together.
Drew Olsen
For a relatively complex metric, the beauty of the viz is that it makes this complex algorithm that I will never completely understand into an easily comprehensible image. In that little wheel is literally EVERY on-ball action that player took in a season. Occam would be proud of its simplicity.
Kieran Doyle
I think it's cool how much information it has without really being overwhelming?
Eliot McKinley
And then we’ve taken that aesthetic to other g+ vizzes that are in the other articles.
I need to credit John for much of the aesthetic. He'd ask if something could be done, or send me a mock-up of what he was thinking might work, and I’d try it out. Sometimes it didn’t work, but much of the time it made the viz much better.
Changing the font also made a huge difference.
Kieran Doyle
It took me like 45 minutes to figure out how to add fonts to the mac font library because John wanted it to look like a USSF match report.
John Muller
That's not true, I pretty much just filtered my cheap knockoff Photoshop's cheap knockoff typefaces to sans serif and picked the first one that looked cool. USSF must be too cheap to buy licensed image editing software too.
Alex Bartiromo
Ok, last big question guys. How can we use g+ to do new analytics work? What do you think still needs to happen going forward to improve the model? Talk about g+boost and xComp. What are those, and how could they improve on what we already have? What information could help us build even better models? How do we figure out defense? What are some ideas you guys had but abandoned in the end? What are some questions you guys still have that haven’t been answered, and what are the next steps to addressing them?
John Muller
We've spent so much of the last six months trying to get goals added right that we've almost ignored the really exciting part, which is what we're going to do with the model now that it's out there. We've used games to help us understand the model, but we've barely scratched the surface of using the model to help us understand games.
I'll use g+boost as an example since that's something I'm excited to play with. The idea is that instead of just looking at who creates a bunch of goals added, which we can tell at a glance from Eliot's sexy wheel vizzes, we go a step further and start figuring out who helps the players they pass to accumulate more g+ than usual in the next action in the chain. That's g+boost. If you can get the metric right, you've basically found a way to measure vision and decision making, which is pretty wild.
That might be especially important in midfield, which StatsBomb's Thom Lawrence famously called the Valley of Meh for the difficulty of racking up possession value there. A guy like Darlington Nagbe doesn't spend a lot of time near the goal, where if you're good you can help your team add a lot of g+ quickly. His job is to connect possessions from back to front in an intelligent way. So Nagbe's g+ doesn't look special, but his g+boost is consistently outstanding, which might shed some light on why MLS teams keep paying bajillions of Garber Bucks for him. And when we can identify players like that, we can go to the tape and study what they're doing differently to improve their team's possessions, and that just might teach us something about soccer.
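One naive way the g+boost idea might be computed, sketched here with hypothetical field names and an invented baseline (a real version would control for game situation):

```python
from statistics import mean

def g_plus_boost(pass_events, baseline_next_action):
    """Average g+ that a passer's recipients generate on their next
    action, relative to a league-average baseline for comparable
    situations. All names and numbers here are illustrative."""
    return mean(e["next_action_g_plus"] for e in pass_events) - baseline_next_action

# Three completed passes and the g+ of each recipient's next action:
passes = [{"next_action_g_plus": 0.010},
          {"next_action_g_plus": 0.004},
          {"next_action_g_plus": 0.007}]
print(round(g_plus_boost(passes, baseline_next_action=0.005), 4))  # 0.002
```

A positive g+boost would suggest the passer's recipients do better than expected with the ball, which is the vision-and-decision-making signal described above.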
Tiotal Football
That's a great point, John. I think it’s consistent with other decisions we've made with g+ that stay true to an overall soccer principle we breeze past too often simply because data exist(s): you can have all the event data in the world, but that does not make it true that these actions can or should be assigned easily to individual players. Obviously, soccer is a team sport, and a fluid one at that. Everyone knows this, but when we decide to do some data modelling, we are seduced by the simplicity of data that is explicitly mapped to each player.
What Matthias has achieved here in building a model that derives the values of soccer actions (regardless of who is involved) is just the first step. And your idea of g+boost harnesses these values in a way that goes beyond simply mapping said values to the players on the ball (which we should also do!). We are pretty confident that this is how soccer works, that a player can have an influence on other players' actions, so why wouldn't we explore it?
John Muller
I mention this in the Lodeiro article, but one thing I noticed right away when I started asking video analysts to pick out good "plays" was that they naturally identified sequences that included multiple actions, or events, in the data. Sometimes those events spanned multiple possessions, so that the same player recorded offensive and defensive actions over the course of a single play. And that actually works pretty well with goals added. The model gives us a standard goal unit to compare and combine different action types, and it's built on the understanding that each action has consequences for both teams that don't just vanish when the ball is turned over. Our usual analytics tools encourage us to treat every action as its own discrete thing, but I think you're right that this model helps align our metrics with the fluid way soccer actually happens, which I hope will improve our analytics work from the ground up in a conceptual, first-principles sort of way.
So what about you, now that the scales have fallen from your eyes and stats and soccer are melting together in the flux of eternity, what questions are you excited to work on that you might not have been able to measure the right way before?
Alex Bartiromo
What about some of the other questions? What ideas did we end up scrapping? What about xComp? And how are we gonna solve defense?
Matthias Kullowatz
xComp is still in the works, for sure. From my baseball days, I am a big fan of value above replacement as the unit of value that makes sense to tie to compensation. Essentially, replacement sets the bar almost low enough so that if you walk on the field and continue to breathe, you'll score 0. This is a nice baseline for salary, I think, because the lowest-paid players mostly don't walk on the field, and they're about as valuable as those who do walk on the field but mostly just breathe.
We have some draft versions of models that correlate actual compensation to g+r (goals added above replacement), in an effort to determine the $ value a player provided in his time on the field.
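The g+r arithmetic is simple once a replacement level is chosen. A hedged sketch, with invented per-96-minute replacement levels by position:

```python
# Hypothetical replacement levels in g+ per 96 minutes, by position.
# The real values would be estimated from fringe-player performance.
replacement_level = {"CM": -0.02, "ST": 0.00}

def g_plus_above_replacement(g_plus_per96, minutes, position):
    """Season total of goals added above replacement (g+r)."""
    return (g_plus_per96 - replacement_level[position]) * minutes / 96

# A central midfielder at +0.05 g+ per 96 over 2,880 minutes:
print(round(g_plus_above_replacement(0.05, 2880, "CM"), 2))  # 2.1
```

An xComp-style model would then relate actual compensation to totals like that 2.1 g+r to estimate the dollar value a player provided on the field.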
Ryan Anderson
MLS Fantasy Stats
Having not yet finished mourning the Green Bay Packers' miserable selections in the NFL Draft, I would be fascinated to use g+ to explore the value of the MLS SuperDraft. Teams are taking such opposite approaches to the draft that I think there is significant room to grow in our understanding of its value and the proper strategies for going after that value.
Positional return in the NFL is massive (quarterback is by far the most valuable, followed by cornerback, offensive tackle, and pass rusher), whereas in soccer, the return may not be terribly different between positions. But perhaps we could see which positions are more likely to pay off in the draft, and it would be interesting to see the drop-off in average g+ and/or g+boost by draft selection.
This would also help us quantify the value of a draft pick better: combining with xComp and salary "cap" considerations, we could see what is the appropriate trade value of a draft pick, and who got the better end of the deal when Philadelphia traded away their entire 2019 draft to FC Cincinnati for $200K in Garber Bucks.
Now that I mention trades, g+ gives us the potential of creating a trade win likelihood. Pro Football Focus has done this for the NFL, but I think it would be feasible to use machine learning and/or simulation to predict who got the better end of intra-league trade deals (inter-league deals would be too hard I think) in MLS.
Kieran Doyle
In terms of solving defense, I think we're still a ways away, but we're getting there. We can already tell the difference between good and bad clearances, and center backs who get beat up on bad teams might get more opportunities to clear and tackle etc., but their interrupting g+ values aren't crazy boosted by that. We still see the best CB's turn up as good possession openers on good teams who defend well as a team. That's a start. The next steps for me are looking at punishments for actions that don't happen to really suss that out even more. We've played with the idea of g+ penalties when actions happen in a player’s region of influence, but without tracking data I think it's really hard to find something we're comfortable with.
Ryan, I love that idea. I remember getting really happy about the fact that g+r has a legitimate real replacement level in actually having a draft, where it's a player pool everyone has access to (in comparison to the difference in academy qualities), and you can actually see what that looks like.
And if we extend that out to the draft, we can see clearly some positions draft really well. Gressel is a drafted fullback, Laryea is a drafted fullback (then an attacking fullback), Aaron Long was drafted, Miles Robinson was drafted.
But we also see Abu Danladi and Jonathan Lewis and all these other attackers who find it really hard to settle as high value picks, given where money gets spent and all those other things, and g+ might give us a lot of insight there.
I know if you compare the g+ replacement level by position there is big variance. Replacement-level GKs are bad bad, so it might be something to think about.
Tiotal Football
So the thing I would say about defense is that if we go back to the 3 steps of 1) build the model to value the actions, 2) allocate the values to players, and 3) balance for unequal opportunities, then I think it's worth stressing that at the team level (and even at the macro "what is soccer?" level) we should be able to do a lot with defense simply with the results of step 1. We have a model that tells us at the end of each action a derived value for the likelihood that the team with the ball scores, and that's valuable because we can see which teams are conceding how much g+ and when and where (and how?).

And frankly, when people talk about unequal opportunities to accumulate defensive actions, I sense we are mostly talking about team effects. "Players on bad teams defend a lot." So we know some pretty good stuff about g+ conceded at the team level, and we know some pretty good stuff about which players are on which teams, and we know which players are disrupting possessions. We can also derive how much of the g+ a team concedes is really conceded when (and where) they turn the ball over to begin with.

So I'll pause there, cuz I suspect we're closer to the finish line than we give ourselves credit for - if you define the finish line as a reasonable way to burden a player's interrupting g+ value with the value he and his team conceded to the opponent before it is extinguished via interruption or shot. We're just left with that pesky step 2, how to allocate the g+ values a team concedes to the individual players. But again, "receptions" showed us that allocating the possession values to players isn't even straightforward in attack. So like, this feels doable. Thoughts?
Matthias Kullowatz
Could a simple version of this just be to add up every team's defensive action value (in a game/season), multiply by a factor to make all teams equal (on a game/season basis), then use that factor to adjust each player on that team?
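That simple version could look something like this. The team totals are invented, and the particular factor (league average over team total) is just one of several reasonable normalization choices:

```python
def team_factors(team_def_totals):
    """Scale factor per team so that adjusted defensive action value
    is equal across teams (here: league average / team total)."""
    league_avg = sum(team_def_totals.values()) / len(team_def_totals)
    return {team: league_avg / total for team, total in team_def_totals.items()}

totals = {"POR": 8.0, "SEA": 10.0, "LAFC": 12.0}  # hypothetical season totals
factors = team_factors(totals)

# A Portland defender's raw interrupting g+ of 2.5 gets rescaled upward,
# since his team accumulated fewer defensive actions than average:
print(2.5 * factors["POR"])  # 3.125
```

This is the "do no harm" flavor of adjustment Tiotal describes: it removes the team effect without trying to model individual defensive responsibility.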
Tiotal Football
That's right, I think there are some basic "do no harm" cuts to this that we can apply and get some useful balanced metrics out of. I am also very interested to study whether a team that is succeeding in preventing shots and shot xG but conceding plenty of g+ activity can persist in preventing goals or if (kinda like goal conversion rates) xG conceded relative to g+ conceded reverts to the mean over time.
Matthias Kullowatz
On the topic of things to do in the future: can we translate goals added into points added? I don't mean just regress points on goals and make a linear transformation. I mean like can we adjust the methodology to pick up on the advantage of dribbling the ball into the corner when you're up by one in stoppage time? We have a win probability that takes the gamestate and time into account to determine the probability of winning, so I think this would require some sort of tweaking to the methodology that includes a time horizon. So, it wouldn’t just say that the EPV is +0.015, but also show that it would take your opponent some amount of time to score.
In the meantime, we might consider a tweak where we don't score plays in the last X minutes of a game, or something.
John Muller
Those points added values would go haywire at the end of a game, right? Especially in MLS, where nobody's even pretending to play an organized sport for the last twenty minutes. If we use g+ for nothing else, let's at least figure out a way to show this much: the only good soccer happens in minutes 1-30 and 45-60, and points are dumb.
Matthias Kullowatz
Haha, well I'm optimistic the xpoints wouldn't go haywire. Basically it would just re-weight the value of a goal at any given time, and the value of wasting time. When you're up 1, a goal is worth less than when you're tied. The actual risk-reward calculation is different, and if we feed the algorithm changes to xpoints rather than changes to xgoals, maybe it figures it out on its own.
Unrelated thought...can we tease out what home teams are doing better to win games with our EPVs and g+ values right now? Could we, like, identify what home teams are doing and where they are doing them that increase their g+? Seems possible.
John Muller
What would analyzing home advantage with g+ look like? Every time I try to think about home advantage I give up, because how do you even start to sort out causes and effects?
Matthias Kullowatz
Well, we believe a large portion is fouls. So, freeze play in a given type of game situation and then measure the component-based g+ earned over the rest of the possession. Is it largely foul value?
(On average.)
John Muller
One thing goals added has driven home for me is how little fouls seem to matter (aside from penalties, obviously). Or if they do matter, it might be in terms of things like rhythm and momentum that I imagine are hard for the model to capture. So I wonder if maybe goals added has already helped us a little on home advantage by nudging us away from ref-centric explanations that make more sense in other sports and toward tactical ones.
Tiotal Football
That's really fascinating. At some point I bought into the belief that home advantage is mostly foul calls, or the lack thereof, with MLS' extreme home advantage the result of parity leverage. But it's a great point. If the model doesn't find fouls to be driving swings in goals added, there's something there to work on.
Alex Bartiromo
Alright guys, this has been a great conversation with some real meat to it, thanks for participating. If anyone has any final thoughts, please post them here.
Drew Olsen
I think this is the right framework for what we have, namely on-ball data. Surely we'll continue to improve/change g+, but I'm pretty sure this is the right framework for thinking about field players. The major missing link, and the missing link in soccer data in general, is off-ball data. It's the elephant (not) in the room. It's really hard to find and acquire any individual data - not summary data - for on-ball actions. It's damn near impossible to get player tracking data for all 22 players unless you work for a top level pro team or league. This is a drum we've been beating at ASA for many years now, but soccer analysis suffers from the lack of free publicly available data.
Kieran Doyle
For sure, and we've already seen the crazy amount of insight generated from Friends of Tracking and that whole initiative.
My final thought is that this is a really great start, and the introduction of receiving value (and thus how we think about the passing/receiving allocation) and turnovers are genuinely different and new things that push the field forward. But as you can see from how quickly these conversations devolve into 400 branches off of the main discussion, it's just the start. I think there are some really exciting applications like xComp and draft value and style classification and archetypes and so many others, and I'm excited to see how applying this to problems gives us (Matthias' magical coding fingers) guidance moving forward.