This is the Very Model of a Modern Major Soccer League
Our model gives LAFC a 65% chance to win MLS Cup, which is admittedly an absurdly high figure. Such a figure requires that LAFC have, on average, a better than 85% chance of winning each of the three games on their way to the championship. Despite getting all three of those games at home, 85% still feels almost impossible against some good teams. So I’m here to break down what factors are giving LAFC such good chances in our model, and why that model is “wrong.”
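As a quick check on that arithmetic, here is a minimal sketch. It assumes the three games are independent and equally winnable, which is a simplification (the model projects each matchup separately), and the function name is made up for illustration.

```python
def per_game_prob_needed(title_prob: float, games: int = 3) -> float:
    """Per-game win probability that compounds to `title_prob` over
    `games` straight wins, assuming independent, identical games."""
    return title_prob ** (1.0 / games)


# A 65% title chance over three knockout games
needed = per_game_prob_needed(0.65)
print(f"Needed per-game win probability: {needed:.3f}")
```

Cubing the result gets you back to 0.65, which is where the "better than 85%" requirement comes from.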
Let’s do it like this. LAFC is most likely to play Minnesota in the second round, a team that adequately represents the caliber of a typical MLS playoff team that LAFC will face. For reference, Minnesota was second in the conference in goal differential (GD) and third in the conference in expected goal differential (xGD). I’ll start as though Minnesota were playing itself on a neutral field, and then I’ll layer in the various factors that make LAFC different, and that get us to a better than 85% chance of winning a knockout matchup.
Home Field Advantage
Minnesota would be expected to score about 1.47 goals per 96 minutes (i.e. per game) against itself on a neutral field. But with home field advantage, the mighty Home Loons would be expected to score 0.63 more goals per 96 minutes than the Away Loons. If that sounds high, then you haven’t been paying attention to MLS home field advantage, the largest home field advantage of all major US sports by far.* How does that 0.63 projected goal differential affect win probability? It shoots the Home Loons up to 66% win probability.
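To see how a goal gap turns into a win probability, here is a rough sketch using two independent Poisson distributions. Two things in it are assumptions, not the app's actual method: the 0.63 gap is split evenly around the 1.47 baseline, and draws are simply dropped so that the home share is taken among decisive results only. It should land in the neighborhood of the figure above, not reproduce it exactly.

```python
from math import exp, factorial


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals given an expected goal rate."""
    return exp(-rate) * rate ** k / factorial(k)


def outcome_probs(home_rate: float, away_rate: float, max_goals: int = 15):
    """Home win, draw, and away win probabilities for independent Poisson scores.
    Scorelines above max_goals are ignored; their probability is negligible here."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Assumption: the 0.63 home-minus-away gap is split evenly around 1.47
BASELINE, GAP = 1.47, 0.63
home_win, draw, away_win = outcome_probs(BASELINE + GAP / 2, BASELINE - GAP / 2)

# Assumption: ignore draws and take the home share of decisive results
decisive_share = home_win / (home_win + away_win)
print(f"home {home_win:.2f}, draw {draw:.2f}, away {away_win:.2f}")
print(f"home share of decisive results: {decisive_share:.2f}")
```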
Attacking
LAFC has a historically good offense. Our app’s team tables show that LAFC’s 2.00 xGF per game beat out Atlanta’s 2018 output of 1.88 for the top spot since we started keeping data in 2011. And then their 2.47 GF obliterated Toronto’s 2017 mark of 2.09. Though expected goals are a better predictor of future wins, the gap between GF and xGF is still somewhat predictive over a full season at the team level.
As we start converting the Home Loons into the Home Black and Gold, we upgrade their xGF from 1.37 and their GF from 1.47 to those crazy LAFC figures from above. This pushes the projected goal differential of the game from 0.63 to 1.23 (after swapping xGF) to 1.82 (after swapping GF), and the corresponding win probability increases from 66% to 76% to 84%.
Defending
Because of such a good offense, it may be that LAFC’s defense is underrated. They had the lowest xGA in 2019, and 10th-lowest of all time (well, since 2011). Correspondingly, LAFC had the best GA in 2019, and the 7th-best GA since 2011. Loyal readers, I can’t emphasize enough how extremely well this team has played. Swapping out Home Minnesota’s xGA and GA for those of LAFC bumps the projected goal differential from 1.82 to 1.94 to 1.97, and the win probabilities from 84% to 86% to 87%.
Let’s Review
Those are all the effects in the model currently employed on the app. Below I’ve included a summary table with the size of each effect (cumulatively). I added the effects in the order of importance, according to impact on projected goal differential in the match. The odds ratio is just a different way to quantify the size of those effects (it correlates to the projected GD differences pretty closely).
Effect | Projected GD | Win% | Odds Ratio |
---|---|---|---|
Home Field | 0.63 | 66% | 1.94 |
xGF | 1.23 | 76% | 1.63 |
GF | 1.82 | 84% | 1.66 |
xGA | 1.94 | 86% | 1.17 |
GA | 1.97 | 87% | 1.04 |
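
For anyone who wants to check the odds ratio column, here is a small sketch that recomputes it from the Win% column. The 50% neutral-field starting point is an assumption implied by Minnesota playing itself. Because the table's percentages are rounded, the recomputed ratios can drift in the second decimal, most noticeably for the smallest effects.

```python
def odds(p: float) -> float:
    """Convert a win probability into odds in favor."""
    return p / (1.0 - p)


# Cumulative win probabilities from the table; 0.50 neutral start is assumed
win_probs = [
    ("Neutral field", 0.50),
    ("Home Field", 0.66),
    ("xGF", 0.76),
    ("GF", 0.84),
    ("xGA", 0.86),
    ("GA", 0.87),
]

# Each effect's odds ratio is the odds after adding it over the odds before
odds_ratios = {}
for (_, prev_p), (label, p) in zip(win_probs, win_probs[1:]):
    odds_ratios[label] = odds(p) / odds(prev_p)
    print(f"{label}: {odds_ratios[label]:.2f}")
```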
So… What Gives?
I said the model was “wrong.” I put “wrong” in quotes because George Box. Don’t worry about it.
The model is set up as a combination of two independent Poisson models to project the goals scored for each of the home and away teams.** There’s a thing about Poisson models that’s screwing something up: their linear equations predict the natural logarithm of the outcome, not the outcome itself. Mathematically, this means that the projected goal differential increases exponentially with changes to things like xGF, GF, xGA, and GA. Also, with small sample sizes it’s smart to penalize the coefficients (shrink them a bit toward zero), to combat all the model tinkering I’ve been doing with the same dataset over the years.
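Here is a toy illustration of that log-link behavior. The intercept and coefficient are invented for the example and are not taken from the fitted model; the point is only that equal steps in a predictor multiply the projected goals by a constant factor, so the absolute jumps keep growing.

```python
from math import exp


def projected_goals(intercept: float, coef: float, metric: float) -> float:
    """Poisson regression with a log link: log(goals) = intercept + coef * metric."""
    return exp(intercept + coef * metric)


# Hypothetical coefficients, chosen only for illustration
INTERCEPT, COEF = -0.5, 0.6

metrics = [1.0, 1.5, 2.0, 2.5]  # equal 0.5 steps in, say, xGF
goals = [projected_goals(INTERCEPT, COEF, m) for m in metrics]

# Absolute increase in projected goals for each equal step in the metric
steps = [later - earlier for earlier, later in zip(goals, goals[1:])]

for m, g in zip(metrics, goals):
    print(f"metric {m:.1f} -> projected goals {g:.2f}")
print("increase per step:", [round(s, 2) for s in steps])
```

A team sitting at the extreme end of a metric, like LAFC, gets the largest of those jumps.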
To “fix” it, I can think of a few options, which could be implemented together.
1. I could cap all observations of extreme metrics at, say, their 99th percentiles so that LAFC games this year don’t have an outsized impact on the model.
2. I could introduce non-linear functions to model tuning, capturing the diminishing returns of extreme metrics on outcomes.
3. I could introduce penalization into the model, shrinking the effects a bit toward zero because our sample sizes are pretty small. The benefit of this is that I already made such models earlier this year when I was trying to find more predictors of future outcomes.
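Option 1 is the easiest to picture, so here is a minimal sketch of capping a metric at an upper percentile. The sample values are hypothetical apart from the 1.88 and 2.00 xGF figures quoted earlier, the helper names are made up, and the 90th percentile is used only because the toy sample has ten values (the list above suggests the 99th on the real data).

```python
import math


def percentile_cap(values, pct=99.0):
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def winsorize_upper(values, pct=99.0):
    """Replace anything above the pct-th percentile with that percentile."""
    cap = percentile_cap(values, pct)
    return [min(v, cap) for v in values]


# Hypothetical team-season xGF values; the last two echo the figures cited earlier
xgf = [1.10, 1.25, 1.30, 1.37, 1.45, 1.52, 1.60, 1.71, 1.88, 2.00]

capped = winsorize_upper(xgf, pct=90.0)
print(capped)
```

Only the top value moves, so the rest of the training data is untouched while the most extreme season stops pulling on the fit.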
Wait, but didn’t we already implement that? I noted in the article that we would use these new models earlier this year, but I eventually chose not to because I wanted to do more testing. Consider this round 1 of testing.
Following through with option 3, I performed the same analysis as above, but with the new penalized models. The end result is that LAFC would have a shade over 80% chance to beat Minnesota. The probability of beating Minnesota-like teams three straight times to take home MLS Cup is about 50%, more in line with FiveThirtyEight’s model, and closer to the betting lines, which sit around 35%.
*In MLS, home teams win about 50% of their matches and lose 25%. Put one way, they win 67% of all matches that aren’t drawn; put another way, they take 1.75 of the roughly 2.75 points awarded each match (64%). Either way, that’s a lot more than the next most home-friendly professional sport, the NBA, in which about 60% of home teams win.
**Intuitively, the home team’s projected total is based on its own season scoring rates (xGF and GF) and the away team’s season defending rates (xGA and GA). The away team’s projected total is based on the symmetric opposites of those (away xGF and GF, home xGA and GA). More info can be found here.