Rolling out USL and NASL data in the web application
Brought to you by Tyler Richardett (@TylerRichardett) and Matthias Kullowatz (@MattyAnselmo)
In an effort to continue to bring you the best American soccer analysis, we have added three new leagues to the web application: USL Championship (2017 - ), USL League One (2019 - ), and the now-defunct NASL (2016 – 17). These new leagues join MLS and the NWSL in our app, representing more than 4,000 players across nearly 6,000 games. In the app, you can select which league you want to view in the upper right, and then navigate through the tabs to explore shooting, passing, and goals added (g+) statistics for players and teams.
Perhaps you’re looking for the best USL player-season in the last few years? New Mexico’s 25-year-old Santi Moar produced 11.27 goals added above replacement (g+rep) in 2019 as a winger, followed closely that season by Red Bulls’ Jared Stroud, who turned 23 during that particular campaign. We’re currently working on some player projection models, a derivative of which will measure how g+ accumulation in the lower leagues translates into MLS success. At the team level, Reno recorded the best g+ differential in the 2020 playoffs before they decided to cease operations following the season. You’ll want that nugget for trivia nights in 2030.
There are more trivial nuggets to be found. Maybe you’re looking for players who used the 2016 and/or 2017 NASL seasons as a stepping stone for MLS success? Ibson and Christian Ramirez played together during Minnesota’s final season as an NASL side, and they produced very similar, very respectable g+ metrics in MLS, in line with players like Jonathan dos Santos and Federico Higuain. Bonus points to anyone who finds all such NASL players who made it to MLS.
Anyway, it’s all in the app. Go find it!
For those wondering how just a few seasons of smaller leagues could produce credible model outputs for expected goals, expected passes, and expected possession value, I love where your head is at. We opted to combine all five of our leagues together into one large training set for model tuning. We incorporated a league indicator in the parameterization for the xgboost algorithm, effectively allowing the model to identify distinct features of each league where sufficient data warranted such differentiation, while regressing areas of thin data back toward the average across all leagues.
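To make the "regress thin data back toward the average" idea concrete, here is a minimal Python sketch. It is not the xgboost model described above. It swaps in a much simpler technique, shrinking each league's raw conversion rate toward the pooled rate, to show the same intuition. The shot counts, the league labels, and the `prior_strength` knob are all made up for illustration.

```python
from collections import defaultdict


def shrunken_rates(shots, prior_strength=200.0):
    """Return (pooled_rate, {league: shrunken conversion rate}).

    shots is an iterable of (league, scored) pairs.
    prior_strength is a hypothetical tuning knob: the number of
    pseudo-shots at the pooled rate added to every league.
    """
    goals = defaultdict(int)
    attempts = defaultdict(int)
    for league, scored in shots:
        attempts[league] += 1
        goals[league] += int(scored)

    # Conversion rate across all leagues combined.
    pooled_rate = sum(goals.values()) / sum(attempts.values())

    rates = {}
    for league, n in attempts.items():
        # Leagues with few shots are dominated by the pooled pseudo-shots;
        # leagues with many shots keep a rate close to their own data.
        rates[league] = (goals[league] + prior_strength * pooled_rate) / (
            n + prior_strength
        )
    return pooled_rate, rates


# Made-up data: a large league converting 11% and a small one converting 5%.
shots = (
    [("MLS", True)] * 1100
    + [("MLS", False)] * 8900
    + [("USL1", True)] * 5
    + [("USL1", False)] * 95
)

pooled, rates = shrunken_rates(shots)
for league, rate in rates.items():
    print(f"{league}: shrunken rate {rate:.4f} (pooled {pooled:.4f})")
```

With these invented numbers, the large league's estimate barely moves from its raw 11%, while the small league's estimate is pulled two-thirds of the way from its raw 5% toward the pooled rate, because 200 of its 300 effective shots are pseudo-shots. A tree-based model with a league indicator reaches a similar outcome by a different route: it only splits on league where there is enough data to justify it.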
We tested this approach across a number of men’s leagues, specifically on an expected goals model, which suffers from smaller sample sizes because shots don’t happen that often. It turns out that even knowing the league just doesn’t matter that much: the rest of the information about the shot is a good indicator of its likelihood of scoring, regardless of the league. Others here at ASA are looking into differences between MLS and NWSL models and whether the two distinct styles of play lead to different expected values for various actions. But for now, we’re confident that our approach of combining the training data across leagues leads to expected metrics that are as accurate as possible, given the small sample sizes we have to work with in leagues like USL League One and NWSL.
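For the curious, the kind of check described above can be sketched as follows: fit one model that ignores the league and one that uses it, then compare held-out log loss. This is only an illustration of the procedure, not our actual test. The data are simulated with no league effect built in, so the outcome here is by construction and says nothing about real leagues. The league labels, distance buckets, and scoring rates are all hypothetical, and simple per-group rates stand in for a real expected goals model.

```python
import math
import random

LEAGUES = ["A", "B", "C"]  # placeholder league labels
# Hypothetical scoring probability by distance bucket, identical in every league.
TRUE_RATE = {"close": 0.30, "mid": 0.10, "far": 0.03}
BUCKETS = list(TRUE_RATE)


def simulate(n, rng):
    """Generate n synthetic shots as (league, bucket, scored) tuples."""
    shots = []
    for _ in range(n):
        league = LEAGUES[int(rng.random() * len(LEAGUES))]
        bucket = BUCKETS[int(rng.random() * len(BUCKETS))]
        scored = rng.random() < TRUE_RATE[bucket]
        shots.append((league, bucket, scored))
    return shots


def fit(shots, key):
    """Smoothed scoring rate per group; key maps a shot to its group."""
    counts = {}
    for shot in shots:
        goals, attempts = counts.get(key(shot), (0, 0))
        counts[key(shot)] = (goals + shot[2], attempts + 1)
    # Add-one smoothing keeps every rate strictly between 0 and 1.
    return {g: (goals + 1) / (attempts + 2) for g, (goals, attempts) in counts.items()}


def log_loss(shots, rates, key, default=0.1):
    """Average negative log-likelihood of the observed outcomes."""
    total = 0.0
    for shot in shots:
        p = rates.get(key(shot), default)
        total -= math.log(p if shot[2] else 1.0 - p)
    return total / len(shots)


def without_league(shot):
    return shot[1]  # distance bucket only


def with_league(shot):
    return (shot[0], shot[1])  # league and distance bucket


rng = random.Random(0)
train, test = simulate(20000, rng), simulate(5000, rng)

loss_without = log_loss(test, fit(train, without_league), without_league)
loss_with = log_loss(test, fit(train, with_league), with_league)
print(f"log loss without league: {loss_without:.4f}")
print(f"log loss with league:    {loss_with:.4f}")
```

If the two losses come out nearly identical on real data, as they do on this simulated set, that is a sign the league indicator adds little beyond what the shot's other features already capture.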