The Elusive Advanced Defensive Metric

By Mark Asher Goodman (@rapidsrabbi)

It all started with Micheal Azira.

At the conclusion of the 2017 MLS season, I sifted through the wreckage of the Colorado Rapids' awful season, player by player, to see what could be learned. Who among these players was actually a high-quality soccer player? Who should the team retain for next year? Who should be jettisoned? Why? How can I know the difference? And, most importantly for readers of a data-obsessive website like American Soccer Analysis, can I find a credible way of answering those questions using advanced metrics?

For goal scorers, the answer to that question is, more or less, yes. You have expected goals (xG), and while flawed, it can tell you with a certain degree of reliability whether one striker can find the back of the net better than another. For passers, expected assists (xA) can be instructive, too. It can broadly inform a person as to whether a player gets a lot of passes into good spots that create goals. Both of these measures, xG and xA, are certainly reductive: they boil players down to just a few of their assets rather than measuring the totality of their contribution. Neither measure can isolate a player's individual contribution from the actions of the whole team.

But the results you get from a table of xG roughly conform to the observations you make with your eyes. Josef Martinez is off the charts in his G-xG performance for 2017; he had an expected goals of 12.35 but scored a total of 19 goals on the year, outperforming expectation by 6.65 goals. We still don't know if he's so good he breaks the model or just a bit lucky, but if you watched Josef Martinez in a game in 2017, you saw him score goals from blistering runs, incredible angles, and fantastic long blasts. Your eyes confirmed what the numbers tell you: Josef Martinez is a killer striker.

Surely, I thought, there is a way to do the same thing with defensive midfielders. Of course, I had to focus on the defensive midfielders. Because I have a serious thing for d-mids. If there were a ‘Unified Field Theory’ of soccer; if there were ‘one ring to rule them all’ for soccer enthusiast, writer, coach, and dad Mark Goodman, it would be ‘your defensive midfielder is the key to everything.’ So, back in November, I set about trying to assess the Rapids' journeyman defensive midfielder and Ugandan international Micheal Azira by coming up with a unified metric that would assess all the d-mids in MLS. This attempt to create the ‘one big stat’ is not unlike how baseball created the metric VORP (Value Over Replacement Player), or how basketball uses OBPM and DBPM (Offensive/Defensive Box Plus/Minus), or how throwball uses the QB rating.

I certainly put in a serious and thoughtful effort, and overall I was fairly pleased with how it came out. But even I can admit that the results were highly suspect. I’m really not a math guy - I took some college stats courses, and I’m a reasonably smart dude, but there isn't a lot of math in rabbinical school (thankfully). I’m a soccer guy, and so I used my philosophy of trying to find examples of what a soccer player playing as a defensive mid ought to be doing, threw numbers at it, ranked all the qualifying players, and came out with a chart. It’s not even a cool metric like how QB ratings are clustered around 100 or VORP establishes the baseline as 0.0 valued ‘replacement level player’, simply because I couldn't even begin to wrap my head around how the hell I would go about that. So it’s just a simple ranking.

Before I show you the results, here was my thinking in selecting the statistics I used to find the ideal defensive midfielder. A defensive midfielder needs to clean up and shield the defense. They do that by either knocking the ball away from a dribbler (a ‘tackle’), kicking or heading out an errant pass (a ‘clearance’), stymieing a shot (a ‘block’), or stepping into a passing channel and coming away with the ball (an ‘interception’). These four stats, taken together, form the core of what a d-mid does when he or she acts as a defender. On offense, I only want my ideal d-mid to do one thing - send in dangerous passes that might create a goal. Long passes, short passes, crosses, through-balls - it makes no difference to me. Send in a ball; break the lines; unlock the defense. The best statistic for this is ‘Key Passes’. I took those defensive numbers, marked ‘em CBI+T, added them up, and divided by the number of games a player played. I looked up those numbers for every defensive midfielder in MLS, along with a few central midfielders that I thought could be adequately described mathematically as defensive midfielders. And I ranked them all. Then I took all the players' key passes per game. And I ranked all those dudes by key passes. Then I averaged those two sets of ranks. And behold, the perfect, not-in-any-way flawed or biased, entirely definitive, list of MLS defensive midfielders, from best to worst.

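| Name | Team | GP | Min | C | B | I | CBI | CBI pg | T | T pg | CBI+T pg (rank) | KP | KP pg (rank) | A | Avg of CBI+T & KP rank | Overall rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cristian Roldan | SEA | 33 | 2960 | 36 | 8 | 54 | 98 | 2.97 | 114 | 3.45 | 6.43 (7) | 43 | 1.30 (6) | 3 | 6.5 | 1 |
| Kelyn Rowe | NE | 23 | 1897 | 35 | 3 | 34 | 72 | 3.13 | 64 | 2.78 | 5.91 (12) | 39 | 1.70 (3) | 7 | 7.5 | 2 |
| Ale Bedoya | PHI | 28 | 2509 | 36 | 9 | 58 | 103 | 3.68 | 85 | 3.04 | 6.72 (5) | 27 | 0.96 (14) | 4 | 9.5 | 3 |
| Ibson | MIN | 31 | 2660 | 43 | 5 | 41 | 89 | 2.87 | 110 | 3.55 | 6.42 (8) | 32 | 1.03 (12) | 6 | 10 | 4 |
| David Guzman | POR | 25 | 2028 | 27 | 1 | 46 | 74 | 2.96 | 66 | 2.64 | 5.60 (18) | 43 | 1.72 (2) | 6 | 10 | 4 |
| Kyle Beckerman | RSL | 26 | 2242 | 36 | 4 | 53 | 93 | 3.58 | 58 | 2.23 | 5.81 (15) | 27 | 1.04 (10) | 0 | 12.5 | 6 |
| Jermaine Jones | LAG | 20 | 1588 | 30 | 4 | 31 | 65 | 3.25 | 44 | 2.2 | 5.45 (20) | 24 | 1.20 (7) | 4 | 13.5 | 7 |
| Alexander Ring | NYC | 29 | 2604 | 39 | 9 | 71 | 119 | 4.1 | 114 | 3.93 | 8.03 (2) | 19 | 0.66 (26) | 4 | 14 | 8 |
| Diego Chara | POR | 29 | 2530 | 18 | 9 | 44 | 71 | 2.45 | 100 | 3.45 | 5.90 (13) | 27 | 0.93 (15) | 3 | 14 | 8 |
| Darwin Ceren | SJ | 21 | 1642 | 14 | 4 | 35 | 53 | 2.52 | 74 | 3.52 | 6.04 (10) | 17 | 0.81 (19) | 2 | 14.5 | 10 |
| Marky Delgado | TOR | 26 | 2072 | 23 | 11 | 39 | 73 | 2.81 | 63 | 2.42 | 5.23 (23) | 27 | 1.04 (10) | 5 | 16.5 | 11 |
| Michael Bradley | TOR | 30 | 2700 | 32 | 6 | 56 | 94 | 3.13 | 65 | 2.17 | 5.30 (22) | 29 | 0.97 (13) | 2 | 17.5 | 12 |
| Juan David Cabezas | HOU | 27 | 2231 | 46 | 12 | 55 | 113 | 4.19 | 102 | 3.78 | 7.97 (3) | 13 | 0.48 (33) | 0 | 18 | 13 |
| Bastian Schweinsteiger | CHI | 24 | 1993 | 32 | 5 | 35 | 72 | 3 | 39 | 1.63 | 4.63 (33) | 33 | 1.38 (5) | 6 | 19 | 14 |
| Marcelo Sarvas | DC | 25 | 1841 | 27 | 7 | 55 | 89 | 3.56 | 54 | 2.16 | 5.72 (16) | 18 | 0.72 (22) | 1 | 19 | 14 |
| Tyler Adams | NYRB | 24 | 1994 | 25 | 4 | 41 | 70 | 2.92 | 64 | 2.67 | 5.59 (19) | 19 | 0.79 (20) | 4 | 19.5 | 16 |
| Dax McCarty | CHI | 28 | 2465 | 32 | 3 | 48 | 83 | 2.96 | 63 | 2.25 | 5.21 (24) | 24 | 0.88 (16) | 5 | 20 | 17 |
| Haris Medunjanin | PHI | 34 | 3057 | 36 | 7 | 42 | 85 | 2.5 | 47 | 1.38 | 3.88 (39) | 88 | 2.59 (1) | 12 | 20 | 17 |
| Will Johnson | ORL | 26 | 2057 | 47 | 0 | 41 | 88 | 3.38 | 47 | 1.81 | 5.19 (25) | 23 | 0.88 (16) | 3 | 20.5 | 19 |
| Micheal Azira | COL | 30 | 2569 | 47 | 12 | 76 | 135 | 4.5 | 81 | 2.7 | 8.20 (1) | 9 | 0.30 (41) | 2 | 21 | 20 |
| Felipe | NYRB | 33 | 2961 | 30 | 6 | 50 | 86 | 2.6 | 79 | 2.39 | 4.99 (26) | 29 | 0.88 (16) | 5 | 21 | 20 |
| Roger Espinoza | SKC | 30 | 2677 | 31 | 4 | 38 | 72 | 2.4 | 65 | 2.17 | 4.57 (34) | 32 | 1.07 (9) | 3 | 21.5 | 22 |
| Matias Laba | VAN | 19 | 1612 | 21 | 5 | 41 | 67 | 3.53 | 82 | 4.32 | 7.85 (4) | 8 | 0.42 (39) | 1 | 21.5 | 22 |
| Luke Mulholland | RSL | 27 | 1949 | 29 | 5 | 43 | 77 | 2.85 | 68 | 2.52 | 5.37 (21) | 19 | 0.70 (23) | 4 | 22 | 24 |
| Mohammed Saeid | COL | 29 | 2269 | 13 | 2 | 27 | 42 | 1.45 | 38 | 1.31 | 2.76 (43) | 48 | 1.66 (4) | 4 | 23.5 | 25 |
| Wil Trapp | CLB | 34 | 3035 | 45 | 17 | 52 | 114 | 3.35 | 77 | 2.26 | 5.61 (17) | 20 | 0.59 (30) | 5 | 23.5 | 25 |
| Ozzie Alonso | SEA | 26 | 2073 | 37 | 20 | 38 | 95 | 3.65 | 59 | 2.27 | 5.92 (11) | 12 | 0.46 (37) | 3 | 24 | 27 |
| Sam Cronin | MIN | 18 | 1551 | 25 | 11 | 43 | 79 | 4.39 | 35 | 1.94 | 6.33 (9) | 6 | 0.33 (40) | 1 | 24.5 | 28 |
| Jeff Larentowicz | ATL | 33 | 2598 | 61 | 19 | 58 | 138 | 4.18 | 76 | 2.3 | 6.48 (6) | 5 | 0.15 (44) | 1 | 25 | 29 |
| Kellyn Acosta | FCD | 23 | 1895 | 8 | 5 | 25 | 38 | 1.65 | 42 | 1.82 | 3.47 (42) | 27 | 1.17 (8) | 2 | 25 | 29 |
| Anibal Godoy | SJ | 26 | 2134 | 18 | 5 | 54 | 77 | 2.96 | 52 | 2 | 4.96 (27) | 18 | 0.69 (24) | 1 | 25.5 | 31 |
| Ilie Sanchez | SKC | 33 | 2970 | 34 | 8 | 63 | 105 | 3.18 | 49 | 1.48 | 4.66 (32) | 26 | 0.79 (20) | 2 | 26 | 32 |
| Jared Jeffrey | DC | 23 | 1617 | 20 | 7 | 43 | 70 | 3.04 | 65 | 2.82 | 5.82 (14) | 7 | 0.30 (41) | 1 | 27.5 | 33 |
| Carlos Gruezo | FCD | 31 | 2613 | 13 | 5 | 49 | 67 | 2.16 | 81 | 2.61 | 4.77 (30) | 21 | 0.68 (25) | 4 | 27.5 | 33 |
| Ricardo Clark | HOU | 28 | 2252 | 33 | 7 | 38 | 78 | 2.79 | 60 | 2.14 | 4.93 (28) | 18 | 0.64 (27) | 2 | 27.5 | 33 |
| Marco Donadel | MTL | 19 | 1424 | 22 | 3 | 32 | 55 | 2.89 | 32 | 1.68 | 4.57 (34) | 12 | 0.63 (28) | 1 | 31 | 36 |
| Hernan Bernadello | MTL | 22 | 1496 | 15 | 1 | 25 | 41 | 1.86 | 52 | 2.36 | 4.22 (37) | 14 | 0.63 (28) | 3 | 32.5 | 37 |
| Artur | CLB | 24 | 1702 | 14 | 5 | 25 | 44 | 1.83 | 73 | 3.04 | 4.87 (29) | 11 | 0.46 (37) | 3 | 33 | 38 |
| Joao Pedro | LAG | 28 | 2306 | 27 | 6 | 45 | 78 | 2.79 | 48 | 1.71 | 4.50 (36) | 15 | 0.53 (32) | 2 | 34 | 39 |
| Carlos Carmona | ATL | 31 | 2684 | 27 | 12 | 38 | 77 | 2.48 | 62 | 2 | 4.48 (37) | 15 | 0.48 (33) | 2 | 35 | 40 |
| Scott Caldwell | NE | 33 | 2195 | 25 | 5 | 38 | 68 | 2.06 | 70 | 2.12 | 4.18 (38) | 16 | 0.48 (33) | 5 | 35.5 | 41 |
| Cristian Higuita | ORL | 26 | 1554 | 16 | 1 | 26 | 43 | 1.65 | 57 | 2.19 | 3.84 (40) | 15 | 0.58 (31) | 3 | 35.5 | 41 |
| Xavier Kouassi | NE | 23 | 1218 | 28 | 5 | 45 | 78 | 3.39 | 30 | 1.3 | 4.69 (31) | 6 | 0.26 (43) | 1 | 37 | 43 |
| Tony Tchani | VAN | 27 | 2121 | 19 | 6 | 35 | 60 | 2.22 | 39 | 1.44 | 3.66 (41) | 13 | 0.48 (33) | 1 | 37 | 43 |
| Nana Boateng | COL | 18 | 806 | 10 | 5 | 13 | 28 | 1.56 | 18 | 1 | 2.56 (44) | 5 | 0.28 (42) | 1 | 43 | 45 |

C: clearances, B: blocks, I: interceptions, CBI: sum of clearances, blocks and interceptions, pg: per game, T: tackles, KP: key passes, A: assists.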
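
For anyone who wants to poke at the method, here is a rough Python sketch of the ranking procedure described above: CBI+T per game, key passes per game, a rank for each, then the average of the two ranks. The names `Player`, `competition_rank`, and `rank_dmids` are made up for illustration, and the tie handling (tied players share the better rank) is an assumption inferred from the tied ranks in the table. The five sample players use totals from the table, but the ranks are relative to this small pool, so they will not match the full list.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    games: int
    cbi: int         # clearances + blocks + interceptions
    tackles: int
    key_passes: int


def competition_rank(values):
    """Rank from highest (1) to lowest; tied values share the better rank."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def rank_dmids(players):
    """Return (name, defensive rank, key-pass rank, average rank), best first."""
    defense = [(p.cbi + p.tackles) / p.games for p in players]   # CBI+T per game
    passing = [p.key_passes / p.games for p in players]          # KP per game
    d_ranks = competition_rank(defense)
    k_ranks = competition_rank(passing)
    rows = [
        (p.name, d, k, (d + k) / 2)
        for p, d, k in zip(players, d_ranks, k_ranks)
    ]
    # A lower average rank is better; sorted() is stable, so ties keep input order.
    return sorted(rows, key=lambda row: row[3])


# Totals taken from the table (GP, CBI, T, KP); a five-player pool only.
sample = [
    Player("Cristian Roldan", 33, 98, 114, 43),
    Player("Michael Bradley", 30, 94, 65, 29),
    Player("Haris Medunjanin", 34, 85, 47, 88),
    Player("Alexander Ring", 29, 119, 114, 19),
    Player("Jeff Larentowicz", 33, 138, 76, 5),
]

results = rank_dmids(sample)
for name, d_rank, k_rank, avg in results:
    print(f"{name}: defense rank {d_rank}, key-pass rank {k_rank}, average {avg}")
```

Averaging ranks rather than raw values is what keeps the method simple, but it also throws away the size of the gaps between players, which is one more reason to treat the output as a conversation starter rather than a verdict.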

And THAT’s how I invented the greatest statistical tool in soccer history. Just kidding. Obviously, there are things a person could take issue with in my evaluation. Lots of them.

For starters, I left out several important statistics that a lot of people value in defensive mids. Namely, ‘recoveries’, ‘duels won’, and ‘fouls’. Recoveries, to me at least, seem sort of random, or at least highly dependent on the other players around you. If you have a lot of hard-tackling, ball-winning players around you, you’ll get recoveries, but that seems to depend little on your own abilities. I feel similarly about duels - there just seems to be a lot of chance involved in a loose ball bouncing into the right place for you to claim a split-second before the other guy. Fouls are also hard to factor in - is a hard-fouling d-mid necessarily a bad thing? Is it a significant statistic that should downgrade a player in evaluation? More importantly, HOW would I adjust for fouls?

Another much larger question is whether defensive counting statistics like CBI and tackles are a good way to measure defensive effectiveness at all. These counting stats are likely to be skewed in favor of teams that prefer tactics that produce a lot of defensive actions; tactics like the bunker-and-counter, or playing direct, or any game plan that favors letting the other team have the bulk of possession. Defensively minded teams are likely to hand over the ball to their opponent and say ‘come and get us, if you dare.’ That’ll probably gin up the defensive metrics of all of their players. Similarly, the d-mids on an attack-minded or possession-minded team will record fewer tackles and clearances, and hence, they will look less impressive in comparison.

There is also the problem of how different formations produce differing results. A player in the central midfield of a 4-4-2 has different responsibilities than a d-mid in a 4-3-2-1 or the lone shielding d-mid in a 4-1-4-1. Treating each player’s stats as equal without regard to formations isn't going to yield results that are true. But I’m not sure there’s any alternative that I can come up with for how to adjust for different formations.

Another whole kettle of fish to get into is the lack of offensive metrics in my defensive midfielder ranking. Some d-mids take shots and score; perhaps xG or goals should be factored into my ‘grand unified metric’. Other d-mids are effective in the dribble, and so perhaps I should have included ‘take-ons’ in my metric. I didn't include either. That’s probably because I believe those skills are essential for attacking midfielders, but for a defensive mid, they aren't core to what makes a successful player at that position.

So I posted my article and threw the chart up on twitter, and then invited people to throw down. And throw down they did.

Sounders fans were thrilled, and SounderAtHeart.com retweeted my post. I’m sure that was because they truly understood the nuances of my systematic approach, and not at all because Cristian Roldan came out as the best d-mid in MLS. On the other hand, Ben Baer of MLSsoccer.com was unimpressed, to say the least. He specifically mentioned the need to include usage rates and passing accuracy, which I find to be interesting suggestions, though they have problems of their own. To Ben’s mind, Michael Bradley is the best d-mid in MLS, and any all-encompassing metric that doesn't reflect that is bad and wrong. Matt Doyle, also of MLS, was similarly unmoved, although he wasn't specific about what he would do differently. He did make some jokes intimating the superiority of the Audi Player Index, though. At least, I think they were jokes. Ryan Catanese, now working with Atlanta United, thought my d-mid tool must be screwy because Jermaine Jones came out so high. And on that, he has a point.

Nonetheless, you only get on twitter and say something so that other people can get on twitter and tell you how wrong you are. I angried up the blood of a bunch of MLS pundits on the twitter, and thus I can proudly say: mission accomplished.

Regardless of whether the system I created produced the ‘right’ final product, it was instructive. The attempt to create an all-encompassing player metric gave me a chance to pull apart the things that MLS pundits like the ones above say about players, to see if they stood up to scrutiny. Haris Medunjanin, the well-respected and offense-driven midfielder for Philadelphia Union, was the leading central midfielder in terms of Key Passes, but was only 39th in CBI+T, showing that there are often some serious trade-offs when a team chooses to emphasize the offensive side of their deep-lying midfielders. Minnesota’s Ibson and Houston’s Juan David Cabezas, both thought of as fairly average workaday MLS players, are actually upper-echelon d-mids. My metric would say that the two of them are undervalued. Some teams that struggled in MLS in 2017, like Orlando City SC, LA Galaxy, and New England Revolution, had defensive midfielders (Cristian Higuita, Joao Pedro, Xavier Kouassi, and Scott Caldwell) that were similarly poor on both the defensive and offensive sides of my rankings. Those findings match up with what observers have noted about those teams: they were soft in the middle, among a host of other problems that each club had.

But ultimately, even if somebody had given my advanced defensive metric a fancy name (DVORP! DTAM! Dx90zeitungshplatz!) or had jiggered it to be adjusted to a base value of 0.0 or 100 or something, I did not succeed in creating the perfect number that had the ability to quantify and identify the perfect defensive midfielder. People didn't agree with the numbers I used, because they think the ideal d-mid ought to do more of X and less of Y. They didn't respect the results, because the results didn't correspond to their own observations. Many people probably think that the attempt to create a singular number that defines ‘the best’ at a position is an absurd notion.

Which brings me all the way back to Micheal Azira. I set out on this project for a simple reason: to answer the question of what Micheal Azira’s assets and liabilities are when you make him a key part of your starting XI each week. To the discerning eye of Rapids fans, Micheal is a ball-winning, lane-clogging pest. His defense is exceptional, and he saved the Rapids' bacon time and time again in matches. And the numbers agree: on CBI+T, Azira was the best midfielder in MLS in 2017; better than NYCFC’s highly touted Alexander Ring, better than USMNT mid Alejandro Bedoya, and better than Vancouver’s stalwart shield Matias Laba. Rapids fans could also tell you that what Azira gives you in defense, he sorely lacks as a passer and attacker. Out of the 45 midfielders I ranked, only three had fewer key passes per game than Azira: Nana Boateng, Kouassi, and Jeff Larentowicz.

When you aggregate his offensive and defensive ranks, you come out with the conclusion that Azira was the 20th best defensive midfielder in MLS last year. This might tell you that the Rapids needed an offensive upgrade going forward. This might tell you that the metric I created puts too much weight on defensive prowess. Or it might tell you that Micheal Azira’s awesome defensive skills might make him a better choice as a right back or center back.

I hope somebody makes a better defensive metric than mine. Perhaps some stat-headed MLS office already has, but they’re keeping their magic formula secret, lest some other MLS team learns what they prize the most in players and jacks up their transfer fees accordingly. There are smarter and more creative and more analytical minds than mine watching soccer and devising better ways to measure what cannot currently be measured.

I believe there will be a grand unified metric of defense someday; and simultaneously, another cadre of soccer fans will decry that number as heresy, and will say that this other number is better and smarter. Maybe it’ll take a great mathematician or a team of supercomputers to figure out the secret sauce that defines the platonic ideal of a defensive midfielder. But regardless of whether that number is the irrefutable fact, or just creates a fairly firm data point for your next argument over a pint at the pub, I am steadfast in my belief that the truth is out there, even if it yet eludes the grasp of the soccermetricians of today.

Mark Asher Goodman writes for Around MLS and Pittsburgh Soccer Now. He primarily covers the Colorado Rapids and the USL’s Pittsburgh Riverhounds SC. When he’s not writing about soccer or watching soccer, he’s a rabbi. No, really, like he’s an actual rabbi. You can find him on twitter at @rapidsrabbi and @riverhoundrabbi.