Looking for the model-busting formula

Well that title is a little contradictory, no? If there's a formula to beat the model then it should be part of the model and thus no longer a model buster. But I digress. That article about RSL last week sparked some good conversation about figuring out what makes one team's shots potentially worth more than those of another team. RSL scored 56 goals (by their own bodies) last season, but were only expected to score 44, a 12-goal discrepancy. Before getting into where that came from, here's how our Expected Goals data values each shot:

  1. Shot Location: Where the shot was taken
  2. Body part: Headed or kicked
  3. Gamestate: xGD is calculated in total, and also specifically during even gamestates when teams are most likely playing more, shall we say, competitively.
  4. Pattern of Play: What the situation on the field was like. For instance, shots taken off corner kicks have a lower chance of going in, likely due to a packed 18-yard box. These things are considered, based on the Opta definitions for pattern of play.

But these exclude some potentially important information, as Steve Fenn and Jared Young pointed out. I would say, based on their comments, that the two primary hindrances to our model are:

  1. How to differentiate between the "sub-zones" of each zone. As Steve put it, was the shot from the far corner of Zone 2, more than 18 yards from goal? Or was it from right up next to zone 1, about 6.5 yards from goal?
  2. How clean a look the shooter got. A proportion of blocked shots could help to explain some of that, but we're still missing the time component and the goalkeeper's positioning. How much time did the shooter have to place his shot and how open was the net?

Unfortunately, I can't go get a better data set right now so hindrance number 1 will have to wait. But I can use the data set that I already have to explore some other trends that may help to identify potential sources of RSL's ability to finish. My focus here will be on their offense, using some of the ideas from the second point about getting a clean look at goal.

Since we have information about shot placement, let's look at that first. I broke down each shot on target by which sixth of the goal it targeted to assess RSL's accuracy and placement. Since the 2013 season, RSL is second in the league in getting its shots on goal (37.25%), and among those shots, RSL places the ball better than any other team. Below is a graphic of the league's placement rates versus those of RSL over that same time period. (The corner shots were consolidated for this analysis because it didn't matter to which corner the shot was placed.)

Placement Distribution - RSL vs. League

 

RSL obviously placed shots where the keeper was least likely to be: the corners. That's a good strategy, I hear. If I include shot placement in the model, RSL's 12-goal difference in 2013 completely evaporates. This new model expected them to score 55.87 goals in 2013, almost exactly the 56 they scored.
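One plausible way to fold placement into a model like this is to key the league finishing-rate lookup on where the shot was placed as well as where it was taken from. Here's a toy sketch of that idea; the zones, placement buckets, rates, and shot lists below are all made up for illustration, not our actual model or numbers.

```python
# Hypothetical league finishing rates keyed by (shot location, placement bucket).
# Off-target and blocked shots can't score, so their rate is zero by definition.
finish_rate = {
    ("zone2", "low corner"): 0.45,
    ("zone2", "center"):     0.15,
    ("zone3", "low corner"): 0.25,
    ("zone3", "center"):     0.08,
    ("any",   "off target"): 0.00,
}

def expected_goals(shots):
    """Sum the league finishing rate for each (location, placement) pair."""
    return sum(finish_rate[(loc, placement)] for loc, placement in shots)

# A team that places more of its on-target shots in the corners...
corner_heavy = [("zone2", "low corner")] * 4 + [("zone3", "center")] * 2 + [("any", "off target")] * 4
# ...comes out ahead of one that shoots at the keeper from the same spots.
center_heavy = [("zone2", "center")] * 4 + [("zone3", "center")] * 2 + [("any", "off target")] * 4

print(round(expected_goals(corner_heavy), 2), round(expected_goals(center_heavy), 2))  # 1.96 vs 0.76
```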

Admittedly, it isn't earth-shattering news that teams score by shooting at the corners, but I still think it's important. In baseball, we sometimes assess hitters and pitchers by their batting average on balls in play (BABIP), a success rate during specific instances only when the ball is contacted. It's obvious that batters with higher BABIPs will also have higher overall batting averages, just like teams that shoot toward the corners will score more goals.

But just because it is obvious doesn't mean that this information is worthless. On the contrary, baseball's sabermetricians have figured out that BABIP takes a long time to stabilize, and that a player who is outperforming or underperforming his BABIP is likely to regress. Now that we know RSL is beating the model through its shot placement, the question becomes: do accuracy and placement stabilize at the team level?

To some degree, yes! First, there is a relationship between a team's shots on target totals from the first half of the season and the second half of the season. Between 2011 and 2013, the correlation coefficient for 56 team-seasons was 0.29. Not huge, but it does exist. Looking further, I calculated the differences between teams' expected goals in our current model and teams' expected goals in this new shot placement model. The correlation from first half to second half on that one was 0.54.
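For those curious about the mechanics, these stabilization checks are just split-half correlations. A minimal sketch with made-up team rates (not the actual data):

```python
import numpy as np

# Made-up per-team shots-on-target rates for the first and second halves
# of a season; one entry per team-season.
first_half  = np.array([0.35, 0.31, 0.38, 0.29, 0.33, 0.36, 0.40, 0.28])
second_half = np.array([0.34, 0.30, 0.36, 0.31, 0.32, 0.35, 0.37, 0.30])

# Pearson correlation between halves: near zero means mostly noise,
# well above zero means the metric "stabilizes" at the team level.
r = np.corrcoef(first_half, second_half)[0, 1]
print(round(r, 2))
```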

To summarize, getting shots on goal repeats only weakly at the team level, but where those shots are placed within the goal repeats considerably better. There is some stabilization going on. This gives RSL fans hope that at least some of this model-busting is due to a skill that will stick around.

Of course, that still doesn't tell us why RSL is placing shots well as a team. Are their players more skilled? Or is it the system that creates a greater proportion of wide-open looks?

Seeking details that may indicate a better shot opportunity, I will start with assisted shots. A large proportion of assisted shots may indicate that a team will find open players in front of net more often, thus creating more time and space for shots. However, an assisted shot is no more likely to go in than an unassisted one, and RSL's 74.9-percent assist rate is only marginally better than the league's 73.1 percent, anyway. RSL actually scored about six fewer goals than expected on assisted shots, and six more goals than expected on unassisted shots. It becomes apparent that we're barking up the wrong tree here.*

Are some teams more capable of not getting their shots blocked? If so, then those teams would likely finish better than the league average. One little problem with this theory is that RSL gets its shots blocked more often than the league average. Plus, in 2013, blocked-shot percentages from the first half of the season had a (statistically insignificant) negative correlation with blocked-shot percentages in the second half of the season, strongly suggesting that blocked shots are influenced more by randomness and the defense than by the offense taking the shots.

Maybe some teams get easier looks by forcing rebounds and following them up efficiently. Indeed, in 2013 RSL led the league in "rebound goals scored" with nine, where a rebounded shot is one that occurs within five seconds of the previous shot. That beat their expected goals on those particular shots by 5.6 goals. However, earning rebounds does not appear to be much of a skill, and neither does finishing them. The correlation between first-half and second-half rebound chances was a meager--and statistically insignificant--0.13, while the added value of a "rebound variable" to the expected goals model was virtually unnoticeable. RSL could be the best team at tucking away rebounds, but that's not a repeatable league-wide skill. And much of that 5.6-goal advantage is explained by the fact that RSL places the ball well, regardless of whether or not the shot came off a rebound.
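As an aside, tagging rebounds in shot-by-shot data is simple under that five-second definition. A sketch, with the record layout being illustrative rather than our actual schema:

```python
# Flag "rebound" shots: any shot within five seconds of the previous shot
# in the same match. Assumes the list is sorted by match and time.
def flag_rebounds(shots, window=5.0):
    """`shots` is a list of dicts with 'match_id' and 'second' (match clock)."""
    flagged = []
    prev = None
    for shot in shots:
        is_rebound = (
            prev is not None
            and shot["match_id"] == prev["match_id"]
            and shot["second"] - prev["second"] <= window
        )
        flagged.append({**shot, "rebound": is_rebound})
        prev = shot
    return flagged
```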

Jared did some research for us showing that teams that get an extremely high number of shots within a game are less likely to score on each shot. It probably has something to do with going for quantity rather than quality, and possibly playing from behind and having to fire away against a packed box. While that applies within a game, it does not seem to apply over the course of a season. Between 2011 and 2013, the correlation between a team's attempts per game and its finishing rate per attempt was virtually zero.

If RSL spends a lot of time in the lead and very little time playing from behind--true for many winning teams--then its chances may come more often against stretched defenses. RSL spent the fourth most minutes in 2013 with the lead, and the fifth fewest minutes playing from behind. In 2013, there was a 0.47 correlation between teams' abilities to outperform Expected Goals and the ratio of time they spent in positive versus negative gamestates.

If RSL's boost in scoring comes mostly from those times when they are in the lead, that would be bad news since their Expected Goals data in even gamestates was not impressive then, and is not impressive now. But if the difference comes more from shot placement, then the team could retain some of its goal-scoring prowess. 8.3 goals of that 12-goal discrepancy I'm trying to explain in 2013 came during even gamestates, when perhaps their ability to place shots helped them to beat the expectations. But the other 4-ish additional goals likely came from spending increased time in positive gamestates. It is my guess that RSL won't be able to outperform their even gamestate expectation by nearly as much this season, but at this point, I wouldn't put it past them either.

We come to the unsatisfying conclusion that we still don't know exactly why RSL is beating the model. Maybe the players are more skilled, maybe the attack leaves defenses out of position, maybe it spent more time in positive gamestates than it "should have." And maybe RSL just gets a bunch of shots from the closest edge of each zone. Better data sets will hopefully sort this out someday.

*This doesn't necessarily suggest that assisted shots have no advantage. It could be that assisted shots are more commonly taken by less-skilled finishers, and that unassisted shots are taken by the most-skilled finishers. However, even if that is true, it wouldn't explain why RSL is finishing better than expected, which is the point of this article.

MLS PWP through 6 Weeks: Does the wheat begin to separate from the chaff?

You might not think that six weeks is enough to begin to categorize which teams are performing well and which aren't - I may even agree with you to an extent, but here's the thing: we're six weeks in, and patterns are beginning to take shape. Instead of just showing the combined Index for all 19 teams, I'm going to split them up into the Eastern and Western Conferences to show a different view. And here's my link to the Introduction to PWP.

Here are all the Eastern Conference teams after 6 weeks (note some teams have yet to play six games):

Eastern Conference PWP Strategic Composite Index Cumulative to Week 6

Observations:

The intent here is to offer up a graphic that shows which teams are performing better in attack than their opponents so far. No intent here to write off anyone, yet... too early for that with 28 games and a maximum of 84 points still being available.

Let's just say that Berhalter and Vermes have their teams in top gear - while Hackworth, Olsen, Petke, Heaps and Nelson are still fine tuning... as for Klopas, Yallop and Kinnear, performance needs to get better, and I'm sure they already know that.

As a reminder - this Index is the difference between how well a team executes the six primary steps of Possession with Purpose versus how well their opponents execute those same steps against them. A negative number thus means that, on average, the opponent is performing those six steps better (collectively) than that team.
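For readers who want the mechanics, here's a rough sketch of how an index like this can be computed from raw match counts. The six ratios mirror the attacking steps I report each week (possession, passing accuracy, penetration, shot creation, shots on goal, goals); the field names and the unweighted average are simplifications for illustration, not necessarily my exact weighting.

```python
def pwp_steps(s):
    """The six PWP attacking steps as fractions, from one team's raw match counts."""
    return [
        s["possession_pct"],                            # 1. possession
        s["passes_completed"] / s["passes_attempted"],  # 2. passing accuracy
        s["final_third_passes"] / s["passes_attempted"],# 3. penetration
        s["shots_taken"] / s["final_third_passes"],     # 4. shot creation per penetration
        s["shots_on_goal"] / s["shots_taken"],          # 5. shots on goal per shot taken
        s["goals"] / s["shots_on_goal"],                # 6. goals per shot on goal
    ]

def composite_index(team, opponent):
    """Difference between a team's average step value and its opponent's."""
    return sum(pwp_steps(team)) / 6 - sum(pwp_steps(opponent)) / 6
```

A positive composite means the team is, on average, out-executing its opponents across the six steps; a negative composite means the opposite, which matches how the Index reads above.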

I'm not a betting man yet on this Index, but if you think the odds are good that Columbus wins the Eastern Conference, then a flutter of $20/20 BPS might be a worthy chance. Spreading your bet across the field with Sporting Kansas City and one or two other teams might be worthy as well... for now I'm not seeing Montreal make it; but that's just me.

On to the Western Conference:

Western Conference PWP Strategic Composite Index Cumulative to Week 6

Observations:

Like the Eastern Conference, it's too early to go too deep, and the high flying teams play each other three times this year just like those guys back east; when LA and FC Dallas square off it should be interesting...  all the while Colorado and Seattle continue to get better, with Vancouver and the ever present/haunting Real Salt Lake looking to make a strong mid-year run.

As for Portland - times are hard early on this year and a 15-game unbeaten streak would be a much needed dose of medicine to put them into the thick of things. How San Jose and Chivas cope remains to be seen - and given the styles I've seen from them this year, it appears crosses are their primary way to penetrate.

If you're a betting guy, I'm even less sure about the West than the East at this point - for now spreading the bets where the odds are good seems a likely choice with LA probably being the front-runner... is this the year where big money shows value in the West, like New York garnered last year in the East?

As for the top performing PWP attacking teams in general; here's how they compare against each other across all of MLS:

PWP Strategic Composite Attacking Index Cumulative to Week 6

Observations:

While there is no sure thing, if you're looking for teams who are more likely to put goals past their opponents in multiples, it's likely the top 5-10 teams are those that can - whether they prevent the same number of goals is a different story.

Note Real Salt Lake is in the top 7 here but sits in 6th place overall in the Western Conference PWP - for me that indicates Real are operating pretty much like they did last year; score goals and work harder than your opponent to score more goals while relying on your defense to keep you in the game... without that stoppage time goal by Edu this past weekend it's likely RSL would have been higher up the Western Conference PWP Index.

Note also that Sporting remain in the top ten for Attack - they've always been viewed as a great defending side - the higher up the attacking scale they reach the more likely they will be balanced for another run at the Championship.

On the other end - New England and Toronto are bottom dwellers here but they are getting points; why so low?  In working their own style, Toronto have started the season averaging just over 40% possession with just 64% accuracy in their overall passing - what we are seeing is timely penetration against opponents who are out of shape positionally (for the most part) - recall also that Defoe has been injured.

As for New England - their accuracy and possession numbers are solid - where things drop off is in their ability to create shots taken (2nd lowest in MLS so far this year) and to convert those shots taken into shots on goal and goals scored.  Their goals scored percentage based on shots on goal is just 12.22%.  That is the lowest goal-scoring conversion rate in MLS - and a whopping 56 percentage points lower than FC Dallas, who have converted (on average) 68.33% of their shots on goal into goals scored...

Other notable pieces of information - both Columbus and LA are averaging better than 80% accuracy in 'all' passing totals; the team doing the best in penetrating based upon total passes is New England (29.49%), with Houston, Columbus, Philadelphia and Chicago all hovering around 22%.  The teams creating the most shots given their final-third penetration are San Jose at 26%, Toronto at 25% and Chicago at 25% - can you say counter and direct attack (be it on the ground or in the air)?

The teams most successful in putting shots on goal compared to shots taken are Colorado (42%), Real Salt Lake (42%), Vancouver (41%) and FC Dallas (40%)...

Moving on to the Composite PWP Defending Index...

PWP Strategic Composite Defending Index Cumulative to Week 6

Observations:

Not much separates the good from the not so good and perhaps the ugly; and it's too early to label anyone as really ugly.

For now, the team most successful in holding opponents to low passing accuracy is Sporting KC (opponents just 70.25% accurate per game), with Real holding opponents to 71.97% accuracy, DC United to 71.98% and Philadelphia to 71.55%.

As for allowing penetration based upon overall passes, opponents of San Jose penetrate over 24% of the time, while Vancouver also permits opponents to penetrate about 24% of the time.

As for opponents completing final-third passes, the team most successful in limiting completed passes in its defending third is LA at 12%, while Toronto's defense offers up a stingy 13.57%.

The teams allowing the most shots taken versus passes completed in their defending third are Chivas at 41.65% and New York at 40.55% - you wonder why I keep harping on New York? That's why... they just don't defend that well in their own final third...

Teams yielding the most goals scored per shot on goal, per game, are Chivas at 45% (which raises the question: why couldn't Portland score more than one goal?) and Philadelphia at 43%, while the LA Galaxy allow a stingy 17% of their opponents' shots on goal to be converted into goals scored.

In closing...

Just week 6, but patterns continue to develop - as the season unfolds I'll do my best to offer up these tidbits for your consideration.

For the future, I have a post coming up that speaks to formations and defensive activities - still need about 4 more weeks for that one to have enough data to offer some observations on it.

All the best, Chris

You can follow me on twitter at @chrisgluckpwp

 

USMNT - My thoughts after 2-2 Draw with Mexico

If you're like me you were pretty impressed with the first half Wednesday evening as Jurgen Klinsmann deployed a Diamond 4-4-2 in the truest sense - narrow and focused down the middle with the intent to manage the wings by channeling things to the middle. It worked really well in the first half. To give you a comparison on how well it went, here's a table on their Possession with Purpose (six steps in Attack) in the first half compared to that of the second half with the average for MLS Teams in 2013.

But before offering that, here's a link to what PWP is all about in case you've missed it before.

Team                  Possession%   Passing Acc%   Penetration%   Shots Taken per Penetration%   Shots on Goal per Shot Taken%   Goals per Shot on Goal%
USMNT 1st Half            59%            85%            13%                   14%                             80%                          50%
USMNT 2nd Half            41%            80%            18%                   25%                             14%                           0%
Mexico 1st Half           41%            75%            21%                    5%                              0%                           0%
Mexico 2nd Half           59%            80%            23%                   35%                             44%                          25%
MLS 2013 Average          50%            76%            22%                   20%                             34%                          30%

Observations:

I won't offer up anything new here that I didn't already offer on twitter during the match but in case you missed some of those streaming thoughts here they are without limiting my words to the format of twitter.

Bradley and Beckerman needed to be the fulcrum between the defending side of the pitch and the attacking side of the pitch if that Diamond 4-4-2 is to be successful - I'd offer that most would agree they were (at least in the first half).

Considering I had never seen Michael Parkhurst in a left fullback position, I opined that, the way this team lined up, some good chances would come down the right side with Beltran running overlaps or supporting Zusi in deep penetration on the wings.

I'd offer that was also the case in the first half - no better example than the goal Wondolowski got, worked from the Zusi cross that Bradley flicked on for Wondolowski to poke home.

What was surprising to me (a very welcome surprise) was how effective Michael Parkhurst was in the first half working the left side with his own mix of penetration combined with Davis --- I really did enjoy seeing that Wednesday evening, and support like that from Michael reinforces his ball handling skills - and, in my view, makes him a very credible selection to start at centerback alongside Matt Besler.

If you didn't already know Michael Parkhurst was my PWP Defender of the Week #1 and here's that article supporting that analysis.

I'm not sure why I've never rated Omar Gonzalez highly, but I don't - maybe it's his defensive positioning that makes me nervous. I'm a defensive-minded guy in football, and while there are good points to having a CB who can attack the box on set pieces, my view is that a CB is first and foremost on the pitch to STOP the opponent from scoring - all else is a bonus after that.

As for the goals against in the second half - other pundits have already offered up the Capt. Obvious here that Gonzalez was directly accountable for both goals scored by Mexico - so I ask (rhetorically) did he really add value to this squad in that game in his primary role and if not - who's better?

That's not a question for me to answer but I think it is a question Jurgen Klinsmann needs to ask himself and his new staff...

Like many things in life, I'm not particularly fond of folks who offer up a problem (be it real or perceived) without also coming up with a solution/recommendation to that problem.  So with that, here are my options, knowing that players who haven't already played under Jurgen recently are simply not going to get selected.

Goodson - Not sure here either - I personally have not seen him enough to offer a view with merit - he does well for San Jose, but he didn't get particularly good minutes overseas with what I consider a top-rated club.  More information needed.

Parkhurst - I have seen him probably as little as I have seen Goodson, but in those few short games (and his impressive showing Wednesday evening) it is clear he has the pace to cope with the wings, and he also has the passing accuracy and understanding of a broader role in positional play to make a very effective starting CB, provided he can handle the more physical side of the game when teams include a more traditional #9 who plays with his back to goal rather than trying to run onto through balls.

Cameron - His time overseas has him seeing the game as a right back for Stoke - is that the right mix to settle in alongside Besler, and how is that chemistry going to take shape?  He has an awareness of how positional play works down the wings, so that adds great value - just as we saw with Parkhurst playing the left side Wednesday evening.

For me, Parkhurst is a first option to pair with Besler, though my view is limited - call it a gut instinct. But do folks really expect a CB who has played as long as Gonzalez to say in passing he needs to be more dominant in his role as a CB in protecting the box?  Wow - I hope not.  That is something a CB should KNOW and understand from day 1...  oh my...

Perhaps a more compelling question is how long has this weakness (lack of being switched-on to the true purpose of a CB) been or not been recognized by the USMNT staff?

And then to throw a teammate under the bus - bollocks - it just reinforces my own views that Gonzalez is not the right choice to represent the USMNT as a starting CB in the World Cup.

A winning World Cup team must be linked in and switched-on to roles and responsibilities for 90+ minutes, for at least three games in eight days, in order to advance - and then it just gets tougher and tougher... that speaks to having resilience in a squad, and throwing a teammate under the bus is not an example of resilience - it represents a shirking of responsibility.

As for Green - as noted in my finishing tweets for the match - in my view Green is still green. It was worthy and notable of Klinsmann to put him in as a way to begin his run of caps - but as an option going forward now?  Unless his attitude is so positive and infectious for others, I just don't see him having any role of substance this World Cup - the hype is what it is - hype...

A tough question here for those who've been around footy for some time: if he were a real stud, do you really think Green would miss an opportunity to play for Germany in a World Cup or European Championship in the very near term?

The pedigree of the German side is simply too strong to think for even a second that (if a starter there) he would ditch that opportunity to be a starter here. Like it or not, the USMNT's progress has not made it that far in being that good...  if you think it has, your emotions are overwhelming your senses.  Bringing Green into the side is more about 2018 than 2014; and for that I tip my hat to Klinsmann...

In closing...

A welcome sight to see the USMNT open in a diamond 4-4-2 and the pieces to that puzzle looked pretty good when considering who started and who didn't.  I would offer that style of play speaks to some of the stronger styles we see in MLS - is that the intent of Klinsmann - to stamp a particular style of play that suits the stronger and more possession oriented sides in the MLS who are also known for closing down and giving the opponent very little space and time to work with?

I think so - and yes - width is critical to manage when working a narrow approach and the right pieces need to be there to do that.  Evidence of that was very clear Wednesday evening - as the second half opened and the subs began to rotate for the USMNT the Mexican side went from offering up just 8 crosses in the first half to a total of 23 crosses for the second half.

Clearly the change in players on both sides, along with complacency and fatigue on the USMNT's part, directly influenced Mexico's attacking approach.

In looking towards the final selection Klinsmann has some issues to wrestle with - how does he balance the chemistry of the "MLS Players" playing together versus those guys who play abroad - how does Altidore fit into a Diamond 4-4-2? 

He's been laboring with Sunderland this year and I'm not familiar enough with that team to know what system they operate - but given their position in the League Table it would appear they are not very good at scoring goals - which for me tends to indicate their midfield isn't that strong.  To paraphrase Harrison on this one - Jozy doesn't have the right complementary pieces to go with his skill set...

And with a trend of American players returning stateside might we see Jozy make a transfer move similar to Bradley and Dempsey this summer?  Hard to say now but if the USMNT chemistry continues to mature, using a majority of players from MLS, it just might mean the most effective move for him is a return to America.

Before signing off I have one final postulate for consideration.  In seeing how the game went Wednesday evening - has anyone considered that - given it was a friendly - the intent of the second half might have also included studying how Mexico may adjust, in pitch activity, to the Diamond 4-4-2, in order for Klinsmann to gather data on how other opponents might adjust (real time) in the World Cup?

This also provides Klinsmann some real data to evaluate on ways he might counter the opponent's counter...  I wouldn't put it past him - especially since the game wasn't a real win-or-lose game of consequence...

That's all for me for now... More to follow on twitter as I join the crowd at Providence Park for the Cascadia clash between Seattle and Portland.

 

 

Introducing Expected Goals 2.0 and its Byproducts

Many of the features listed below from our shot-by-shot data for 2013 and 2014 can be found above by hovering over the "Expected Goals 2.0" link. Last month, I wrote an article explaining our method for calculating Expected Goals 1.0, based only on the six shot locations. Now, we have updated our methods with the cool, new, sleek Expected Goals 2.0.

Recall that in calculating expected goals, the point is to use shot data to effectively suggest how many goals a team or player "should have scored." This gives us an idea of how typical teams and players finish, given certain types of opportunities, and then allows us to predict how they might do in the future. Using shot locations, if teams are getting a lot of shots from, say, zone 2 (the area around the penalty spot), then they should be scoring a lot of goals.

Expected Goals 2.0 for Teams

Now, in the 2.0 version, it's not only about shot location. It's also about whether or not shots are being taken with the head or the foot, and whether or not they come from corner kicks. Data from the 2013 season suggest that not only are header and corner kick shot totals predictive of themselves (stable metrics), but they also lead to lower finishing rates. Thus, teams that fare exceptionally well or poorly in these categories will now see changes in their Expected Goals metrics.

Example: In 2013, Portland took a low percentage of its total shots as headers (15.4%), as well as a low percentage of its total shots from corner kicks (12.3%). Conversely, it allowed higher percentages of those types of shots to its opponents (19.2% and 15.0%, respectively). Presumably, the Timbers' style of play encourages this behavior, and this is why the 2.0 version of Expected Goal Differential (xGD) liked the Timbers more than the 1.0 version did.

We also calculate Expected Goals 2.0 contextually--specifically during periods of an even score (even gamestate)--for your loin-tickling pleasure.

Expected Goals 2.0 for Players

Another addition from the new data we have is that we can assess players' finishing ability while controlling for the various types of shots. Players' goal totals can be compared to their Expected Goals totals in an attempt to quantify their finishing ability. Finishing is still a controversial topic, but it's this type of data that will help us to separate out good and bad finishers, if those distinctions even exist. Even if finishing is not a repeatable skill, players with consistently high Expected Goals totals may be seen as players that get themselves into dangerous positions on the pitch--perhaps a skill in its own right.

The other primary player influencing any shot is the main guy trying to stop it, the goalkeeper. This data will someday soon be used to assess goalkeepers' saving abilities, based on the types of shot taken (location, run of play, body part), how well the shot was placed in the goal mouth, and whether the keeper gave up a dangerous rebound. Thus for keepers we will have goals allowed versus expected goals allowed.

Win Expectancy

Win Expectancy is something that exists for both Major League Baseball and the National Football League, and we are now introducing it here for Major League Soccer. When the away team takes the lead in the first 15 minutes, what does that mean for their chances of winning? These are the questions that can be answered by looking at past games in which a similar scenario unfolded. We will keep Win Expectancy charts updated based on 2013 and 2014 data.
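Mechanically, win expectancy is just an empirical lookup over past game states. A toy sketch of the idea (the data layout here is illustrative, not our production setup):

```python
from collections import defaultdict

def build_table(history):
    """`history` holds (minute, home_goal_diff, home_result) snapshots from past
    games, where home_result is 1 for a home win, 0.5 for a draw, 0 for a loss."""
    buckets = defaultdict(list)
    for minute, diff, result in history:
        buckets[(minute, diff)].append(result)
    return buckets

def win_expectancy(table, minute, diff):
    """Average historical outcome for home teams in this game state."""
    outcomes = table.get((minute, diff), [])
    return sum(outcomes) / len(outcomes) if outcomes else None

# e.g. how did home teams fare when trailing by one at the 15-minute mark?
table = build_table([(15, -1, 0), (15, -1, 0.5), (15, -1, 1), (15, 0, 0.5)])
print(win_expectancy(table, 15, -1))  # 0.5 on this toy history
```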

How it Happened: Week Three

In the three games I watched this week, five goals were scored. Two were from penalty kicks, and two were off corner kicks. Needless to say, offenses around the league are in early-season form, i.e. not exactly clicking in front of the net. On the bright side, there was a decent amount of combination play leading to chances... it's just that whole putting-them-away thing that MLS teams are still working on. On to the main attraction: Chicago Fire 1 - 1 New York Red Bulls

Stat that told the story for New York: 350 completed passes; 68% of which were on the left side of the field*


It's hardly inspiring for the Supporters' Shield holders to sneak away from Chicago with a draw, but I actually thought they played pretty well on Sunday. Like I said above about the league as a whole, quality was missing on the final ball/shot, but New York fans shouldn't be too worried about the team's winless start. In this one there was quite a bit of good linking-up, particularly on the left flank. Given that midfielder Matt Watson was starting in a pinch as a nominal right back for the Fire, it seemed like a concerted effort from RBNY to expose a weakness on that side of the field. Between Roy Miller, Jonny Steele and Thierry Henry, there were some encouraging sequences down that side in particular; unfortunately for New York it didn't lead to any actual goals.

*This stat/image is blatantly stolen from the Twitter account of MLS Fantasy Insider Ben Jata, @Ben_Jata. After seeing it this weekend, I was unable to think of anything better to include, so thanks, Ben!

Stat that told the story for Chicago: 24 total shots + key passes, only 2 of which were from Mike Magee

I'm not sure if this one is a good stat for Chicago fans or a bad one, but Mike Magee was conspicuously absent from a lot of the action this weekend (unless you count yelling incessantly and childishly at the ref as your definition of 'action'). But seriously: last year Chicago had 377 shots the entire season, and Magee either took or assisted on 116 of them (31%)*. Oh, and he only played 22 of their 34 games. The fact that he was involved in only 2 of the team's 24 shots (both of his shots were blocked, for what it's worth) could certainly be viewed as concerning for Chicago fans expecting another MVP-caliber season out of Magee. But on the other hand, it's easy to chalk up the struggles to the fact that this was his first game of the season after a maybe-contract-hold-out related hiatus. Also, the fact that Chicago managed to create 22 shots without Magee's direct influence (or Patrick Nyarko and Dilly Duka, both also out this weekend) has to be a good sign for a team that was often a one-man show last season: youngsters Harrison Shipp and Benji Joya in particular both seem capable of lightening the load.

*Numbers from Squawka.

 

Toronto FC 1 - 0 DC United

Stat that told the story for Toronto: 38% possession, 3 points won


TFC captain Michael Bradley made headlines this week saying something along the lines of how possession was an overrated stat, and his team certainly appears to be trying to prove his point so far this season. The Reds didn't see a ton of the ball in their home opener, instead preferring to let DC knock the ball around with minimal penetration in the final third. And then when Toronto did win the ball, well, check out the Opta image that led to the game's lone goal for Jermain Defoe (or watch the video). It started with a hopeful ball from keeper Julio Cesar. The second ball was recovered by Steven Caldwell, who fed Jonathan Osorio. Osorio found his midfield partner Bradley, who lofted a brilliant 7-iron to fellow DP Gilberto. The Brazilian's shot was saved but stabbed home by the sequence's final Designated Player, Defoe. Balls like that one were played multiple times throughout the game by both Bradley and Osorio, as TFC has shown no aversion to going vertical quickly upon winning the ball. And with passes like that, speedy wingers, and quality strikers, it's certainly a strategy that may continue to pay off.

Stat that told the story for DC: 1/21 completed crosses

This stat goes along a bit with what I wrote about Toronto above: they made themselves hard to penetrate in the final third, leading to plenty of incomplete crosses. Some of this high number of aimless crosses also comes from the fact that DC was chasing an equalizer and just lumping balls into the box late in the match. Still, less than 5% on completing crosses is a bit of a red flag when you look at the stat sheet. Particularly when your biggest attacking threat is Eddie Johnson, who tends to be at his best when attacking balls in the air. You'd think Ben Olsen would expect a better crossing percentage. To be fair to United though, I thought they were much better in this game than they were on opening day against Columbus. They looked about 4 times more organized than two weeks ago, and about 786 times more organized than last season, and their possession and link-up play showed signs of improvement too. Still a ways to go, but at least things are trending upward for the Black and Red.

 

Colorado Rapids 2 - 0 Portland Timbers

Stat that told the story for Portland: 1 Donovan Ricketts karate kick


I admit that I'm cheating here and not using a stat or an Opta Chalkboard image. But the above grainy screenshot of my TV that I took is too hilarious and impactful not to include. Colorado and Portland played a game on Saturday that some might call turgid, or testy, or any number of adjectives that are really stand-ins for the word boring. The most interesting parts of most of the game were Ricketts' adventures in goal, which ranged from dropping floated long balls to tipping shots straight in the air to himself. In the 71st minute it appeared Ricketts had had enough and essentially dropped the mic. Flying out of his net, he leapt into the air with both feet, apparently hoping that if he looked crazy enough the ref would look away in horror instead of red carding him for the obvious kick to Deshorn Brown's chest. The Rapids converted the penalty and then added another one a few minutes later, and that was all she wrote.

Stat that told the story for Colorado: 59 total interceptions/recoveries/tackles won; 27 in the game's first 30 minutes

Alright, I was silly with the Portland section so I feel like I need to do a little serious analysis for this paragraph. The truth is that this game was fairly sloppy on both sides, which is particularly surprising considering how technically proficient Portland was for most of last season. But cold weather combined with early season chemistry issues makes teams play sloppily sometimes, and it didn't help that Colorado came out and looked very good to start this game. Their defensive shape was very compact when the Timbers had the ball, and the Rapids were very proficient in closing down passing lanes and taking possession back. The momentum swung back to Portland's side and back a couple of times throughout the match, but Colorado's strong start set the tone that Donovan Ricketts helped carry to the final whistle.

 

Agree with my assessments? Think I'm an idiot? I always enjoy feedback. Contact me on twitter @MLSAtheist or by email at MLSAtheist@gmail.com

MLS Week 3: Expected Goals and Attacking Passes

In the coming days, Matthias will be releasing our Expected Goals 2.0 statistics for 2014. You can find the 2013 version already uploaded here. I would imagine that basically everything I've been tweeting out from our @AnalysisEvolved twitter handle about expected goals up to this point will certainly be less cool, but he informs me it won't be entirely obsolete. He'll explain when he presents it, but the concept behind the new metrics is familiar, and there is a reason why I use xGF to describe how teams performed in their attempt to win a game. It's important to understand that there is a difference between actual results and expected goals, as one yields the game points and the other indicates possible future performances. However, this post isn't about expected goal differential anyway--it's about expected goals for. Offense. This obviously omits what the team did defensively (and that's why xGD is so ideal in quantifying a team performance), but I'm not all about the team right now. These posts are about clubs' ability to create goals through the quality of their shots. It's a different method of measurement than that of PWP, and really it's measuring something completely different.

Take for instance the game which featured Columbus beating Philadelphia on a couple of goals from Bernardo Anor, who aside from those goals turned in a great game overall and was named Chris Gluck's attacking player of the week. That said, know that the goals that Anor scored are not goals that can be consistently counted upon in the future. That's not to diminish the quality or the fact that they happened. It took talent to make both happen. They're events---a wide open header off a corner and a screamer from over 25 yards out---that I wouldn't expect him to replicate week in and week out.

Obviously Columbus got some shots in good locations, which they capitalized on, but the xGF metric tells us that while they scored two goals and won the match, the average shot taker would have produced just a little more than one expected goal from those chances. Their opponents took a cumulative eleven shots inside the 18-yard box, which we consider a dangerous location. Those shots, plus the six from long range, add up to nearly two goals' worth of xGF. This tells us two pretty basic things: 1) Columbus scored a lucky goal somewhere (maybe the 25-yard screamer?), and 2) they allowed a lot of shots from dangerous locations and were probably lucky to come out with the full 3 points.

Again, if you are a Columbus Crew fan and you think I'm criticizing your team's play, I'm not doing that. I'm merely looking at how many shots they produced versus how many goals they scored and telling you what would probably happen the majority of the time with those specific rates.

 

 Team  Zone1  Zone2  Zone3  Zone4  Zone5  Zone6  Shot-total  xGF
Chicago 1 3 3 3 3 0 13 1.283
Chivas 0 3 2 2 3 0 10 0.848
Colorado 1 4 4 2 1 1 13 1.467
Columbus 0 5 1 2 1 0 9 1.085
DC 0 0 1 1 4 0 6 0.216
FC Dallas 0 6 2 0 1 1 10 1.368
LAG 0 0 4 2 3 0 9 0.459
Montreal 2 4 5 8 7 0 26 2.27
New England 1 2 1 8 5 0 17 1.275
New York 2 4 2 0 2 0 10 1.518
Philadelphia 2 5 6 2 4 0 19 2.131
Portland 0 0 2 2 2 1 7 0.329
RSL 0 4 3 0 3 0 10 0.99
San Jose 0 2 0 0 3 0 5 0.423
Seattle 1 4 0 2 2 0 9 1.171
Sporting 2 6 2 2 3 2 17 2.071
Toronto 0 6 4 2 2 0 14 1.498
Vancouver 0 1 1 3 3 0 8 0.476

Now we've talked about this before, and one thing that xGF, or xGD for that matter, doesn't take into account is Game States---when the shot was taken and what the score was. This is something that we want to adjust for in future versions, as that sort of thing has a huge impact on team strategy and the value of each shot taken and allowed. Looking around at other instances of games like that of Columbus: Seattle scored an early goal in their match against Montreal, and as mentioned, it changed their tactics. Yet despite that, and the fact that the Sounders only had 52 total touches in the attacking third, they were still able to average a shot for every 5.8 touches in the attacking third over the course of the match.

It could imply a few different things. It tells me that Seattle took advantage of their opportunities to shoot, and that even while allowing so many shots, they turned their own possessions into opportunities. They probably weren't as overmatched as it might seem from Montreal's advantage in shots (26) and final-third touches (114). Going back to Columbus, it seems Philadelphia was similar to Montreal in that both clubs had a good number of touches, but the real difference between the matches is that Seattle responded with a good ratio of touches to shots (5.77), and Columbus did not (9.33).

These numbers don't contradict PWP. Columbus did a lot of things right, looked extremely good, and dare I say they make me look rather brilliant for picking them at the start of the season as a possible playoff contender. That said their shot numbers are underwhelming and if they want to score more goals they are going to need to grow a set and take some shots.

 Team  Att-3rd passes (comp)  Att-3rd passes (inc)  Att-3rd passes (total)  Passes per shot  Completion%  Key passes
Chicago 26 17 43 3.308 60.47% 7
Chivas 32 29 61 6.100 52.46% 2
Colorado 58 27 85 6.538 68.24% 7
Columbus 53 31 84 9.333 63.10% 5
DC 61 45 106 17.667 57.55% 3
FC Dallas 34 26 60 6.000 56.67% 2
LAG 43 23 66 7.333 65.15% 6
Montreal 63 51 114 4.385 55.26% 11
New England 41 29 70 4.118 58.57% 7
New York 57 41 98 9.800 58.16% 6
Philadelphia 56 29 85 4.474 65.88% 10
Portland 10 9 19 2.714 52.63% 3
RSL 54 32 86 8.600 62.79% 3
San Jose 37 20 57 11.400 64.91% 3
Seattle 33 19 52 5.778 63.46% 5
Sporting 47 29 76 4.471 61.84% 7
Toronto 30 24 54 3.857 55.56% 6
Vancouver 21 20 41 5.125 51.22% 2

There is a lot more to comment on than just Columbus/Philadelphia and Montreal/Seattle (hi, Portland and your 19 touches in the final third!). But these are the games that stood out to me as analytically awkward when it comes to the numbers that we produce with xGF, and I thought they were good examples of how we're trying to better quantify the game. It's not that we do it perfectly---the metric is far from perfect---it's about trying to get better and move forward with this type of analysis, as opposed to just using some dried-up cliché to describe a defense, like "that defense is made of warriors with steel plated testicles" or some other garbage.

This is NUUUUUuuuuummmmmbbbbbbeeerrrs. Numbers!

MLS Possession with Purpose Week 3: The best (and worst) performances

Here's my weekly analysis for your consideration as Week 3 ended Sunday evening with a 2-nil Seattle victory over Montreal. To begin, for those new to this weekly analysis, here's a link to PWP. It includes an introduction and some explanations; if you are familiar with my offerings then let's get stuck in.

First up is how all the teams compare to each other for Week 3:

Observations:

Note that Columbus remains atop the League while those who performed really well last year (like Portland) are hovering near the twilight zone. A couple of PKs awarded to the opponent and some pretty shoddy positional play defensively have a way of impacting team performance.

Note also that Toronto are mid-table here but not mid-table in the Eastern Conference standings; I'll talk more about that in my Possession with Purpose Cumulative Blog later this week.

Also note that Sporting Kansas City are second in the queue for this week; you'll see why a bit later.

A caution however - this is just a snapshot of Week 3; so Houston didn't make the list this week but will surface again in my Cumulative Index later.

The bottom dweller was not DC United this week; that honor goes to Philadelphia. Why? Well, because like the previous week, their opponent (Columbus) is top of the heap.

So how about who was top of the table in my PWP Strategic Attacking Index? Here's the answer for Week 3:

As noted, Columbus was top of the Week 3 table again this week, with FC Dallas and their 3-1 win against Chivas coming second, and Keane and company for LA coming third.

With Columbus taking high honors, and all the press covering Bernardo Anor, it is no surprise he took top honors in the PWP Attacking Player of the Week. But he didn't take top honors just for his two wicked goals, and the diagram below picks out many of his superb team efforts as Columbus defeated Philadelphia 2-1.

One thing to remember about Bernardo; he's a midfielder and his game isn't all about scoring goals. Recoveries and overall passing accuracy play a huge role in his value to Columbus, and with 77 touches he was leveraged quite frequently in both the team's attack and defense this past weekend.

Anyhoo... the Top PWP Defending Team of the Week was Sporting Kansas City. This is a role very familiar to Sporting KC, as they were the top team in defending for all of MLS in 2013. You may remember that they also won the MLS Championship, showing that a strong defense is one possible route to a trophy.

Here's the overall PWP Strategic Defending Index for your consideration:

While not surprising for some, both New England and Vancouver finished 2nd and 3rd respectively; a nil-nil draw usually means both defenses performed pretty well.

So who garnered the PWP Defending Player of the Week?  Most would consider Aurelien Collin a likely candidate, but instead I went with Ike Opara, as he got the nod to start for Matt Besler.  Here's why:

Although he recorded just two defensive actions inside the 18-yard box compared to five for Collin, Opara was instrumental on both sides of the pitch in place of Besler. All told, as a center-back, his defensive activities in marshaling the left side were superb, as noted in the linked MLS chalkboard diagram here. A big difference came in attack, where Opara had five shot attempts with three on target.

In closing...

My thanks again to OPTA and MLS for their MLS Chalkboard; without which this analysis could not be offered.

You can follow me on twitter @chrisgluckpwp, and also, when published you can read my focus articles on the New York Red Bulls PWP this year at the New York Sports Hub. My first one should be published later this week.

All the best, Chris

Calculating Expected Goal Differential 1.0

The basic premise of expected goal differential is to assess how dangerous a team's shots are, and how dangerous its opponent's shots are. A team that gets a lot of dangerous shots inside the box, but doesn't give up such shots on defense, is likely to be doing something tactically or skillfully, and is likely to be able to reproduce those results.

The challenge to creating expected goal differential (xGD), then, is to obtain data that measures the difficulty of each shot all season long. Our xGD 1.0 utilized six zones on the field to parse out the dangerous shots from those less so. Soon, we will create xGD 2.0 in which shots are not only sorted by location, but also by body part (head vs. foot) and by run of play (typical vs. free kick or penalty). Obviously kicked shots are more dangerous than headed shots, and penalty kicks are more dangerous than other shots from zone two, the location just behind the six-yard box.

So now, for the calculations.

Across the entire league, for all 8,291 shots taken in 2013, we calculate the proportion of shots from each zone that were finished (scored):

Location Goals Shots Finish%
One 129 415 31.1%
Two 451 2547 17.7%
Three 100 1401 7.1%
Four 85 1596 5.3%
Five 51 2190 2.3%
Six 5 142 3.5%

We see that shots from zones one and two are the most dangerous, while shots from farther out or from wider angles are less dangerous. To calculate a team's offensive "dangerousness," we count the number of shots each team attempted from each zone, and then multiply each total by the league's finishing rate. As an example, here we have Sporting Kansas City's offensive totals:

Locations Goals Attempts Finish% ExpGoals
One 5 18 31.1% 5.6
Two 29 160 17.7% 28.3
Three 5 78 7.1% 5.6
Four 3 97 5.3% 5.2
Five 2 120 2.3% 2.8
Six 1 17 3.5% 0.6
Total 45 490 9.2% 48.1

Offensively, if SKC had finished at the league average rate from each respective zone, then it would have scored about 48 goals. Now let's focus on SKC's defensive shot totals:

Locations Goals Attempts Finish% ExpGoals
One 4 13 31.1% 4.0
Two 17 95 17.7% 16.8
Three 4 54 7.1% 3.9
Four 4 56 5.3% 3.0
Five 1 84 2.3% 2.0
Six 0 4 3.5% 0.1
Total 30 306 9.8% 29.8

Defensively, had SKC allowed the league average finishing rate from each zone, it would have allowed about 30 goals (incidentally, that's exactly what it did allow, ignoring own goals).

Subtracting expected goals against from expected goals for, we get a team's expected goal differential. Expected goal differential works so well as a predictor because teams are more capable of repeating their ability to get good (or bad) shots for themselves, and allow good (or bad) shots to their opponents. An extreme game in which a team finishes a high percentage of shots won't sway that team's xGD, nor that of its opponents, making xGD a better indicator of "true talent" at the team level.
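Putting it all together, the whole calculation fits in a few lines. Here's a minimal sketch using the rounded finishing rates and the SKC attempt totals from the tables above; the totals land within rounding of the 48 and 30 quoted:

```python
# League finishing rates by shot location (rounded values from the table above).
finish_rate = {1: 0.311, 2: 0.177, 3: 0.071, 4: 0.053, 5: 0.023, 6: 0.035}

def expected_goals(attempts_by_zone):
    """Multiply attempts from each zone by the league finishing rate and sum."""
    return sum(n * finish_rate[zone] for zone, n in attempts_by_zone.items())

# Sporting Kansas City's 2013 attempt totals, for and against (from the tables above).
skc_for     = {1: 18, 2: 160, 3: 78, 4: 97, 5: 120, 6: 17}
skc_against = {1: 13, 2: 95,  3: 54, 4: 56, 5: 84,  6: 4}

xg_for     = expected_goals(skc_for)      # ~48
xg_against = expected_goals(skc_against)  # ~30
xgd        = xg_for - xg_against          # ~+18
print(round(xg_for, 1), round(xg_against, 1), round(xgd, 1))
```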

As for xGD 2.0, coming soon to a laptop near you, the main difference is that there will be additional shot types to consider. Instead of just six zones, now there will be six zones broken down by headed and kicked shots (12 total zones) in addition to free kick---and possibly even penalty kick---opportunities (adding, at most, four more shot types). As with xGD 1.0, a team's attempts for each type of shot will be multiplied by the league's average finishing rates, and then those totals will be summed to find expected goals for and expected goals against.

Does last season matter? - Follow Up

I wrote a few weeks ago about the weak predictive information contained in a team's previous season of data. When trying to predict a team's goal differential in the second 17 games of a season, it was the first 17 games of that same season that did the job. The previous season's data was largely unhelpful. @sea_soc tweeted me the following:

https://twitter.com/sea_soc/status/406507942179905537

Ask, and you shall receive. Here's the weird shit I found when trying to project a season's second-half goal differential:

Stat Coef. P-Value
Intercept -33.6 0.86%
AttemptDiff (first 17) 0.1 0.00%
Finish Diff (first 17) 90.6 0.12%
Attempt Diff (first 17 last season) 0.1 2.88%
Attempt Diff (second 17 last season) 0.0 20.00%
Finish Diff (first 17 last season) 115.0 7.08%
Finish Diff (second 17 last season) -23.5 28.81%
Home Games Left 4.0 0.81%

Translation: Strangely, it's the first part of the previous season that is the better predictor of future performance. Not the second part of last season, which actually happened more recently. In fact, information from the second part of each team's previous season produced negative coefficients (negative relationships). Weird.
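For anyone who wants to reproduce a table like the one above, it's a linear regression over team-season rows. A sketch with statsmodels; the file and column names are placeholders, not our actual data set:

```python
import pandas as pd
import statsmodels.api as sm

# One row per team-season; file name and column names are hypothetical.
df = pd.read_csv("team_season_halves.csv")

predictors = ["attempt_diff_first17", "finish_diff_first17",
              "attempt_diff_first17_prev", "attempt_diff_second17_prev",
              "finish_diff_first17_prev", "finish_diff_second17_prev",
              "home_games_left"]

X = sm.add_constant(df[predictors])
y = df["goal_diff_second17"]

model = sm.OLS(y, X).fit()
print(model.summary())  # coefficients and p-values, as in the table above
```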

Now let's change the response variable slightly to be a team's goal differential from its first 17 games. Which does better at predicting, last season's first half or last season's second half?

Neither. In fact, there was nothing that came close to predicting the first halves of 2012 and 2013.

Stat Coef. P-value
Intercept 18.9 20.3%
Finish Diff (first 17 last season) -5.5 94.5%
Finish Diff (second 17 last season) 5.9 60.9%
Attempt Diff (first 17 last season) 0.01 26.6%
Attempt Diff (second 17 last season) 0.04 32.5%
Home Games (first 17 this season) -2.2 20.3%

With such small sample sizes, it could be there is just something really weird about the first halves, especially 2013. I say "especially 2013" because 2011 and 2012's first halves seemed to do a fair job of projecting the next season's second halves, so it's 2013 that seems screwy. Portland and Seattle performed opposite of what would have been expected for each, for example, while D.C. United and Montreal did the same confusing switcheroo in the Eastern Conference to kick off the 2013 campaign. So it could have just been weird randomness.

In the end, I'm quite certain of one thing, and that's that I'm still confused.

World Cup Draws: United States, Mexico, and the Netherlands

Of those three teams, it's the United States' draw that incites the least of my frustration.

Search for "world cup draw" on Google, and you'll find mostly opinions that the U.S. Mens National Team found itself in the group of death, as if there can only be one. But as many pointed out before the draw, the USMNT was not likely to get into an easier group. Coming from Pot 3, the USMNT was at a disadvantage already due to being in the weakest pot. Using ratings from Nate Silver's Soccer Power Index (SPI), here are the average ratings by each of the four pots:

Pot Rating Standard Dev.
1 85.9 5.0
4 79.7 3.3
2 76.2 8.0
3 73.7 3.5

Since teams from the same pot could not meet in the group stage, the USMNT couldn't draw any teams from its own pot. Thus it automatically got zero chance at playing some of the weaker teams in the opening round, leaving us praying for one of Switzerland or Belgium from the seeded Pot 1 to ease our path to glory (no such luck). Additionally, all Pot 3 teams got a slightly higher chance of meeting two European teams in the group stages due to that additional UEFA team moving from Pot 4 to Pot 2. Pot 3 teams eluding a European team from Pot 1 may still have gotten Italy or England (I can't tell which one) from Pot 2. Costa Rica drew the short straw on that one.

If you look at Nate Silver's  ratings, you'll notice that most Pot 3 teams got pretty raw deals. Below are the chances that each team advances to the knockout round, as well as the average ratings for the other teams in their respective groups. Pot 3 teams are bold and italicized, and data came from Silver's own model.

Team Difficulty Knockout   Team Difficulty Knockout
Australia 86.6 2.0%   Italy 81.0 44.2%
Algeria 77.1 11.4%   Mexico 78.9 45.3%
Iran 81.8 18.9%   Ivory Coast 78.7 49.8%
Honduras 81.2 20.4%   Bosnia 79.3 52.6%
Cameroon 80.6 22.3%   England 80.3 57.5%
Japan 80.4 24.2%   Ecuador 78.2 64.7%
Costa Rica 82.2 28.8%   Uruguay 79.6 69.5%
Ghana 81.9 28.8%   Russia 71.6 72.6%
Nigeria 80.6 31.2%   Chile 79.9 74.3%
Croatia 79.7 32.9%   France 77.3 78.4%
Switzerland 79.7 36.5%   Belgium 71.1 79.1%
South Korea 73.8 36.9%   Spain 79.4 82.8%
United States 81.2 39.3%   Colombia 76.2 86.5%
Portugal 81.1 39.3%   Germany 78.0 91.8%
Greece 79.3 39.5%   Argentina 75.6 97.3%
Netherlands 81.3 41.0%   Brazil 73.9 99.6%

Relative to its stature in the world---17th best according to the SPI---the United States drew arguably the second-hardest group of opponents, second only to the Netherlands*. Though the USMNT may be in a group of death, the Netherlands are definitely in the group of death---and on the outside looking in. But it's our neighbor to the south that draws the most frustration. In terms of average group difficulty, the only North American side to get a relatively decent draw was Mexico. Mexico will just have to be better than Croatia and Cameroon in the group stage. Even after pissing all over themselves in CONCACAF qualifying, the Mexicans now have the easiest path of any Pot 3 team.

The Dutch side is the ninth-best in the tournament by the SPI, and yet it drew two of the best teams in the Cup, Chile and Spain. The Oranje, the team of my birth country, have been left sadly with just a 41-percent chance at making the knockout stage. The Mexican side is ranked 26th in the world, finished fourth in qualifying, and has a better chance to advance than the Netherlands.

Oh, FIFA.

*While Australia, Iran and Costa Rica all drew harder opponents on average than the USMNT, they were not as highly ranked themselves as the USMNT. In other words, it was expected that worse teams would get tougher opponents because they don't get to play themselves.