Year-to-year Shot Correlations

By Matthias Kullowatz (@mattyanselmo)

Perhaps I played my best card too early, publishing year-to-year expected goals correlations first. I won't waste a lot of words explaining what's going on here, but basically I'm just looking to see what shooting metrics correlate from the end of last year to the beginning of this year. This time, let's go back and look at raw shot totals. Without further ado, to the pretty plots!

Notes

Shot-to-shot correlations hold up pretty well against expected goals when it comes to repeatability. However, that doesn't necessarily mean they should be used in predictive models in place of expected goals. Expected goals not only predict themselves well ("stability"), but also predict outcomes well, like goals scored and games won.

For the most part, we see stronger correlations between shots on target than between total shots. This is not the first time I've found that some form of goal mouth placement at the team level is repeatable. The expected goals model we use for goalkeeper ratings is based partially on placement, and this version of expected goals will almost surely creep into my prediction models this season.
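For anyone who wants to replicate the year-to-year check, here is a minimal sketch. It assumes you already have one value per team for a metric (say, shots on target per game) at the end of last season and another at the start of this one. The `pearson_r` helper and the five teams' values are hypothetical placeholders for illustration, not the data behind these plots.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length lists of team values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists covering at least two teams")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a metric with no variation has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Hypothetical shots-on-target-per-game values for five teams, listed in the same team order.
end_of_last_year = [4.1, 3.6, 5.0, 2.9, 4.4]
start_of_this_year = [4.3, 3.2, 4.8, 3.1, 4.0]
print(f"year-to-year r = {pearson_r(end_of_last_year, start_of_this_year):.2f}")
```

The same function works for total shots or expected goals; only the input lists change.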

Weekend Kick-Off: Watchability Score and DC goes to Orlando

by Harrison Crow (@harrison_crow)

I know. Oh, yes. I know. That pain of watching a boring game. I know the code: no American soccer fan may EVER admit that there are boring soccer games, but the reality is that they happen. Just as there are boring NFL, NBA and NHL games, there are boring MLS games.

How many times have you tuned into a game and just thought, "this is stupid, how did I think this was going to be a good game?" Well, I present to you a solution that can perhaps point out potentially entertaining games and help you bypass the ones that lead to the afternoon nap or watching Nicolas Cage movies on FX.

I present to you the Watchability Score. It's a pretty simple metric composed of four attributes that A) keep the game flowing and B) make for entertaining circumstances. Those attributes are total shots per match, total minutes tied per match, fouls per match and completed dribbles per match. To get the metric, add up a team's rankings in each of those categories. For example, if Team A is 1st in shots per game, 1st in minutes tied, 1st in fewest fouls and 1st in dribbles, Team A would have a score of 1+1+1+1=4. Conversely, if Team B were 20th in each category, its score would be 20+20+20+20=80. In other words, the lower the number, the more "watchable" a team is.

1) We all hate games that are slowed down by fouls. It's annoying and doesn't frequently make for a good or aesthetically pleasing match.

2) Shots lead to goals, and while goals are exciting, the simple event of a shot being taken brings our attention back into focus. The more, the merrier.

3) Close games are obviously more entertaining than blowouts... on most occasions. Tie games mean there is something on the line for both teams, and that makes people do things that are often entertaining.

4) Dempsey is a US international favorite because of the insane things he both tries and somehow manages to pull off. He's an entertainer and an amazing athlete. People who can do more with a ball than simply kick it hard to a teammate or towards the goalie tend to make cool things happen, and most people like watching that. I like watching that.
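The rank-sum rule described above is simple enough to sketch in a few lines. This assumes per-game team averages for the four categories and breaks tied ranks by input order, since the post doesn't say how ties are handled; the `TeamStats` fields and the three team names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TeamStats:
    name: str
    shots_per_game: float         # total shots in the team's matches (hypothetical field)
    minutes_tied_per_game: float
    fouls_per_game: float
    dribbles_per_game: float      # completed dribbles


def ordinal_ranks(teams: list[TeamStats], key: Callable[[TeamStats], float],
                  higher_is_better: bool) -> dict[str, int]:
    """Rank 1 = most watchable value. sorted() is stable, so ties keep input order."""
    ordered = sorted(teams, key=key, reverse=higher_is_better)
    return {team.name: position + 1 for position, team in enumerate(ordered)}


def watchability_scores(teams: list[TeamStats]) -> dict[str, int]:
    """Sum of a team's four category ranks; lower means more watchable."""
    categories = [
        (lambda t: t.shots_per_game, True),
        (lambda t: t.minutes_tied_per_game, True),
        (lambda t: t.fouls_per_game, False),  # fewest fouls ranks 1st
        (lambda t: t.dribbles_per_game, True),
    ]
    scores = {team.name: 0 for team in teams}
    for key, higher_is_better in categories:
        for name, rank in ordinal_ranks(teams, key, higher_is_better).items():
            scores[name] += rank
    return scores


teams = [
    TeamStats("Team A", 28.0, 60.0, 20.0, 25.0),
    TeamStats("Team B", 20.0, 30.0, 30.0, 12.0),
    TeamStats("Team C", 24.0, 45.0, 25.0, 18.0),
]
print(watchability_scores(teams))  # {'Team A': 4, 'Team B': 12, 'Team C': 8}
```

Fouls are the only category ranked in ascending order, which is why it carries `higher_is_better=False`.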

I'll be using the Watchability rankings, slightly tweaking them and incorporating them into these posts going forward. If you have suggestions on how you'd like to see this tweaked, hit me up on Twitter, email me or just comment below.
------

DC United at Orlando City Soccer Club

Watchability Score ranks this match to be the third most interesting this weekend.

The weekend is here, folks, and by God it feels like it's a day late in arriving. I'm sure Orlando City, still recovering from all its missing internationals, would have liked another few days. Tough break for them.

They play a DC United team that has been pretty hard to figure out this season. They've had entertaining games, they've had "meh" games, and then they had a game last week that annoys you so very much because they managed to get points. Plural.

Whatever. Good for them, and you know what, good for their fan base. They've endured a lot of crap in the last four or five years, and the organization as a whole looks to be heading back toward the top. That's a good thing, I think, for MLS as a whole. I mean, I don't see how it could be a bad thing. Unless... I don't know, something drastic happened, like Ben Olsen actually being Gabriel Gray or something weird like that. That's impossible... right?

Looking at our expected goal numbers, Orlando City is a bit of an oddity. They sit on the above-average side in expected goal differential, yet they are 18th in expected goals for and first in expected goals against. This isn't the type of dominating split I was expecting, and it calls to mind the season DC had last year with similarly awkward numbers.

A league wide perspective on how OCSC and DCU compare to the rest of MLS.

The thing that makes Orlando unique, and possibly leaves me thinking they might be good, is that they dominate possession in their attacking third. Yes, their overall average possession is tied for 14th in MLS, but their attacking-third possession ratio compared to opponents is the highest in the league at 1.54. Prediction: OCSC win.

Fantasy Perspective

DC United
Jairo Arrieta ($6.6 - 18.3%) - Who would have thought even three weeks ago that he'd be the most selected player on the DC roster? But here we are. I imagine much of that bump is a two-fold equation: he plays at all and he's cheap, and he had a good first week.

Nick DeLeon ($6.5 - 13.6%) - Yet another cheap pick-up who starts regularly. Sometimes it's just about those easy points, and everything else is gravy for some people. I guess.

Orlando City
Kaka ($11.3 - 28%) - The fourth most owned player in all of MLS fantasy, and he has yet to let down the owners who took a chance on him early. He's been brilliant for a team that has very much depended on him to carry it.

Rafael Ramos ($5.5 - 16.1%) - Ramos has been a great early get for a lot of fantasy managers: he plays, he's super cheap, and he has been a part of a pretty stingy defense that doesn't concede a lot of shots.

The Weekend Matchups:

Numbers in parentheses are each team's expected goal differential in even game states (xGDEven), and WS is the combined Watchability Score for both teams playing. WS numbers are on a scale of 12 to 156, and the lower the number, the more "watchable" the game is.
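The 12-to-156 range follows from the ranking rule, assuming 20 teams and no tied ranks within a category: the most watchable pairing possible is one team ranked 1st everywhere (4) against another ranked 2nd everywhere (8), while the least watchable is a team ranked 20th everywhere (80) against one ranked 19th everywhere (76). A small sketch of that arithmetic, with hypothetical function names:

```python
def match_ws(home_team_score: int, away_team_score: int) -> int:
    """Combined Watchability Score for one match; lower means more watchable."""
    return home_team_score + away_team_score


def ws_range(n_teams: int = 20, n_categories: int = 4) -> tuple[int, int]:
    """Best and worst possible combined scores, assuming no tied ranks."""
    best = n_categories * 1 + n_categories * 2                     # 1st everywhere + 2nd everywhere
    worst = n_categories * n_teams + n_categories * (n_teams - 1)  # 20th everywhere + 19th everywhere
    return best, worst


print(ws_range())  # (12, 156)
```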

Saturday:

Toronto FC (-0.83) at Chicago Fire (-0.21) - WS: 70
For some reason I haven't figured out yet, the Watchability Score kind of likes Chicago as a team. There are a lot of shots in their games, most of their games are close, and interesting things happen while play flows pretty smoothly. This game might even favor Chicago, too! Prediction: Chicago win.

New England Revolution (0.48) at Colorado Rapids (-0.16) - WS: 100
Colorado just isn't a fun team to watch right now and it's kind of sad to see any team pack in at home. Prediction: DRAW

Houston Dynamo (-0.42) at Seattle Sounders FC (0.91) - WS: 114
Speaking of teams that aren't good, Houston is bad right now. I'm a bit surprised, but that's how some of these go. The Sounders are favored, but this match has the highest (least watchable) WS of the weekend. So, ugh... proceed with caution. You've been warned. Prediction: Sounders win.

LA Galaxy (0.59) at Vancouver Whitecaps (0.26) - WS: 89
An obvious flaw I'm seeing in the metric is that it has no way to judge a match's importance. This stands out because this is a big game, but WS doesn't rate it highly. Prediction: DRAW

FC Dallas (0.15) at Portland Timbers (0.58) - WS: 65
I'm not sure I really needed a metric to tell me this would be an interesting, fun game to watch, but either way. Portland has all the numbers; it's about them finally putting things together in a match against a very tough Dallas team. Prediction: Portland win

Sunday:

Real Salt Lake (-0.49) at San Jose Earthquakes (-0.74) - WS: 110
One of these days RSL is just going to end up being bad. The question is whether this is one of those days. Could all the changes at RSL finally bring their results in line with their xGDEven, so that they stop being a model buster? Prediction: DRAW

Philadelphia Union (0.12) at Sporting Kansas City (1.16) - WS: 58
This is the first REAL test of the Watchability Score, as it rates this as the match-up of the weekend. Sporting has been statistically VERY good this season, but the points in the right-hand column shrug their shoulders with a heavy sigh. Prediction: Sporting win.

------

NERD IMAGERY OF THE WEEK

This is basically how I see the stand-off between Bill Hamid and Kaka looking... only represented as a Heroes gif. You're welcome.

Finding Goals for the Rapids: Is It Time for a Formation Switch in Colorado?

By Tom Worville (@worville)

(Note that in this article, "Possession Adjusted" is where you take the stat in question, divide by the team's possession, and multiply by 50% to put all teams' stats on a per-possession basis.)
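As a minimal sketch of that adjustment (the function name is illustrative, and possession is taken as a fraction rather than a percentage):

```python
def possession_adjusted(stat_per_game: float, possession: float) -> float:
    """Put a per-game count on a 50%-possession basis: stat / possession * 50%.

    `possession` is a fraction, e.g. 0.47 for 47%.
    """
    if not 0 < possession < 1:
        raise ValueError("possession must be a fraction strictly between 0 and 1")
    return stat_per_game / possession * 0.5


# A hypothetical team playing 60 long balls per game on 47% possession.
print(round(possession_adjusted(60, 0.47), 1))  # 63.8
```

A team below 50% possession has its counts scaled up and a team above 50% has them scaled down, so sides with very different amounts of the ball can be compared.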

Colorado Rapids are an intriguing side. They are the only team in MLS not to concede so far this season, but also the only team not to score. After 0-0 draws against Philadelphia, New York City and Houston Dynamo, the only thing they have going for them this season is consistency in results.

They’re not going to get easier opponents in the coming weeks either - New England, two games against Dallas, Seattle and LA Galaxy are all to come. These free-scoring sides are likely to test the Rapids’ defence, so to have a chance at securing any points they’re going to need to start scoring. In this analysis I’m going to look into the Rapids’ blunt attack and what they could change to start scoring.

Teams that find themselves having scored no goals usually either have their bad luck to thank or are generating a low number of shots. For Colorado it is a mix of both. The Rapids currently have the 4th lowest shots per game in the league, taking 9.7 shots per game. 17th out of 20 teams is bad, but if even the 20th team has managed to score, then low shot volume can’t be the only thing to blame.

One could argue that, on top of the team generating a low number of shots, those shots could be of poor quality. The Rapids sit 15th in the league in team shot accuracy, though - once again not the worst, and teams with less accurate shooters have managed to score more. According to the Expected Goals numbers from American Soccer Analysis, the team has an Expected Goal count of 0.8 per game (2.4 Expected Goals overall). This indicates a lack of real quality chances, and their goal total of 0 is not entirely unexpected. They have also faced two of the better keepers so far this season in Tyler Deric and Josh Saunders, both of whom are over-performing their Expected Goals against.

Deric and Saunders Expected Goals

Looking at the Rapids’ roster, their strike-force consists of Designated Player Gabriel Torres, SuperDraft pickup Dominique Badji and veteran Vicente Sanchez. Between them they’ve only managed eight shots in three games: six for Torres and one apiece for Sanchez and Badji. This low shot creation from the strikers can’t be blamed solely on them - the Rapids sit 6th lowest in the league for chance creation.

What’s worse is that Deshorn Brown, the team's top scorer over the past two seasons with 23 goals, has moved to Norwegian side Vålerenga. With Brown gone, Sanchez and Torres are the only players in the squad with any experience up front, and they have scored only 13 goals between them in the past two seasons. This is worrying, as the strike force is unlikely to be strengthened until Kevin Doyle arrives in the summer. Any injury or suspension to Torres or Sanchez leaves the team with even fewer options in an already unthreatening attack.

I am very much a firm believer that clubs should play the cards that they are dealt and play to their strengths. If the Rapids don’t adapt their style they are going to have a long wait until Doyle arrives - and even then things may not change. For a start I think it is worth them reconsidering their formation.

From the excellent FootballLineups.com I can see that the Colorado Rapids have played a 4-2-3-1 every game this season. This formation isolates the striker and means that play relies heavily on the three attacking players behind him. Teams who play this system usually play it with a more physical striker, capable of holding up the play long enough to involve the attacking players behind him. Notable examples include Olivier Giroud for Arsenal and Romelu Lukaku for Everton. For Colorado, none of their three striking options really fits the bill in terms of size or strength.

This system could work with Doyle, who possesses more of the characteristics of the traditional hold-up man that this formation suits. Until then, I recommend a change of system.

An area of the game that Colorado could try to exploit is crossing, as the team currently sits 18th in crosses per game. Young DP Juan Ramirez looks a good signing from the limited minutes I've seen him play. The Argentine possesses good pace and plays like an old-school winger: drawing fouls and completing take-ons. If he could add crossing to his game, he could be a great outlet for chance creation. The team looks relatively set with their back four and two holding midfielders in Sam Cronin and Lucas Pittinari. As I make Ramirez a must-start, too, that leaves three vacant spots on the team sheet.

Another player on the team who I feel is un-droppable is Dillon Powers. Powers has gathered quite a lot of interest from teams outside of MLS, and with him recently getting his Italian passport I wouldn’t be surprised to see him in Serie A before long. For now, though, he’s on the Rapids’ roster. Powers operates most efficiently in an attacking midfield role, as the central link between the strikers and the deeper midfielders. His chance creation (highest in the team last year) and long shooting abilities make him a must-have in this side.

I would allocate the final two spots on the team sheet to Dillon Serna and Torres and play a 4-3-3, with Powers ahead of the other two central midfielders and Serna and Ramirez on either side of Torres. Despite Torres’ low scoring record, his pass accuracy has been better than Sanchez’s over the last two seasons. Serna certainly has the legs over the aging Sanchez, despite not creating more chances than him per 90 last season. The option of bringing Sanchez off the bench adds another dynamic that could be used to change matches.

Suggested Colorado Rapids formation

Playing a formation like this would help Colorado play to their strengths, including their preference for long balls, and use their attacking players effectively. The central player in the front three is also able to drop deeper and play as a false nine, freeing him from the target-man qualities the team needs to play a 4-2-3-1. The wide players have more space to run into, which makes long balls a more viable option.

The case for changing the formation becomes clearer from looking at the possession figures in more detail. The Rapids sit joint 19th with San Jose on 47% possession per game. Looking at the way the team passes, they sit 15th in the league in short passes per game (possession adjusted) and 1st in long balls per game (possession adjusted). The team also has the highest average pass length in the league, at 22 meters. As stated previously, long balls won't work for this side in the current system, yet the Rapids have a big preference for them.

These passing figures alongside the fact that the team has the lowest passing accuracy of all the teams in the league of just 72% (putting into context: over 1 in 4 balls passed is misplaced) show that Colorado’s ball retention is not the best. It is worth noting that due to the number of long passes the team plays, their pass accuracy is likely to be skewed down anyway (long passes = more inaccurate).

By comparing the number of chances created to short passes made we can get a feel for a side's efficiency in possession. For example, the Seattle Sounders make an average of 65 passes (possession adjusted) before they create a chance. On the other hand, Sporting Kansas City makes a minute 26 passes (possession adjusted) before they create a chance - a snip in comparison. Colorado sits 13th for this efficiency measure, making 44 passes per chance created (possession adjusted). This shows that when they are able to string passes together they are better at creating chances. The formation change to 4-3-3 and utilising Powers as a pivot between defence and attack can help capitalize on this.
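The efficiency measures in this and the next two paragraphs all boil down to dividing one per-game count by another. A minimal sketch, assuming the pass count is possession adjusted as described in the note at the top of this article while chances and shots are left unadjusted (the article doesn't say either way); the function names and numbers are hypothetical:

```python
from typing import Optional


def possession_adjusted(count_per_game: float, possession: float) -> float:
    """count / possession * 50%, with possession as a fraction (0.47 for 47%)."""
    return count_per_game / possession * 0.5


def actions_per_outcome(actions_per_game: float, outcomes_per_game: float,
                        possession: Optional[float] = None) -> float:
    """Passes (or long balls) made per chance created or per shot; lower is more efficient.

    If `possession` is given, only the action count is possession adjusted (an assumption).
    """
    if outcomes_per_game <= 0:
        raise ValueError("a team needs at least some chances or shots for this ratio")
    if possession is not None:
        actions_per_game = possession_adjusted(actions_per_game, possession)
    return actions_per_game / outcomes_per_game


# Hypothetical inputs: 300 short passes on 50% possession and 6 chances per game,
# then 66 long balls (unadjusted) for the same 6 chances.
print(actions_per_outcome(300, 6, possession=0.5))  # 50.0
print(actions_per_outcome(66, 6))                   # 11.0
```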

Alternatively, by looking at long balls per chance created we can get a similar sense of the efficiency of a team's long ball usage. Colorado sit joint 19th in the league here, making 11 long balls before creating a chance. I wouldn't be so critical of the long balls if they actually helped the side create chances, but this suggests they are more of a hindrance. For comparison, the San Jose Earthquakes, who create a similar number of chances per game (7.75 vs 7.33), make only nine long balls per chance created. Not a huge difference, but then again San Jose have scored six goals in the league so far this season, the joint highest in the league.

Now, by comparing short and long passes per shot, we can see how many passes a team needs to make before a shot is taken. Colorado sit 19th in long passes per shot but 13th in short passes per shot. This difference highlights the side's greater efficiency with short passes than with long passes, and once again the need to capitalize on their short passing strengths more. Within the squad they have some good passers: Cronin, Marcelo Sarvas and Powers are three examples with 75%+ passing accuracy.

Finally by looking at how much time a team spends in each third of the pitch, we can get a greater sense of how efficient they are with the ball. Colorado’s split between Own Third/Middle/Opposition Third is 29%/45%/26%. By multiplying the number of short passes by the time spent in each area we get a rough idea of the number of passes made in that part of the field. That will help get a further sense of efficiency on the ball.
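A sketch of that rough estimate, assuming short passes are spread across the thirds in proportion to the time spent there. The 29/45/26 split is Colorado's from the paragraph above; the 320-pass total and the function name are hypothetical.

```python
def passes_by_third(short_passes_per_game: float,
                    time_share: dict[str, float]) -> dict[str, float]:
    """Rough split of short passes by third, proportional to time spent in each third."""
    if abs(sum(time_share.values()) - 1.0) > 0.01:
        raise ValueError("time shares should sum to 1")
    return {third: short_passes_per_game * share for third, share in time_share.items()}


colorado_split = {"own": 0.29, "middle": 0.45, "opposition": 0.26}  # from the article
estimate = passes_by_third(320, colorado_split)  # 320 short passes is a hypothetical total
print({third: round(passes, 1) for third, passes in estimate.items()})
# {'own': 92.8, 'middle': 144.0, 'opposition': 83.2}
```

Dividing the opposition-third figure by chances created per game then gives an opposition-third passes-per-chance figure.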

Colorado make the 10th lowest number of passes in the Opposition Third, with 84 per game. Relative to the chances they create, this makes them the joint 5th most efficient side in MLS in terms of passes per chance created, needing only 11 passes in the opposition third to create a chance. For comparison, the lowest figure again belongs to Sporting Kansas City, at only seven passes. Seattle makes 19 passes in the opposition third per chance created, making them the slowest team in the league to build up chances.

There is evidently strength in Colorado’s short passing play, which far exceeds any joy they are getting from long balls at the minute. Were they to adopt a new formation that takes advantage of this passing strength, the side might get closer to scoring goals. By incorporating a more fluid transition from defense to attack and utilizing crosses more, the club may get some returns in terms of chance creation and shots taken and, hopefully, somewhere down the line, some goals.

Scoring The Proactivity of MLS Teams

By Jared Young (@jaredeyoung)

Last year I became interested in using statistics to measure a team’s style of play. I was inspired by a Jonathan Wilson article that laid out two extreme styles, which he labeled proactive and reactive. Proactive teams are concerned primarily with possessing the ball and pressing high on defense to get the ball back as quickly as possible. This is Barcelona and tiki-taka in its purest form. Reactive teams are characterized by a desire to maintain their defensive shape; they typically offer low defensive pressure and are direct in their attack.

I've adapted the score, which I call the P Score, since last time, and the details are below for the curious. One change to point out now is that I've moved to a 10-point scale: 10 indicates a high level of possession and 1 is very reactive.

Here are the P Score rankings for MLS through March. The columns to the right of the total scores show a team’s proactivity relative to its opponent. To read the table, take Orlando City: starting with the columns for games in which a team was less proactive, Orlando City was less proactive than its opponent in 25% of its games and averaged one point per game in those games. A game is considered even if the two teams were within one point of each other in their P Score for that game.

Less / Even / More = games in which the team was less proactive than its opponent, within one P Score point of it, or more proactive than it. % = share of the team's games; Pts = points per game in those games.

Rank  Team                P Score  Pts/Gm  Less %  Less Pts  Even %  Even Pts  More %  More Pts
1     Orlando City SC     9.5      1.3     25%     1         25%     3         50%     0.5
2     Montreal Impact     7.3      0.7     33%     0         33%     1         33%     1
3     New York Red Bulls  7.3      2.3     0%      0         0%      0         100%    2.3
4     Chicago             7.3      0.8     0%      0         50%     1.5       50%     0
5     Seattle             7        1.3     0%      0         33%     3         67%     0.5
6     D.C. United         6.7      2       33%     0         0%      0         67%     3
7     Toronto FC          6.7      1       33%     0         33%     0         33%     3
8     Philadelphia        6.5      0.5     25%     1         50%     0         25%     1
9     Houston             6.3      1.3     25%     1         50%     0.5       25%     3
10    Columbus            6        1       67%     0         0%      0         33%     3
11    L.A. Galaxy         6        1.3     25%     0         50%     2         25%     1
12    NYCFC               6        1.3     25%     1         25%     1         50%     1.5
13    Vancouver           5.8      2.3     100%    2.3       0%      0         0%      0
14    Colorado            5.3      1       33%     1         67%     1         0%      0
15    New England         5.3      1       25%     0         75%     1.3       0%      0
16    Portland            5        0.8     25%     1         50%     1         25%     0
17    Salt Lake           5        1.7     0%      0         33%     3         67%     1
18    San Jose            4.5      1.5     75%     2         25%     0         0%      0
19    FC Dallas           4        2.5     50%     2         25%     3         25%     3
20    Kansas City         3.5      1.3     67%     2         33%     1         0%      0
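To make the relative-proactivity columns concrete, here is a sketch of how a team's games could be bucketed and summarized. It treats "within one point" as an absolute difference of at most 1.0, an assumption about the boundary case, and the four games are hypothetical. Because the three shares sum to one, weighting each bucket's points per game by its share recovers the overall points per game, a handy consistency check on the table (for Orlando City, 0.25 × 1 + 0.25 × 3 + 0.5 × 0.5 = 1.25, which rounds to the 1.3 shown).

```python
from collections import defaultdict


def classify_game(team_p: float, opp_p: float, margin: float = 1.0) -> str:
    """Label a game by relative proactivity; |difference| <= margin counts as even (assumption)."""
    diff = team_p - opp_p
    if abs(diff) <= margin:
        return "even"
    return "more" if diff > 0 else "less"


def proactivity_split(games: list[tuple[float, float, int]]) -> dict[str, tuple[float, float]]:
    """games: (team P Score, opponent P Score, points earned) for each match.

    Returns {category: (share of games, points per game in those games)}.
    """
    buckets = defaultdict(list)
    for team_p, opp_p, pts in games:
        buckets[classify_game(team_p, opp_p)].append(pts)
    n = len(games)
    split = {}
    for category in ("less", "even", "more"):
        pts_list = buckets[category]
        ppg = sum(pts_list) / len(pts_list) if pts_list else 0.0
        split[category] = (len(pts_list) / n, ppg)
    return split


# Four hypothetical games for one team: (team P, opponent P, points).
games = [(9.6, 7.0, 3), (9.4, 8.9, 3), (9.0, 10.5, 1), (9.8, 6.0, 0)]
split = proactivity_split(games)
overall = sum(share * ppg for share, ppg in split.values())
print(split)    # {'less': (0.25, 1.0), 'even': (0.25, 3.0), 'more': (0.5, 1.5)}
print(overall)  # 1.75, the same as total points divided by games played
```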

Observations

  • Orlando City SC so far has the highest P Score at 9.5, well ahead of second-place Montreal (7.3)
  • A couple of teams that are usually known for a possession-oriented style of play are near the bottom of the list. The Portland Timbers' change of style has been noted, but Sporting Kansas City anchoring the list is a big surprise given their history with a 4-3-3.
  • Two of the best reactive teams last year, New England and Dallas, are again near the bottom of the league.
  • Looking at the table in some depth reveals some interesting early trends about where points are concentrated. I summed up the table in a visual below.

What this table says is that if a team is going to be proactive, it’s beneficial to be more proactive than its opponent. The same goes for reactive teams: results are better when a team is more reactive than its opponent. The implication is that commitment to, and execution of, a style of play, regardless of which style, is a key contributor to success. That’s a pretty fascinating finding, and I’ll monitor the numbers over the season as we get bigger sample sizes.

The New P Score Calculation

The P Score is built on the idea that pass type data can indicate what style of play a team is playing. A proactive team will attempt a higher number of shorter passes and should, in theory, have a higher percentage of backward passes. A direct team will attempt longer passes in an effort to counterattack and will have fewer backward passes.

When I developed the P Score on the 2014 season, I was disappointed in the availability of passing data and was forced to use variables that I didn't want to use. The model simply used the percentage of long passes and total passes. Recently, WhoScored added more pass types to their match center, and I've evolved the model. I tried most of the pass types available, including short, long, backward and through passes as well as crosses. I also looked at blocked shots, because reactive teams block a higher percentage of shots than proactive teams. Given their penchant for defensive shape, that makes sense.

I used multivariate regression on outcomes from a collection of games from the 2014 season. You can read which games I selected for the dependent variable in the prior post. Only two pass types ended up being statistically significant: the percentage of backward passes and the percentage of long passes. Both coefficients adjust the model in the direction you would expect: a higher percentage of long passes lowers the score, and a higher percentage of backward passes increases it. I did not use total passes in the model because that variable can be strongly influenced by an opponent, whereas percentages are more likely to indicate a team’s actual intent. The R-squared of the new model was a sturdy 0.79.

The old and new models had similar results. I scored the 2015 season both ways, and the correlation between the two is 0.95. Orlando City SC is still the top team and Sporting Kansas City is still the bottom team either way.
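For the curious, a two-predictor least-squares fit of this kind can be sketched without any libraries. This is not the author's model: the games are synthetic, generated with a negative effect for the long-pass share and a positive one for the backward-pass share purely to show the fit recovering those signs, and the rescaling to a 10-point scale isn't reproduced. The function name is hypothetical.

```python
import random


def fit_two_predictor_ols(x1: list[float], x2: list[float], y: list[float]):
    """Ordinary least squares for y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2, r_squared)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    c1 = [v - m1 for v in x1]  # center everything so the intercept drops out
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * c for a, c in zip(c1, cy))
    s2y = sum(b * c for b, c in zip(c2, cy))
    det = s11 * s22 - s12 * s12  # solve the 2x2 normal equations
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    sse = sum((yi - (b0 + b1 * u + b2 * v)) ** 2 for yi, u, v in zip(y, x1, x2))
    sst = sum(c * c for c in cy)
    return b0, b1, b2, 1 - sse / sst


# Synthetic games (shares as fractions); the "true" effects are chosen for illustration only.
rng = random.Random(0)
pct_long = [rng.uniform(0.08, 0.20) for _ in range(60)]
pct_backward = [rng.uniform(0.10, 0.25) for _ in range(60)]
score = [5 - 30 * (lo - 0.14) + 20 * (bk - 0.17) + rng.gauss(0, 0.3)
         for lo, bk in zip(pct_long, pct_backward)]

b0, b_long, b_backward, r2 = fit_two_predictor_ols(pct_long, pct_backward, score)
print(f"long: {b_long:.1f}, backward: {b_backward:.1f}, R-squared: {r2:.2f}")
```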

I strongly prefer this version of the model because it looks at the percentage of the type of team passes to indicate style as opposed to anything related to volume, which as I mentioned would be much more likely to be manipulated by an opponent.

If you have any questions about the methodology, please leave a comment or reach out to me on Twitter @jaredeyoung. I’ll be publishing the P Score table monthly throughout the season.

How Long Does It Take a Team to Mesh?

By Kevin Minkus (@kevinminkus)

While beginning a season 0-3-0 does not a happy fan base make, Sunday's win over Philadelphia has some Chicago Fire fans feeling at least a little better about the team's rebuilding process. Throughout the beginning of the season, coach Frank Yallop has frequently stressed that the team needs time to adjust to each other. After all, they brought in three new designated players during the off-season, and are returning players who accounted for only 63% of last year's minutes (the league average over the last four seasons is around 71%). It should take a while for all of those new pieces to mesh from the somewhat disjointed side we've seen into a coherent whole. But, given the Fire's level of roster turnover, how long should we expect the meshing process to take?

The term “meshing” is a slippery one, and can be defined in any number of ways.  Is it when a team's roster turnover no longer informs its results? Is it when a team's results sufficiently indicate its performance for the rest of the season? Is it when a team reaches the level of performance it will remain at throughout the rest of the season (if, in fact, a team can ever be expected to do so)?

Each of these definitions could be argued as valid, and I'm sure there are many other possible definitions not considered here. As it stands, though, these are the three I will analyze, using MLS data since 2011, in hopes of arriving at an answer to the question of how long it takes a team to mesh.

Let's start with the first definition: meshing defined as the number of games in which roster turnover still directly informs a team's results.

This graph shows the correlation between points after x number of games and the percentage of a team's field minutes returned from the previous season. 

A positive correlation suggests that as roster stability increases, so does points earned. Numbers below the red line are not considered statistically different from zero (at 90% confidence). Note that the correlations in general aren't huge, but they do exist. As you can see, the correlation between roster stability and points peaks at game three, and remains statistically significant until game five (after which it remains insignificant until close to the end of the season).
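The red line marks the smallest correlation that counts as significant at 90% confidence. The post doesn't state the exact test or sample size, so this is only one common approximation, assuming a two-sided test and using Fisher's z-transform; the 75 team-seasons in the example is a hypothetical sample size.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_r(n: int, confidence: float = 0.90) -> float:
    """Approximate smallest |r| that is significant (two-sided) for n paired observations.

    Uses Fisher's z-transform: atanh(r) is roughly normal with SD 1 / sqrt(n - 3).
    """
    if n < 4:
        raise ValueError("need at least four observations")
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return tanh(z_crit / sqrt(n - 3))


# Hypothetical sample: 75 team-seasons.
print(round(critical_r(75), 2))  # 0.19
```

The threshold shrinks as the sample grows, which is why modest correlations can still clear it with several seasons of data.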

A similar pattern exists if we look at defensive stability, though the correlation doesn't become insignificant until after eight games:

These two graphs, then, suggest, though perhaps not convincingly, that it may take as few as three or four games for a team in general to mesh, while it may take as many as eight for a defensive unit to come together.

Now let's take a look at the second definition: meshing defined as the point at which a team's results through some number of games “sufficiently” indicate what its results will look like for the rest of the season.

To do this, I've split teams into two groups: those with “high” roster turnover (in the top 50%) and those with “low” roster turnover (in the bottom 50%). I then regressed each team's final points total on its points total after x games, for each of the two groups. The R-squared values for each of these regressions are graphed below, with the linear models from the set of all teams included as well. So essentially, what we are looking at is how well we can predict how a team will finish the season based on what it has done after a given number of games.
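A minimal sketch of one such regression, run once per turnover group. It reports R-squared along with the root-mean-square error, one reasonable way to express how many points off a prediction typically is (an assumption; the post doesn't name its error measure). The function name and data are hypothetical.

```python
from math import sqrt


def regress_final_on_early(points_after_x: list[float], final_points: list[float]):
    """Least-squares line: final points ~ intercept + slope * points after x games.

    Returns (slope, intercept, r_squared, rmse); rmse is the typical miss in points.
    """
    n = len(final_points)
    if n != len(points_after_x) or n < 3:
        raise ValueError("need matching lists covering at least three teams")
    mean_x = sum(points_after_x) / n
    mean_y = sum(final_points) / n
    sxx = sum((x - mean_x) ** 2 for x in points_after_x)
    sst = sum((y - mean_y) ** 2 for y in final_points)
    if sxx == 0 or sst == 0:
        raise ValueError("both early and final points must vary across teams")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(points_after_x, final_points))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(points_after_x, final_points))
    return slope, intercept, 1 - sse / sst, sqrt(sse / n)


# Hypothetical high-turnover group: (points after nine games, final points) per team.
high_turnover = [(10, 38), (15, 48), (8, 33), (12, 45), (17, 55), (11, 40)]
early, final = zip(*high_turnover)
slope, intercept, r2, rmse = regress_final_on_early(list(early), list(final))
print(f"R-squared = {r2:.3f}, typical miss = {rmse:.1f} points")
```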

Through six games, points totals are about equally predictive for each group, meaning that how well a team with high roster turnover does through six games is just as indicative of how it will finish as how well a team with low roster turnover does through six games. That is to say, we don't gain any extra predictive power by knowing a team's level of roster turnover.

By game seven, though, high turnover teams begin to outpace low turnover teams: at that point we have a better idea of how high turnover teams will finish the season than low turnover teams.

By game nine, the R-squared value for high turnover teams is .546, which is pretty high. We would expect predictions made using this nine-game point total to be, on average, only about seven points off the final season total. That's pretty close for being barely a quarter of the way into the season.

Though it's a normative statement, not a positive one, and you could really draw the line anywhere, I would suggest that nine games is as good a place as any to set the limit on meshing under our second definition. At the very least, we can say that after nine games we should have a decent idea of whether the rebuilding process will be successful in year one.

Finally, let's turn our attention to the third definition: meshing as the point at which a team reaches its consistent level of performance.

Let's investigate this phenomenon a little bit. 

Here's a graph of the three-game rolling expected goal difference (at x = 4, for example, the value on the y axis is the xGD from games two, three, and four) for Sporting Kansas City last season, a decently representative mid-table team. Expected goal difference provides a pretty reasonable statistic for gauging how good a team is.
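A rolling series like this can be sketched as follows. Whether each point is the sum or the mean of the three games isn't stated, so this assumes the mean (summing would only rescale the curve by three); the function name and the per-game xGD values are hypothetical.

```python
def rolling_mean(values: list[float], window: int = 3) -> dict[int, float]:
    """Map game number x (1-indexed) to the mean of the `window` games ending at game x."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return {
        end: sum(values[end - window:end]) / window
        for end in range(window, len(values) + 1)
    }


# Hypothetical per-game xGD for the first six games of a season.
xgd = [1.0, -0.5, 0.2, 0.6, -0.9, 0.4]
for game, value in rolling_mean(xgd).items():
    print(game, round(value, 2))
```

With a three-game window the first point appears at game three, and the point at game four covers games two through four, matching the description above.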

It's pretty much all over the place. 

A three game rolling points per game graph of another mid-table team from last year, the Vancouver Whitecaps, tells a similar story:

These graphs point to something which I think is an important (though perhaps obvious) point to make: it's mostly unreasonable to expect game-by-game measures of a team's strength to converge over the course of a season. (Metrics like xGR (expected goal ratio), TSR (total shot ratio), and points per game will converge, but usually only when they're calculated in aggregate.) There are a lot of reasons for this. Injuries, international call-ups, strength of schedule, and mid-season transfers are all factors that affect a team's consistency of performance. Teams, save maybe the very dominant and the very bad ones, just go through peaks and valleys throughout the year. They have good games and bad games.
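The contrast between per-game and aggregate measures can be illustrated with total shot ratio (a team's shots divided by all shots in its games). In this sketch, with hypothetical shot counts and function names, the single-game series can bounce around while the cumulative one moves less and less as each new game carries a smaller share of the total.

```python
from typing import Optional


def single_game_tsr(shots_for: list[int], shots_against: list[int]) -> list[Optional[float]]:
    """TSR for each game on its own; None if neither side took a shot."""
    return [f / (f + a) if f + a else None for f, a in zip(shots_for, shots_against)]


def cumulative_tsr(shots_for: list[int], shots_against: list[int]) -> list[Optional[float]]:
    """TSR over all games played so far, i.e. the aggregate view."""
    series: list[Optional[float]] = []
    total_for = total_against = 0
    for f, a in zip(shots_for, shots_against):
        total_for += f
        total_against += a
        total = total_for + total_against
        series.append(total_for / total if total else None)
    return series


# Hypothetical shot counts for a team's first six games.
shots_for = [14, 7, 12, 9, 16, 8]
shots_against = [8, 15, 10, 11, 9, 13]
print([round(x, 2) for x in single_game_tsr(shots_for, shots_against)])
print([round(x, 2) for x in cumulative_tsr(shots_for, shots_against)])
```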

What does this mean for meshing, then?

Well, we've already seen that how a team performs at the start of the year can be predictive of where it finishes, particularly for teams with high turnover. The point above, though, suggests that how a team starts the year isn't necessarily indicative of how it will perform throughout the year. 

For teams who haven't quite come together yet, then, there is certainly still hope of righting the ship. Given the above analysis, I would expect the effects of bringing new players into the system to begin to wear off by game four or five (though this may take a bit longer this season because of international call-ups). By game nine or ten, a team should have a decent idea of how well it has done in rebuilding its roster. If things remain bleak at that point, there is still the possibility of finding some success, but it may come only in limited doses.

USMNT in Switzerland: Beyond the Score

By Jared Young (@jaredeyoung)

The USMNT took on Switzerland on Tuesday, its ninth friendly since the World Cup, and in the process relinquished its sixth second-half lead. The 1-1 draw wouldn't have been as much of a disappointment if the result didn't tell the same story about a team unable to hold a lead against top competition. The USMNT has now conceded eleven goals and scored just one in the second halves of these friendlies. And that’s all I’m going to say about that. Here are three other stats to take away from the latest international weekend.

9: Is Klinsmann too conservative? Jurgen Klinsmann’s team left Europe without reaching double-digit shot attempts; it finished with just nine. Is the team too conservative when it comes to shot selection? Three goals in nine attempts is an excellent conversion rate, and there were a few shots that could easily have been converted, Michael Bradley’s sitter against Switzerland being the most notable. But are too few shots being taken? Consider that eight of the nine attempts were taken inside the box and, even more remarkably, within the area around the penalty spot. There was only one shot attempted from outside the 18-yard box, and that was Brek Shea’s laser of a goal from a free kick. In other words, the team didn't attempt a single shot from outside the box in the run of play. Pause on that one for a moment.

This weekend the USMNT attempted 18.7 passes in the final third for every shot, while its opponents attempted 10.8 passes in the final third per shot. Considering the US was playing a more direct style on offense, that does imply they may be too picky once they get the ball in position. The results this weekend weren't terrible, especially offensively, but it does raise the question: does the US have the right shot-selection balance on offense? More on that in the third stat below.

19.8: High energy, low team pressure. Colin Trainor has been publishing work on a metric, passes per defensive action (PPDA), that attempts to measure how much a team employs the high press. The metric divides the opponent's passes attempted in its own defensive half plus about 20% of its offensive half (so roughly the 60% of the field farthest from the pressing team's goal) by the pressing team's defensive actions in that same area. The lower the passes per defensive action, the more intense the high press; a figure in the mid-single digits would indicate a consistent high-pressure strategy. Here is the PPDA chart by team and area of the field.

You can see from the chart that Switzerland defended much more aggressively up the pitch than the US. When the action was in the defensive end, both teams applied similar pressure. The result was possession strongly in Switzerland's favor, at over 60%. The US did show high individual energy in their opponent’s offensive half, but mainly that running around was just to disrupt the Swiss offense as much as possible. The team as a whole was willing to wait before applying significant pressure. We didn't see a particularly aggressive US team this window, and it makes you wonder whether Klinsmann is going for results in these friendlies instead of pushing his team to be proactive, as he did during the last World Cup cycle.
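Once the events are filtered to the pressing zone, PPDA comes down to a single division. This is a minimal sketch under stated assumptions, not Colin Trainor's implementation: the `Event` record, its 0-100 `x` coordinate measured from the pressing team's own goal line, and the set of event types counted as defensive actions are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumption: which events count as "defensive actions" varies by source;
# tackles, interceptions, challenges and fouls are used here for illustration.
DEFENSIVE_KINDS = {"tackle", "interception", "challenge", "foul"}


@dataclass
class Event:
    team: str
    kind: str   # e.g. "pass", "tackle", "interception", "challenge", "foul"
    x: float    # 0-100 along the pitch, from the pressing team's own goal line (assumed frame)


def ppda(events: Iterable[Event], pressing_team: str, zone_start: float = 40.0) -> float:
    """Opponent passes per defensive action inside the pressing zone.

    zone_start=40 keeps the 60% of the field farthest from the pressing
    team's goal: the opponent's defensive half plus about 20% of its
    offensive half.
    """
    passes = actions = 0
    for e in events:
        if e.x < zone_start:
            continue  # outside the pressing zone
        if e.team != pressing_team and e.kind == "pass":
            passes += 1
        elif e.team == pressing_team and e.kind in DEFENSIVE_KINDS:
            actions += 1
    # No defensive actions at all means no pressure: report infinity.
    return passes / actions if actions else float("inf")
```

Lower values mean the pressing team interrupts the opponent more often before it can string passes together, which is why mid-single-digit values indicate a sustained high press.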

2: Blocked shots against UEFA teams. I know the late-game defense is the big issue, but I’m not done harping on shot selection. In this nine-game stretch the USMNT has gone on the road against four European foes and managed a 1-1-2 (W-D-L) record, though it could easily have been 3-0-1. They did this attempting just 29 shots in the four games, an average of 7.3. The crazy stat is that only two of those shots were blocked, or just 6.9% of the total. A typical blocked-shot percentage is roughly 25%. You can’t argue with the 17% finishing rate in those four games, but it does make you wonder whether the team is too picky on offense.
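The percentages in this stat are simple ratios, and a quick sketch makes the gap to a typical block rate concrete. The function name is hypothetical, and the goal count of five is an assumption backed out of the quoted 17% finishing rate (5 / 29 is about 17.2%) rather than a number given in the text.

```python
def shot_profile(shots: int, blocked: int, goals: int, games: int) -> dict:
    """Shot volume per game plus blocked and finishing rates (as percentages)."""
    return {
        "shots_per_game": shots / games,
        "blocked_pct": 100 * blocked / shots,
        "finishing_pct": 100 * goals / shots,
    }


# 29 shots, 2 blocked and 4 games are from the text; 5 goals is inferred
# from the 17% finishing rate, not stated directly.
usa_vs_uefa = shot_profile(shots=29, blocked=2, goals=5, games=4)
for name, value in usa_vs_uefa.items():
    print(f"{name}: {value:.2f}")

# At a typical ~25% block rate, 29 shots would mean about 7 blocked, not 2.
print(f"blocked at a 25% rate: {0.25 * 29:.2f}")
```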

Let’s do a little thought experiment to see whether this trend is something that should change. Back to the latest window and the games against Denmark and Switzerland. What if the US took shots as frequently as their opponents, but also finished them at their opponents’ lower rate? The numbers would look like this:

The US would have scored only 2.6 goals had they shot as freely as their opponents, so while the sample sizes are clearly small, at least from here it looks like Klinsmann isn't too crazy.
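The post doesn't show how the 2.6 figure was computed, so the sketch below is one plausible reading of the thought experiment, stated as an assumption: rescale US shot volume to the opponents' final-third passes per shot, then apply the opponents' goals-per-shot conversion rate. The function and parameter names are hypothetical. The US final-third pass count can be backed out of numbers in the text (9 shots at 18.7 passes per shot), but the opponents' goals and shots are not given, so they stay as inputs.

```python
def counterfactual_goals(us_final_third_passes: float,
                         opp_passes_per_shot: float,
                         opp_goals: int,
                         opp_shots: int) -> float:
    """Goals the US 'would' score shooting and finishing like its opponents.

    Assumed method (not confirmed by the post): rescale US shot volume to the
    opponents' final-third passes per shot, then apply the opponents'
    goals-per-shot conversion rate.
    """
    counterfactual_shots = us_final_third_passes / opp_passes_per_shot
    opp_conversion = opp_goals / opp_shots
    return counterfactual_shots * opp_conversion


# From the text: 9 US shots at 18.7 final-third passes per shot is about 168 passes;
# the opponents needed 10.8 passes per shot.
us_passes = 9 * 18.7
print(f"US shots at the opponents' rate: {us_passes / 10.8:.1f}")  # about 15.6
```

Multiplying that shot count by the opponents' actual conversion rate gives the counterfactual goal total.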

Next up for the US is the rowdy rivalry with El Tri in what will hopefully be a Gold Cup Final preview (said by the guy living in Philly, home of the Gold Cup Final).

Year-to-year Correlations with pretty plots

By Matthias Kullowatz (@mattyanselmo)

As I began constructing prediction models for this season, I was faced with the obvious problem of dealing with small sample sizes. Teams have played three or four games to this point, which isn't much to go on when trying to forecast their futures. Portland, for example, has produced the fifth-best expected goal differential in the league (xGD of +0.22), but is missing its two best midfielders. I'm skeptical that the Timbers will be able to maintain that in the coming weeks. So I'm looking to last season to help me out with the beginning of this season.

Below are some heat plots depicting the correlation of six metrics to themselves. For example, if we sum each team's goals scored in its last 10 games of the past season and correlate that to its goals scored in the first ten games of this season, we get a correlation coefficient of 0.195. The highest correlations never breached 0.60, so a "red hot" correlation in the plots is about 0.60. Each of these correlations comes from a sample of 56 teams (18 in 2011-12, 19 in 2012-13 and 2013-14).
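Each cell in the heat plots is a correlation coefficient between two sets of per-team totals. Here is a minimal sketch of one cell, assuming per-game values are stored as lists keyed by team name (a hypothetical data layout) and that the coefficient is Pearson's r (the post doesn't name it, so that is an assumption too).

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def year_to_year_pairs(prev_season: Dict[str, List[float]],
                       next_season: Dict[str, List[float]],
                       n: int = 10) -> Tuple[List[float], List[float]]:
    """Per-team (sum of last n games of one season, sum of first n games of the next).

    Only teams present in both seasons are kept, so expansion teams drop out.
    """
    teams = sorted(set(prev_season) & set(next_season))
    xs = [sum(prev_season[t][-n:]) for t in teams]
    ys = [sum(next_season[t][:n]) for t in teams]
    return xs, ys
```

To pool several off-seasons, as the post does with its 56 teams, extend `xs` and `ys` with the pairs from each off-season before calling `pearson`.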

Notes

For the most part, expected goals stabilize to a greater degree than raw goals across the off-season. 

Goals Allowed is a strange metric: the number of goals allowed in a team's last game of one season (a single game!) correlates strongly with its goals allowed during the next season. My theory is that teams that have thrown in the towel by season's end tend to play more openly and are likely to allow more goals toward the end of a season. Those same teams tend not to be good (that's why they're not in the playoffs), and they continue to suck in the following season.

Expected Goal Differential shows a very strong correlation across the off-season, and I'm eager to employ some previous-season xGD data in the prediction models.

Next up, I'll look at the xGD in even gamestates across the off-season, and I'm hoping to publish those prediction models by Even Better Monday (the one after Good Friday). So be on the lookout!

USMNT at Switzerland: Roster Churn, TSR and PDO

by Harrison Crow (@harrison_crow)

I have a story I'm going to share here. Please bear with me.

People who know me at work feel compelled to try to talk soccer with me. I'm not trying to be mean, but I'd like to think I'm a well-rounded person whose talking points aren't limited to soccer, sports, or numbers in general. But this was the painful conversation that was had.

Co-worker: "Denmark today, huh? I love those bright orange uniforms."

Me: *Wince* "Yep. They're a bit different."

Co-worker: "They did super well last year in the World Cup too, should be a tough match-up."

Within the span of a few sentences we covered a mishmash of European countries, none of which was actually Denmark: first the Danes were confused with the Netherlands, which were then confused with Switzerland.

Whatever. I'm not here to be a jerk and mock people who get these details wrong. Actually, it kind of makes my point. Outside of soccer nerds, a late loss to Denmark means relatively little beyond the fact that we lost, and did so in a manner that has become painfully repetitive since the World Cup.

But, as you may hear quoted on our podcast this week, the US has used 43 different players in the eight months since the World Cup. Without getting too granular about the call-ups, we can simply acknowledge that they've used a lot of people. Add to that the 41 substitutions made during those eight matches, and I think it's fair to say there is a bit of forming, and maybe even some storming, going on with this roster.

Because of all this roster churn, I suggest bringing a bit of skepticism to the numbers the US has posted over the last eight months.

Matt Doyle, the MLS Armchair Analyst, wrote an article following Wednesday's match in which he cited Total Shot Ratio (TSR) as an example of why the US hasn't been good, and linked a graphic showing how poor the USMNT has been since the World Cup.


Total Shot Ratio, a team's shots divided by all the shots taken in its matches, gives some insight into a team's performance by showing how dominant it is at creating attempts on goal compared with its opponents. These numbers imply that in the matches leading up to the World Cup, the US was doing really well. Like, super well.

To give that some context (beyond noting that .440 is similar to what Sunderland produces), the US had a .705 shot ratio over the 30 matches prior to the World Cup, going back to the start of 2013 against Canada. No club team I could find currently boasts a .705 shot ratio. It's crazy high. The closest team in MLS is Columbus Crew SC at .672, and over the pond in the EPL, Manchester City reigns supreme at .647.

The primary issue with the US's high TSR is that it largely stems from their domination of the 2013 Gold Cup. That tournament skews the numbers in favor of the US, which took 129 shots and limited its opponents to only 53 across six matches. If we remove that tournament, it paints a more sober picture.
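Because TSR is a single ratio, removing a tournament just means subtracting its shots from both sides before dividing. Here is a minimal sketch. The function names are hypothetical, and the full 30-match shot totals aren't given in the post, so they remain inputs.

```python
def tsr(shots_for: int, shots_against: int) -> float:
    """Total shot ratio: a team's share of all shots in its matches."""
    return shots_for / (shots_for + shots_against)


def tsr_excluding(total_for: int, total_against: int,
                  excluded_for: int, excluded_against: int) -> float:
    """TSR after removing a block of matches (e.g. one tournament) from the totals."""
    return tsr(total_for - excluded_for, total_against - excluded_against)


# 2013 Gold Cup figures from the text: 129 shots taken, 53 allowed over six matches.
print(f"Gold Cup TSR: {tsr(129, 53):.3f}")  # 0.709
```

A Gold Cup TSR of about .709 sits above even the .705 full-period figure, which is why taking it out can only pull the overall number down.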


Along with their TSR, their PDO dropped some too. That suggests removing the Gold Cup was perhaps the right call, because it strips out some of the luck that PDO is there to measure, giving us a better and more realistic outlook. Devin Pleuler once referred to PDO as a basic proxy for luck, as the statistic typically regresses to the mean. If the mean is 1000, and in this case it is, below that is bad and above that is favorable... depending on your perspective. It also raises questions about how sustainable a team's performance is.

The US, despite conceding so many goals in the dying seconds, is seen as a team that tends to float "above the line." That's mostly because of the number of goals they've scored relative to the shots they've taken. It's also because they've got great keepers, and their save percentage (which is kind of a stupid statistic anyway) tends to float above the norm.
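PDO is conventionally the sum of a team's shooting percentage and its save percentage, multiplied by 1000. Here is a minimal sketch, assuming the on-target version (the post doesn't say which shot base its chart uses, and some versions use all shots for the shooting side). The function name is hypothetical.

```python
def pdo(goals_for: int, shots_on_target_for: int,
        goals_against: int, shots_on_target_against: int) -> float:
    """PDO = (shooting % + save %) * 1000.

    Assumption: both percentages use shots on target.
    """
    shooting_pct = goals_for / shots_on_target_for
    save_pct = 1 - goals_against / shots_on_target_against
    return 1000 * (shooting_pct + save_pct)


# Hypothetical team: 30% of on-target shots scored, 70% of on-target shots saved.
print(pdo(3, 10, 3, 10))
```

The league-wide mean lands at 1000 because every goal scored is a goal conceded somewhere else: pooled across the league, shooting percentage and save percentage are measured on the same shots and add up to exactly one. A team well above 1000, like the US with its finishing and its keepers, is one whose numbers would usually be expected to drift back down.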

So what does this TSR/PDO business mean in the end? Simply that things could still be worse, but there is some reason for optimism that their performance will improve.

The oddest thing about all this is the complete drop-off in performance following the World Cup. I'm still not convinced we can take these numbers too seriously until more research is done on post-World Cup friendlies in general and on the pitfalls of international roster churn.

Might I then suggest a new way of watching the game, one I'm going to try to put into practice myself: follow individual performances. I know this isn't revolutionary, and I'm sure most people do this anyway. But seriously, watch. The following are checklist items for myself.

  • How many touches does Jozy Altidore get and how many become shots?

The fact is that I've been super unimpressed with how many shots he's actually creating. Looking strictly at the ones he's taken, he's had nine shots in 646 minutes, according to my homework. That is not good, and he's not going to keep finishing shots like the one linked below; it's just not likely.

 

  • Who is winning the duels on this team?

During the World Cup, Jermaine Jones (like him, love him, or hate him), with help from Kyle Beckerman, stole everything that was loose on the field. Who will be heirs to their throne?

  • Who is putting passes to boots and creating shots?

I already mentioned my heartburn with Jozy, but the thing is he doesn't have to do all the work. He just has to get into high leverage positions (which he does REALLY well) and then take the shot (which he doesn't do very well). The question is who gets him the ball in these positions. Who is putting passes into the box or running with the ball at their feet dribbling past defenders?

I mentioned last week that Clint Dempsey isn't going to be around much longer. I personally would be surprised if he's on the next World Cup roster. He's a phenomenal player, and what makes him so great is that the US has no other player like him in the talent pool. He creates shots, he takes shots, and he's an incorrigible enigma. We won't find another like him, but we need to find a way to supplement the shots he creates.

Looking at these three things, none jumps out as a "wow, that's it!" moment. They're all things people have probably commented on before, even prior to the World Cup. Whatever, I'm cool with not being unique. These items still matter.
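The Altidore number from the first checklist item (nine shots in 646 minutes) is easier to compare across players as a per-90-minute rate. A trivial sketch follows. The helper name is hypothetical.

```python
def per_90(count: float, minutes: float) -> float:
    """Scale a counting stat to a 90-minute rate."""
    return count * 90 / minutes


# Altidore figures from the text: nine shots in 646 minutes.
print(f"shots per 90: {per_90(9, 646):.2f}")   # → 1.25
print(f"minutes per shot: {646 / 9:.1f}")      # → 71.8
```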

Switzerland is ranked 16th overall in Nate Silver's Soccer Power Index and sits just one spot above the USMNT in the Elo ratings. Suffice it to say that at full strength these teams are pretty equal in quality. But the reality is that the US isn't at full strength, so does it even matter whether Switzerland is?

I dare you to set aside results at least until the Gold Cup. They don't matter anyway. Let's just enjoy being introduced to some young and exciting talent. If after the Gold Cup the US doesn't have an automatic bid to the Confederations Cup, then I think we can become legitimately frustrated and/or a bit worried about the future of the national team.
But for now let's just enjoy this.

The Weekend Kickoff: No International Break for the Wicked

by Harrison Crow (@harrison_crow)

While most of the rest of the world breaks for a couple of weeks for the FIFA international window, Major League Soccer, like many of its officiants, plays on through. This gives an obvious advantage to teams that either don't have internationals called up for this break or, as is rarely the case in MLS, have the depth to cover for those missing in action.

Taylor Twellman's point that Jozy Altidore and Michael Bradley could each miss as many as eight matches for Toronto FC because of FIFA/MLS scheduling conflicts shatters the idea that this is just a passing issue. As a whole, MLS looks to have as many as 57 players selected to represent their countries. I don't think anyone wants to make light of the honor of playing for your country, but this issue is clearly only going to become more pronounced as so many internationals return to the States and the talent in MLS continues to improve.

There are a lot of other numbers that can be extrapolated from 57 players leaving their teams for the opportunity to represent their countries. But rather than focusing on the absences, let's focus on those who are sticking around.

The Weekend Matchups:

Numbers in parentheses are expected goal differential in even game states.

Saturday:

San Jose Earthquakes (-0.26) @ New England (-0.17)
It's kind of funny: SJ has been the "surprise" and New England the "disappointment," and yet they're very close in expected goal differential in even game states. Based on talent levels, I would expect that New England has been missing a few pieces and has maybe been unlucky, while SJ is probably on the other end and has gotten lucky more often than not. Time will tell. Prediction: DRAW

Orlando City Soccer Club (0.10) @ Montreal Impact (-0.73)
Sure, OCSC hasn't been very good at finishing the chances they've been given, but they still have one of the elite players in the league in Kaka while Montreal is fishing to find some consistency. Both teams are missing some key individuals. Prediction: DRAW

Sporting Kansas City (0.97) @ New York City FC (-0.10)
With all eyes turned to the big cities and bright lights (LA, Seattle, New York), Kansas City is very quietly putting together some interesting numbers. KC is first in expected goal differential in even game states and third in expected goals, lending support to the idea that SKC might be returning to its dominating ways. That said, both NYC and SKC are in the top four in expected goals created. I think this could be a really good game to watch this weekend. Prediction: SKC

LA Galaxy (0.19) @ DC United (-0.14)
LA has yet to really look like LA this season. But it always seems to take them a month or two before they start caring about the season. DC United didn't look good away at New York, and even with this being a home match, I expect they will struggle. Prediction: DRAW

New York Red Bulls (-1.14) @ Columbus Crew SC (0.85)
I almost feel as if I should call Crew SC a surprise team, but they're not. A very strong season last year has led them to build on that and create a roster that's both fiscally affordable and full of depth. On the other side, the Red Bulls still seem like they might almost be a good team. I love that midfield with Dax McCarty, Lloyd Sam and Felipe Martins, with BWP still putting away shots up top. The problem is that the Red Bulls are in the negative for expected goal differential, suggesting that maybe their defense isn't as good as it has seemed to this point of the season... or maybe it's still trying to catch up after two disappointing games. I'm not sure at this point. Prediction: Crew SC

Portland Timbers FC (0.52) @ Vancouver Whitecaps (0.41)

Both these teams are depleted; Vancouver with international absences and Manneh's suspension, Portland with injuries to three of their top four central midfielders. Still, both teams have some dynamic pieces available and have strung together a few really strong performances. But this might be the best Vancouver Whitecaps team that we've seen in MLS and it's possible they may just end up doing more than being a fifth or sixth seed this season. Prediction: VANCOUVER

Colorado Rapids (-0.54) @ Houston Dynamo (-0.41)
Neither of these clubs has been good, but both have seen their share of luck. The difference is that Houston has seen mostly good luck, whereas the Rapids have seen a rather mixed bag. Prediction: DRAW

Seattle Sounders FC (0.13) @ FC Dallas (0.56)
I had a really great conversation with an FC Dallas fan the other day who felt that I had really slighted Dallas because I've been a bit concerned about their defense and card propensity. Their defense is probably a bit overrated but still very good, and their attack, while good to this point, hasn't really even gotten started yet, I think, which is to say they're probably going to get better. Conversely, the Sounders are missing pieces, specifically Dempsey, and had a midweek friendly at home against Club Tijuana. Prediction: FC DALLAS

Sunday

Philadelphia Union (0.97) @ Chicago Fire (-1.05)
Our metrics don't seem to think the Fire are as bad a team as most would think. But that doesn't mean they're especially good. Philly has yet to really have its hallelujah moment, but our numbers see them as possibly the second-best team in the league. This could be a Kentucky v. West Virginia moment if there ever was one. Prediction: PHILLY

Toronto FC (-0.66) @ Real Salt Lake (-0.97)
No Jozy, no Bradley, no Rimando, no Saborio. This game is the let down of the week. It should be fantastic and yet this match-up will be a bit sobering and probably a bit boring. Prediction: DRAW

USMNT at Denmark: Beyond the score

By Jared Young (@jaredeyoung)

These three statistics help tell the story behind the latest USMNT result, and look beyond it for bigger trends.

+7, -10. Those numbers represent the USMNT’s goal differential since the World Cup in the first and second halves, respectively. A dominant first half has typically been followed by a more tragic second half coming out of the locker room. It’s well documented that the USMNT’s second-half defense has drawn criticism, but did you know that Johannsson’s second-half goal against Denmark was the first goal the US has scored in a second half since the World Cup? The late slump is not just a defensive concern.

Part of the issue could be that Klinsmann is playing less experienced players in the second half. That is somewhat true. Players who were on the World Cup roster have played 75 percent of all first-half minutes, and that number drops to 62 percent in the second half. But that second-half share isn’t as experimental as it might seem: the World Cup players have a strong presence regardless of the half.

20.4%. The USMNT only squeezed off four shots against Denmark but managed to score on two of them. That extreme efficiency has been the trend more recently, and the World Cup players specifically have been blistering since July. Led by Jozy Altidore and his 44-percent finishing rate, the players on the World Cup roster have scored on 20.4 percent of the shots taken since the big tournament. Compare that to the 4.3 percent finishing rate of the new players on the team.

60%. (We’ll get to what this number stands for in a bit) I’m sorta kinda from New Jersey, and so is Alejandro Bedoya, so I’m probably supposed to root for him. But his persistent presence on the pitch for the USMNT continues to bother me. First, let’s talk about what I appreciate from Bedoya, and it’s well documented. His work rate is exceptional and his positioning is first rate. He’s a defensive minded midfielder that will do the dirty work and doesn’t look for the limelight. His defensive work against Denmark was critical in the first half as he sat deep enough to assist an otherwise sloppy back four. My trouble with him is that, from a playmaking point of view, he offers very little. And the US can’t afford to have players like Alejandro Bedoya play in the World Cup. For me, Bedoya is a stark reminder of the limitations of the team. As long as he is playing, I worry the US is not progressing as much as they need to during this cycle. The US struggled mightily to generate offense on the wings in the World Cup, and they simply have to upgrade that area to be a global force.

Bedoya is third on the team in minutes played since the World Cup with 409, behind only Mix Diskerud and Jozy Altidore. In the recent match against Denmark he was moved to the center of midfield, where he typically plays at Nantes and where his lack of playmaking can be better hidden. So where does the 60 percent number come from? That was the percentage of Bedoya’s passes that went backward in the match. A typical team plays 20 to 25 percent of its passes backward over the course of a game, and for a deep-lying player, 60 percent is way too much. Tack on the fact that the US was ceding possession and would have benefited from a more direct approach, and Bedoya’s backward-passing tendency becomes more of a glaring issue. Sorry, my Jersey brethren, I’m looking for more in Russia.
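Once each pass has start and end coordinates, the backward-pass percentage is just a share. Here is a minimal sketch, assuming an x axis that runs toward the opponent's goal and a strict "ends behind where it started" rule. Data providers may instead classify direction by pass angle or require a minimum distance, so the `Pass` record and the rule here are illustrative assumptions, not the method behind the 60 percent figure.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Pass:
    start_x: float  # distance toward the opponent's goal, e.g. 0-100 (assumed frame)
    end_x: float


def backward_share(passes: Iterable[Pass]) -> float:
    """Fraction of passes that end closer to the passer's own goal than they started."""
    passes = list(passes)
    if not passes:
        return 0.0
    backward = sum(1 for p in passes if p.end_x < p.start_x)
    return backward / len(passes)


# Hypothetical five-pass sample: three go backward.
sample = [Pass(50, 40), Pass(60, 55), Pass(30, 29), Pass(40, 70), Pass(20, 35)]
print(f"{100 * backward_share(sample):.0f}% backward")
```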

To sum it up: The USMNT comes out extremely red hot in the first half, and somehow flips to an even more extreme cold in the second half, and that is what has everyone concerned. Experimentation is part of the problem, but protecting leads should be the key focus until it gets fixed. The players who played in the World Cup are very efficient right now in scoring goals, converting over 20 percent of their shots, while the newcomers are struggling in front of the net. I’ve also got my eye on Bedoya and I’m looking to see who is going to pass him on the depth chart, either on the wing or now at central midfield.