Game of the Week: Seattle at Portland

We spent a fair amount of time talking about this match on Thursday night's podcast if you are interested in that sort of thing. Coming into this match, form might be one of the most discussed topics. Seattle has earned just two points in its last four games, while Portland has earned eight points over that same span. In my opinion, form only means something if there's an obvious reason as to its recent fluctuation. For Seattle, there is no obvious reason for its poor play. Seattle has been bad the last four matches in terms of results, but they have actually outshot those opponents 56-to-41.

The real effect on form this game will likely be the fact that Brad Evans and Eddie Johnson are out due to international duty---technically, though in Johnson's case it's more his injury that's going to keep him out. The Timbers will be without Rodney Wallace, Ryan Johnson and Alvas Powell. However, the Timbers' Johnson is not nearly the threat that the Sounders' Johnson is, and Powell has been in a reserve role since the return of Jack Jewsbury and Futty Danso. The absences seem to favor the Timbers.

The stats suggest that Portland is a somewhat heavy favorite this match. Playing in front of the home fans at JELD-WEN Field, with better shot rates and finishing rates on the season, our model suggests the Timbers have a 52-percent probability of winning against the Sounders' 19 percent. This really shouldn't be surprising, as we've seen the strong predictive ability of both home-field advantage and shots rates. What might be slightly more surprising is the effect of this match on the Supporters' Shield race.

In the event of a Portland win, our season simulation suggests Portland would catapult to nearly a 45-percent chance at (at least a share of) the Shield, and Seattle would fall to below five percent (2.3%). However, if Seattle pulls off the upset, it would improve its chances to 26 percent, while simultaneously dropping Portland to virtually zero percent. A tie would actually be the combined worst-case scenario for the two teams. They currently share 38-percent chances at the overall top seed, but a tie would leave the Sounders at 12.4 percent and the Timbers with 0.3 percent, allowing New York (45.3%) and Sporting Kansas City (42.8%) to fight over the Shield.
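As a rough illustration of how the match probabilities and the conditional Shield chances fit together, the sketch below weights each scenario by how likely it is. The 29-percent draw probability is an assumption (whatever is left after the two win probabilities), "nearly 45 percent" and "virtually zero" are entered as 0.45 and 0.0, and the function name is hypothetical. Because the inputs are rounded, the blended figures should not be expected to reproduce the season simulation exactly.

```python
# Match outcome probabilities from the model. The draw value is assumed to be
# whatever remains after the two win probabilities.
p_portland_win = 0.52
p_seattle_win = 0.19
p_draw = 1.0 - p_portland_win - p_seattle_win

outcome_probs = {
    "portland_win": p_portland_win,
    "draw": p_draw,
    "seattle_win": p_seattle_win,
}

# Conditional Shield chances quoted in the text, keyed by match outcome.
# "Nearly 45 percent" and "virtually zero" are approximated as 0.45 and 0.0.
shield_given_outcome = {
    "Portland": {"portland_win": 0.45, "draw": 0.003, "seattle_win": 0.0},
    "Seattle": {"portland_win": 0.023, "draw": 0.124, "seattle_win": 0.26},
}


def blended_shield_chance(team):
    """Weight each conditional Shield chance by the probability of that outcome."""
    conditional = shield_given_outcome[team]
    return sum(outcome_probs[outcome] * chance for outcome, chance in conditional.items())


for team in shield_given_outcome:
    print(f"{team}: {blended_shield_chance(team):.1%}")
```

The point of the sketch is only that a team's pre-match Shield chance is a weighted average of its chances under each result, which is why a single match can move the numbers so much.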

These two teams are almost sure to make the playoffs, regardless of tonight's outcome. But a top seed and the prospect of home-field advantage in a potential MLS Cup Final no doubt elicit some drooling from both sets of supporters.

Colorado's Playoff Chances

A few short months ago, we recorded a podcast in which we discussed the teams likely to make the playoffs in the Western Conference. At the time, we did not think Colorado would get in, but now---after a surprising win in Los Angeles---the Rapids find themselves very much in the thick of the playoff race. As of May 31st, Colorado had earned 19 points from 13 games and sat in 5th place out west. Additionally, the Rapids' 1.03 attempts ratio was 6th in the conference, while our shot locations data suggested an expected goal differential essentially tied for 4th. Perhaps we shouldn't have discarded them so quickly.

Now with more information, we've seen how shot ratios help to predict the future as well as anything in soccer. The predictions aren't awesome, but better than if we were to use goal differential or standings. Colorado has found itself still in playoff contention, and I think it is worth revisiting the playoff chances for the Western Conference's mile-high team.

Now, 28 weeks into the season, Colorado has improved its shots ratios and expected goal differential to second in the conference, just behind the Galaxy on both accounts. But while Colorado could very well be one of the top teams in the West, its remaining schedule is pretty brutal. Of its last six matches, five of them come against Dallas, Portland, San Jose and Vancouver (twice). Those are the four teams fighting along with the Rapids for the final two playoff spots. The other game on their schedule just so happens to be Seattle, the current favorite to win the Supporters' Shield. There is not a single cupcake on the schedule, and losses will be far more costly than if they were against Eastern Conference foes.

While the best predictions using shots data still leave much to be desired, that data would in fact pick Colorado as the second-best team in the West. However, playing a tough schedule against opponents shooting for the same playoff spot, there is so much weight on just a few games. I'd pick Colorado to be one of those top five in the tables at the end, but it's not a gimme pick.

Let's say, ooooh, 55%.

The Chicago Fire and Goal Mouth Data

This is merely a trial run. I say that because in the last two days I've limited the collection of data and then expanded it. It comes down to how it tickles my fancy. The data I have collected is limited for the time being to the Chicago Fire, just as a means of comparing a club and its data to the league and trying to make sense of it. This hopefully will develop into a means by which I can somehow attribute value to clubs and their keepers in the future. Below is a picture of the goal mouth, and the data has been collected from the website Squawka.com. Coupled with a previously built image, you can see how Chicago compares to the rest of the league and how a majority of their goals have been scored this season. While numbers are always an important thing, remember that it's more about ratios and the average occurrence than pure accumulation at this juncture. Not all teams have played the same number of games and they haven't had the same opportunities.

Shots+Goals and visuals

ChicagoFireHUD

In addition to the Goal Mouth visual, here is a field map diagram as it applies to the dimension of the field. This has already been provided in raw form in the data that Matty has collected and posted in the raw shot data tab, but I wanted to have another visual to compare the above data.

ChicagoFire-ShotField

The problem with the two visuals is that nothing links them: there is no way to connect the 4 goals Chicago has allowed from section 5 to the 5 goals they've also allowed in SoT1 (for ease of the tally, I gave a numerical designation to each location on the goal mouth, starting at the top and working left to right). This is the next collaborative effort that I'm working on: gathering both each shot's location of origin and its placement on the goal, along with who took it and when.

This is a very time-intensive task and it'll probably take me the rest of the week to complete it just for Chicago. However, I'm taking suggestions on how I could compile this data without hand-jamming it into a flat file. An SQL dump of the current Opta database for the season would go a long way toward helping compile this data and would be nice. But I'm never above a bit of hard work.

Thoughts?

A Visual Look At Shots On Target

This is part of my efforts to try to come up with a zone rating of sorts for goal keepers. The problem I'm running into at the moment is trying to find visual information for shots against. If I want to know how good Dan Kennedy was at preventing goals against Columbus in week 1, I have to go to Columbus' page on Squawka and narrow the shot data to that specific game. Basically, it just boils down to more time digging than I initially planned to devote. Quickly, here is a visual graphic that I made with the help of Excel. I know it's not really pretty, but it delivers the data in the manner in which I needed it without getting caught up on eccentric details, details with which I often spend too much time meddling.

Shots+Goals and visuals

There isn't a lot that this immediately tells you, of course. It's more of a jumping-off point to start comparing data once it is collected. That's where the next effort is going to be headed. Who are the teams that are above league average and below league average? Are they bleeding low-percentage goals, or are they being beaten in an unusual zone? This information, while still miles from being complete, moves us in the right direction of knowing more about shots and goals than we did previously.

You'll notice that I also included shots that are wide but still close to the post. I'm curious as to whether these shots numbers become inflated when playing teams with "better keepers". Unfortunately we need to define what better is. Better than what, exactly? I'm not sure. Again, parameters haven't been set, and data sets are still being gathered.

This is a fun exercise and one that should, if nothing else, provide us with some excellent insight to teams and their seasons at this point.

Comparing Goalkeepers to Pitchers

Cruising around Twitter is about the most social I get nowadays. It sounds nerdy, and really it is, but it's amazing the amount of material that you can discover---not to mention the 140-character conversations you can have---produced by people smarter than me. Looking around, I stumbled across an article that dates back about 10 days from the site 'Bring On The Stats' by the anonymous author Chase H (aka @chaser_racer32 on Twitter). Chase H writes a good post about how Sporting Kansas City's goal keeper, Jimmy Nielsen, is---probably gradually---headed for a decline. He comes to this conclusion by going through save% and shots against per minute. A pretty good tactic that has some good reasoning.

"The table above is sorted by save %, which is pretty self-explanatory; it’s the percentage of shots saved by the keeper. Nielsen has the third-worse save % of all goalkeepers with more than 1400 minutes played. The perfect example of why wins and shutouts are not the best measures for a goalkeeper is the fact that Chivas USA keeper Dan Kennedy has saved a higher percentage of shots than Nielsen, and yet has only recorded 2 shutouts, and the team only has 4 wins. Kennedy has the misfortune of playing for one of the worst teams in the MLS, and he has faced almost 50 more shots than Jimmy Nielsen.

On the flip side, one can argue that because the defense plays so well, generally only the most quality shots make it on goal from the opponent. I do acknowledge that is a very big issue to this study, but to compare Neilsen’s stats from last season with the same defense, we see he saved 74% of the shots he faced while the defense conceded almost exactly the same numbers of shots per minute he played."

I'm pretty sure I've seen the analogy of baseball pitchers compared to goal keepers before---if not from some random person or thing I read, then certainly from Matthias. The point of the comparison being that neither the goalkeeper nor the pitcher really has as much influence on goals allowed or runs scored against them as a lot of traditionalists and general fans believe.

In fact, baseball created an individual stat to track exactly what a pitcher controls, and Fangraphs grades him solely on that stat, "FIP." The stat has been well-documented and was introduced to the general public by writers much more skilled than myself.

Back in the early 2000s, research by Voros McCracken revealed that the amount of balls that fall in for hits against pitchers do not correlate well across seasons. In other words, pitchers have little control over balls in play. McCracken outlined a better way to assess a pitcher’s talent level by looking at results a pitcher can control: strikeouts, walks, hit by pitches, and homeruns.

Finding some reading material on FIP today, and thinking about our podcast about the possibility of whether keepers influence shots on target, sparked some thoughts following the article by Chase H.

The idea of keepers being analogous to pitchers is all well and good. There are certainly some similarities. The problem I'm starting to have, though, is that there may be a better way of looking at it. Pitchers, albeit minimally, still control aspects of their performance such as ground ball and fly ball rates, strikeouts and walks. Keepers potentially could influence opponents psychologically, but truly the only physical effect they have at their disposal, prior to the shot, is their positioning. Positioning frequently corresponds to the defensive placement of a keeper's teammates and the opposition that controls possession.

This isn't quite the like-for-like thinking that most jump into. However, I started reading about another baseball statistic and it made me think...

One of the differences between UZR and linear weights is that with UZR, the amount of credit that the fielder receives on each play---positive (if he makes an out) or negative (if he allows a hit or an ROE)---depends on how often that particular kind of batted ball, in terms of its location, speed and several other factors, is fielded by an average fielder at the same position. With offensive linear weights, if a batted ball is a hit or an out, the credit that the batter receives is not dependent on where or how hard the ball was hit, or any other parameters.

Maybe, we (and by we, I mean me) are looking at keepers the wrong way. Just like assuming that keepers have control over wins, shutouts and the like, is it any more responsible to assume that goals scored against them are purely their fault either? I'm talking about save percentage here.

To test this Keeper UZR out, we need to create a set of guidelines in the same manner as what has been set out for UZR. There is also the key limitation that we don't have 6 years' worth of data to work from. We barely have 3 years of chalkboard data, and if using WhoScored or Squawka, we have even less than that.

The other problem is that we don't know the speed of the shot, and getting the angle of the shot isn't necessarily easy either. Not that it's particularly important. My goal this week is to take the shot data by Squawka and put together a visual representation of the six prominent scoring locations complete with shots saved data associated.

goalsscoredagainstSSFC

The first thing we need to establish is which areas of the goal see the fewest shots saved, and how good keepers are at stopping the shots they should stop. This seems rather silly, as we can probably already guess that the likeliest goal-scoring locales are the outside areas near the posts. However, we still need numbers, and we still need to know how good teams are at preventing the goals they should prevent.

Controlling for difficulty of shot on target by location on the frame at least starts to give us an intelligent understanding of what goal keepers are doing right and what they are doing wrong.
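One way to picture a location-adjusted keeper rating is to credit the keeper in each goal-mouth zone with the saves he made beyond what a league-average keeper would be expected to make on the same shots. The sketch below is only that, a sketch: the zone labels, the shot and save counts, the league-average save rates and the function name are all hypothetical placeholders, not collected data.

```python
# Hypothetical shots-on-target data for one keeper, by goal-mouth zone.
# Each entry is (shots_faced, saves). Zone labels are illustrative only.
keeper_zones = {
    "top_left": (10, 5),
    "top_center": (8, 7),
    "top_right": (9, 4),
    "bottom_left": (14, 9),
    "bottom_center": (20, 19),
    "bottom_right": (12, 8),
}

# Hypothetical league-average save rates for the same six zones.
league_save_rate = {
    "top_left": 0.45,
    "top_center": 0.80,
    "top_right": 0.45,
    "bottom_left": 0.60,
    "bottom_center": 0.90,
    "bottom_right": 0.60,
}


def saves_above_average(zones, league_rates):
    """Per zone: actual saves minus the saves an average keeper would make."""
    return {
        zone: saves - shots * league_rates[zone]
        for zone, (shots, saves) in zones.items()
    }


by_zone = saves_above_average(keeper_zones, league_save_rate)
total = sum(by_zone.values())

for zone, value in by_zone.items():
    print(f"{zone}: {value:+.2f}")
print(f"total saves above average: {total:+.2f}")
```

A keeper who faces mostly shots near the posts is judged against a lower expected save rate than one who sees shots straight down the middle, which is the kind of adjustment a raw save percentage cannot make.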

Noisy Finishing Rates

As a supplement to the stabilization analysis I did last week, I wanted to add the self-predictive powers of finishing rates—basically soccer’s shooting percentage. Team finishing rates can be found both on our MLS Tables and in our Shot Locations analysis, so it would be nice to know if we can trust them. Last week I split the 2012 and 2013 seasons in half and assessed the simple linear relationships for various statistics between the two halves of each season across all 19 teams. Now I have 2011 data, and we can have even more fun. I included bivariate data from both 2011 and 2012 together, leaving out 2013 since it is not over yet. It is important to note that I am not looking across seasons, only within seasons. To the results!

Stat Correlation Pvalue
Points 0.438 0.7%
Total Attempts 0.397 1.5%
Blocked Shots 0.372 2.3%
Shots on Goal 0.297 7.4%
Goals 0.261 11.9%
Shots off Goal 0.144 39.5%
Finishing 0.109 52.1%
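For readers who want to see how a correlation turns into a significance test, here is a minimal sketch of the standard t-statistic for a Pearson correlation. The sample size of 37 is an assumption on my part (18 teams in 2011 plus 19 teams in 2012), and the function name is hypothetical. The p-values themselves are not recomputed here, since that requires the t distribution; the only point is that a larger t goes with a smaller p-value.

```python
import math


def t_statistic(r, n):
    """t-statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Assumption: 18 teams in 2011 plus 19 teams in 2012 gives 37 paired observations.
n_pairs = 37

# Correlations copied from three rows of the table.
for stat, r in [("Points", 0.438), ("Total Attempts", 0.397), ("Finishing", 0.109)]:
    print(f"{stat}: r = {r:.3f}, t = {t_statistic(r, n_pairs):.2f}")
```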

Surprisingly, to me at least, a team’s points earned has been the most stable statistic in MLS (by my linear definition of stability). Not so surprising to me was that total attempts is also one of the most stable. Look down at the very bottom, and you’ll find finishing rates. Check out the graph below:

 Finishing Rates Stabilization 2011-2012

Some teams finish really well early in the season, then flop. Others finish poorly, then turn it on. But there’s no obvious pattern that would allow us to predict second-half finishing rates. In fact, the best prediction for any given team would be to suggest that they will regress to league average, which is exactly what our Luck Table does. It regresses all teams’ finishing rates in each zone back to league averages, then calculates an expected goal differential.
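Regressing finishing rates fully to league average amounts to multiplying a team's attempts in each zone by the league-average finishing rate for that zone. The sketch below shows the arithmetic under that reading. Every number in it (the league rates and the attempt counts) is a hypothetical placeholder, and the function name is my own.

```python
# Hypothetical league-average finishing rates (goals per attempt) by shot zone.
league_finish_rate = {1: 0.35, 2: 0.18, 3: 0.07, 4: 0.05, 5: 0.025, 6: 0.05}

# Hypothetical attempt counts by zone for one team, for and against.
attempts_for = {1: 12, 2: 80, 3: 50, 4: 60, 5: 75, 6: 2}
attempts_against = {1: 20, 2: 70, 3: 55, 4: 40, 5: 70, 6: 4}


def expected_goals(attempts, league_rates):
    """Expected goals if every zone were finished at the league-average rate."""
    return sum(count * league_rates[zone] for zone, count in attempts.items())


xg_for = expected_goals(attempts_for, league_finish_rate)
xg_against = expected_goals(attempts_against, league_finish_rate)
expected_goal_diff = xg_for - xg_against

print(f"expected goals for: {xg_for:.1f}")
print(f"expected goals against: {xg_against:.1f}")
print(f"expected goal differential: {expected_goal_diff:+.2f}")
```

Because the team's own finishing rate never enters the calculation, a hot or cold streak in front of goal has no effect on the expected goal differential.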

On a side note, you might be asking yourself why I don't just use points to predict points. Because this: while the correlation between first-half and second-half points is about 0.438, the correlation between first-half attempts ratios and second-half points is slightly stronger at 0.480. Also, in a multiple regression model where I let both first-half attempts ratio and first-half points duke it out, first-half attempts ratio edges out points for winner of the predictor trophy.

Predictor Estimate Std. Error T-stat P-value
Intercept 1.7019 5.97 0.285 77.7%
AttRatio 13.7067 6.32 2.17 3.7%
Points 0.3262 0.19 1.691 10.0%
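To make the regression concrete, the coefficients in the table above can be plugged into a simple prediction of second-half points. The two teams below are hypothetical, as is the function name; the sketch only shows how heavily the attempts ratio weighs in the fitted equation.

```python
# Coefficients copied from the regression table above.
INTERCEPT = 1.7019
COEF_ATT_RATIO = 13.7067
COEF_POINTS = 0.3262


def predict_second_half_points(att_ratio, first_half_points):
    """Predicted second-half points from first-half attempts ratio and points."""
    return INTERCEPT + COEF_ATT_RATIO * att_ratio + COEF_POINTS * first_half_points


# Hypothetical team A: outshoots opponents (ratio 1.07) with 25 first-half points.
team_a = predict_second_half_points(1.07, 25)

# Hypothetical team B: gets outshot (ratio 0.85) but has 30 first-half points.
team_b = predict_second_half_points(0.85, 30)

print(f"team A predicted second-half points: {team_a:.1f}")
print(f"team B predicted second-half points: {team_b:.1f}")
```

Under these made-up inputs, the team with fewer first-half points but the better attempts ratio comes out with the higher prediction.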

And since this is a post about finishing rates...

Predictor Estimate Std. Error T-stat P-value
Intercept -2.243 7.75 -0.29 77.4%
AttRatio 18.570 5.71 3.26 0.3%
Finishing% 63.743 50.08 1.27 21.2%

A good prediction model (on which we are working) will include more than just a team's attempts ratio, but for now, it is king of the team statistics.

Signal and Noise in MLS

Some Nate Silver guy wrote a whole book about "signal" and "noise" in data, so it must be important, right? Sports produce a lot of statistics, and it turns out that some of those statistics are pretty meaningless---that is, pretty noisy. A pitcher's ERA is sitting below 3.00 after eight starts, but he has more walks than strikeouts. Baseball sabermetricians will tell you that the low ERA is mostly noise, but that the high walk rate is a signal for impending doom. An MLS team leads the league in points per match, but it allows more shots than it earns for itself (note: this team is called "Montreal Impact"). Soccer nerds like me will tell you that its position in the standings is mostly noise, and that its low shots ratio is a signal for impending doom---or something worse than first place, anyway.

The reasoning behind both examples above is basically the same. Pitchers' ERAs, like soccer teams' points earned, are highly variable and unpredictable, while strikeout-to-walk ratios and shots ratios are more consistent. It's better to put your money on something consistent and easy to predict, rather than something variable and hard to predict. Duh, right?

So here's why we like shots data 'round these parts. Below I have provided two charts of MLS data, one from 2012 and one from 2013. I split each season into two parts and then measured the linear predictive power of each stat on itself. Did teams that scored lots of goals early in the season also score lots of goals later in the season? That's the kind of question answered here.

2012 MLS
Stat R2 Pvalue
Blocked Shots 37.1% 0.6%
Total Attempts 26.1% 2.5%
Goals 20.3% 5.3%
Points 20.1% 5.5%
Shots on Goal 18.2% 6.9%
Shots off Goal 3.6% 43.7%

2013 MLS
Stat R2 Pvalue
Shots off Goal 34.8% 0.8%
Total Attempts 34.5% 0.8%
Shots on Goal 29.4% 1.7%
Points 4.1% 40.7%
Blocked Shots 1.7% 60.0%
Goals 1.5% 61.6%

As an example of what this means, let's consider the attempts stat. Remember that an attempt is any effort in the direction of the goal, so basically an attempt is any shot---on target, off target, or blocked. In each of the past two seasons, MLS teams' attempts totals in the first half of the season were able to help predict their attempts totals in the second half, explaining 26.1% and 34.5% of the variability in second-half attempts, respectively. Those might not seem like high percentages of explanation, but the MLS season is short, and statistically significant predictors are hard to find.
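A split-half R2 of this kind can be sketched in a few lines. For a simple linear fit with one predictor, R2 is just the square of the Pearson correlation between the first-half and second-half totals. The attempt totals below are hypothetical numbers for six made-up teams, and the function name is my own; this is an illustration of the calculation, not the data behind the charts.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical first-half and second-half attempt totals for six teams.
first_half = [210, 185, 240, 200, 175, 225]
second_half = [205, 190, 220, 215, 180, 210]

r = pearson_r(first_half, second_half)
r_squared = r ** 2  # share of second-half variability explained by the first half

print(f"r = {r:.3f}, R2 = {r_squared:.1%}")
```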

In baseball, such "self-predictors" have been referred to as "stabilization." Stabilization is important because, as mentioned above, stabilization means that a stat is consistent, and that a team is likely to replicate its results in the future. This MLS season, points earned during the first 10 matches were essentially worthless at predicting points earned in the second 10 games. Even over the 34 games each team played in 2012, the stabilization for points earned was not as strong as that of attempts or goals scored.*

The next step is figuring out what predicts future points earned, since it does a pretty lame job of predicting itself. But I'll leave that for another post after I have gathered data going back a few more seasons. The number one takeaway here is that some stats can only tell us what happened, but not what will happen. There is another group of stats that are doubly important because they also stabilize---predicting themselves using smaller sample sizes. Those stabilizing stats (like shot attempts) are the signal amid the sea of noise known most places as "football."

*Seattle has only played 21 games, so I cannot do 11-and-11 splits, yet. Also, as for why shots off goal and blocked shots have essentially switched places, I would wager that's more due to how they are (somewhat) subjectively categorized, but who knows.

Game of the Week: Montreal Impact at Chicago Fire

So why this game, you ask. Real Salt Lake is hosting Houston, and the Revs travel to play the Wiz, but I picked this game instead. Despite a negative goal differential, I like Chicago in this one. I smell upset. Our MLS tables tell me that Chicago ranks 5th in the league in attempts ratio at 1.07, earning nearly 10% more shot attempts than its opponents on average. When we account for where those shots are coming from, our shot location data suggests that Chicago's goal differential should be pretty even: -0.04 expected goal differential (xGD) per game if I regress finishing rates 100%. Basically Chicago is an average team with a little bad own-goals luck. However, as I've been preaching all year, Montreal's play is seemingly unsustainable, and it is playing on the road. Despite the most points per game in MLS, Montreal owns the third-worst attempts ratio in the league, and an expected goal differential of -0.20 goals per game. The Impact may very well be the second-best team on the pitch come Saturday.

Chicago Fire Shots Data

For
Locations Goals GoalDistr SOGDistr OffDistr BlksDistr AttDistr Finish% ExpGoals
One 5 19.2% 6.9% 2.6% 2.9% 4.2% 41.7% 4.0
Two 16 61.5% 31.7% 31.3% 17.6% 28.2% 20.0% 14.3
Three 3 11.5% 19.8% 16.5% 20.6% 18.7% 5.7% 3.2
Four 1 3.8% 22.8% 19.1% 25.0% 21.8% 1.6% 2.8
Five 1 3.8% 18.8% 28.7% 33.8% 26.4% 1.3% 1.6
Six 0 0.0% 0.0% 1.7% 0.0% 0.7% 0.0% 0.1
Total 26 26.0
Against
Locations Goals GoalDistr SOGDistr OffDistr BlksDistr AttDistr Finish% ExpGoals
One 8 29.6% 11.5% 9.4% 2.0% 8.6% 34.8% 7.7
Two 9 33.3% 26.4% 29.7% 15.7% 25.9% 13.0% 12.3
Three 6 22.2% 23.0% 14.8% 31.4% 20.7% 10.9% 3.3
Four 0 0.0% 13.8% 13.3% 21.6% 15.0% 0.0% 1.8
Five 3 11.1% 21.8% 32.0% 29.4% 28.2% 4.0% 1.6
Six 1 3.7% 3.4% 0.8% 0.0% 1.5% 25.0% 0.2
Total 27 26.9
Luck -0.1

Montreal Impact Shots Data

For
Locations Goals GoalDistr SOGDistr OffDistr BlksDistr AttDistr Finish% ExpGoals
One 4 12.1% 5.6% 1.1% 1.5% 3.0% 50.0% 2.7
Two 18 54.5% 30.6% 33.3% 25.4% 30.2% 22.5% 14.3
Three 5 15.2% 25.9% 18.9% 10.4% 19.6% 9.6% 3.1
Four 4 12.1% 18.5% 15.6% 23.9% 18.9% 8.0% 2.3
Five 2 6.1% 19.4% 31.1% 38.8% 28.3% 2.7% 1.6
Six 0 0.0% 0.0% 0.0% 0.0% 0.0% 0.0% 0.0
Total 33 23.9
Against
Locations Goals GoalDistr SOGDistr OffDistr BlksDistr AttDistr Finish% ExpGoals
One 4 12.9% 5.9% 2.8% 2.9% 3.8% 33.3% 4.0
Two 15 48.4% 39.6% 32.4% 10.3% 29.9% 16.0% 16.7
Three 5 16.1% 14.9% 9.0% 11.8% 11.5% 13.9% 2.2
Four 6 19.4% 19.8% 13.8% 26.5% 18.5% 10.3% 2.7
Five 0 0.0% 18.8% 39.3% 45.6% 34.1% 0.0% 2.2
Six 1 3.2% 1.0% 2.8% 2.9% 2.2% 14.3% 0.3
Total 31 28.1
Luck 6.2
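The Luck rows in both tables appear to equal actual goal differential minus expected goal differential. That reading is inferred from the totals above rather than from a stated formula, so treat the sketch below as an assumption that happens to match the numbers. The function name is hypothetical.

```python
def luck(goals_for, goals_against, exp_goals_for, exp_goals_against):
    """Actual goal differential minus expected goal differential (assumed definition)."""
    actual_gd = goals_for - goals_against
    expected_gd = exp_goals_for - exp_goals_against
    return actual_gd - expected_gd


# Totals copied from the Chicago and Montreal tables above.
chicago_luck = luck(26, 27, 26.0, 26.9)
montreal_luck = luck(33, 31, 23.9, 28.1)

print(f"Chicago luck: {chicago_luck:+.1f}")
print(f"Montreal luck: {montreal_luck:+.1f}")
```

Under this definition, a large positive number means a team has scored or prevented more goals than its shot locations would suggest.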

MLS, Home Field Advantage And Success Rates

This week on the podcast we talked a bit about home field advantage and the fact that it undeniably exists. But while it does exist, the question then becomes to what extent does it exist? What teams have taken advantage of it over the past few years? I went back as far as 2008 to collect some data. I'm not sure what this data all means, but I feel that it, at the very least, gives us a bit of commentary on those teams that had success and failure on their home pitches. I broke clubs into two groups: teams that earned at least 40 points and teams that earned fewer than 40 points. Then I compiled some key home stats for the two groups.

A quick key for the stats used:

%home won is the percentage of its home games that the team won. Duh, right?

%home points is the percentage of its home games in which the team nabbed any points (draw or win).

%total of points is the share of the team's total points for the season that came from home games.

Then GD is just goal differential.
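As a quick check on the definitions, here is a small sketch that computes the home stats from a team's home record. It uses Houston's 2008 row from the tables below (10 wins, 4 draws, 1 loss, 30 goals for, 14 against, 51 total points) as the input. The function name and dictionary keys are my own labels.

```python
def home_stats(wins, draws, losses, goals_for, goals_against, total_points):
    """Home stats as defined in the key, computed from a team's home record."""
    games = wins + draws + losses
    home_points = 3 * wins + draws
    return {
        "home_points": home_points,
        "pct_home_won": wins / games,
        "pct_home_points": (wins + draws) / games,
        "pct_total_points": home_points / total_points,
        "home_gd": goals_for - goals_against,
    }


# Houston, 2008: home record and season total points from the table.
houston_2008 = home_stats(10, 4, 1, 30, 14, 51)

print(f"home points: {houston_2008['home_points']}")
print(f"%home won: {houston_2008['pct_home_won']:.2%}")
print(f"%home points: {houston_2008['pct_home_points']:.2%}")
print(f"%total of points: {houston_2008['pct_total_points']:.2%}")
print(f"home GD: {houston_2008['home_gd']}")
```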

Team Above 40 Points

2008
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
1 Houston 15 10 4 1 30 14 34 51 66.67% 93.33% 66.67% 16 13
2 Chivas USA 15 7 4 4 21 15 25 43 46.67% 73.33% 58.14% 6 -1
3 Real Salt Lake 15 8 6 1 24 10 30 40 53.33% 93.33% 75.00% 14 1
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
1 Columbus 15 11 2 2 30 15 35 57 73.33% 86.67% 61.40% 15 14
2 Chicago 15 7 3 5 23 17 24 46 46.67% 66.67% 52.17% 6 11
3 New England 15 6 4 5 24 20 22 43 40.00% 66.67% 51.16% 4 -3
4 Kansas City 15 9 4 2 22 15 31 42 60.00% 86.67% 73.81% 7 -2

Team Below 40 Points

2008
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
5 New York Red Bulls 15 9 3 3 30 20 30 39 60.00% 80.00% 76.92% 10 -6
6 DC United 15 9 2 4 32 19 29 37 60.00% 73.33% 78.38% 13 -8
7 Toronto FC 15 6 7 2 17 12 25 35 40.00% 86.67% 71.43% 5 -9
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
4 Colorado 15 7 3 5 22 14 24 38 46.67% 66.67% 63.16% 8 -1
5 FC Dallas 15 5 6 4 23 19 21 36 33.33% 73.33% 58.33% 4 4
6 San Jose 15 6 4 5 22 19 22 33 40.00% 66.67% 66.67% 3 -6
7 Los Angeles 15 6 5 4 35 27 23 33 40.00% 73.33% 69.70% 8 -7

Team Above 40 Points

2009
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
1 Columbus 15 9 4 2 25 15 31 49 60.00% 86.67% 63.27% 10 10
2 Chicago 15 5 6 4 16 17 21 45 33.33% 73.33% 46.67% -1 5
3 New England 15 7 4 4 22 16 25 42 46.67% 73.33% 59.52% 6 -4
4 DC United 15 7 5 3 19 14 26 40 46.67% 80.00% 65.00% 5 -1
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
1 Houston 15 8 6 1 23 13 30 48 53.33% 93.33% 62.50% 10 10
2 Los Angeles 15 7 4 4 18 17 25 48 46.67% 73.33% 52.08% 1 5
3 Seattle Sounders FC 15 7 6 2 21 10 27 47 46.67% 86.67% 57.45% 11 9
4 Chivas USA 15 9 3 3 25 14 30 45 60.00% 80.00% 66.67% 11 3
5 Real Salt Lake 15 9 5 1 34 11 32 40 60.00% 93.33% 80.00% 23 8
6 Colorado 15 8 5 2 25 10 29 40 53.33% 86.67% 72.50% 15 4

Team Below 40 Points

2009
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
5 Toronto FC 15 8 4 3 20 14 28 39 53.33% 80.00% 71.79% 6 -9
6 Kansas City 15 4 5 6 18 20 17 33 26.67% 60.00% 51.52% -2 -9
7 New York Red Bulls 15 5 4 6 24 20 19 21 33.33% 60.00% 90.48% 4 -20
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
7 FC Dallas 15 8 4 3 28 19 28 39 53.33% 80.00% 71.79% 9 3
8 San Jose 15 6 4 5 22 21 22 30 40.00% 66.67% 73.33% 1 -14

Team Above 40 Points

2010
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
1 New York Red Bulls 15 10 2 3 18 9 32 51 66.67% 80.00% 62.75% 9 9
2 Columbus 15 10 2 3 22 12 32 50 66.67% 80.00% 64.00% 10 6
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
1 Los Angeles 15 9 2 4 27 19 29 59 60.00% 73.33% 49.15% 8 18
2 Real Salt Lake 15 11 4 0 31 7 37 56 73.33% 100.00% 66.07% 24 25
3 FC Dallas 15 8 6 1 25 13 30 50 53.33% 93.33% 60.00% 12 14
4 Seattle Sounders FC 15 8 3 4 21 16 27 48 53.33% 73.33% 56.25% 5 4
5 Colorado 15 8 5 2 26 11 29 46 53.33% 86.67% 63.04% 15 12
6 San Jose 15 7 3 5 17 14 24 46 46.67% 66.67% 52.17% 3 1

Team Below 40 Points

2010
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
3 Kansas City 15 7 3 5 25 19 24 39 46.67% 66.67% 61.54% 6 1
4 Chicago 15 4 7 4 14 13 19 36 26.67% 73.33% 52.78% 1 -1
5 Toronto FC 15 6 6 3 19 15 24 35 40.00% 80.00% 68.57% 4 -8
6 New England 15 7 3 5 21 18 24 32 46.67% 66.67% 75.00% 3 -18
7 Philadelphia Union 15 6 6 3 22 16 24 31 40.00% 80.00% 77.42% 6 -14
8 DC United 15 3 1 11 12 25 10 22 20.00% 26.67% 45.45% -13 -26
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
7 Houston 15 6 3 6 25 21 21 33 40.00% 60.00% 63.64% 4 -9
8 Chivas USA 15 6 2 7 19 19 20 28 40.00% 53.33% 71.43% 0 -14

Team Above 40 Points

2011
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
1 Sporting Kansas City 17 9 6 2 29 51 52.94% 88.24% 56.86% 10
2 Houston Dynamo 17 10 4 3 33 49 58.82% 82.35% 67.35% 4
3 Philadelphia Union 17 7 9 1 22 48 41.18% 94.12% 45.83% 8
4 Columbus Crew 17 9 5 3 30 47 52.94% 82.35% 63.83% -1
5 New York Red Bulls 17 8 6 3 27 46 47.06% 82.35% 58.70% 6
6 Chicago Fire 17 6 8 3 21 43 35.29% 82.35% 48.84% 1
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
1 LA Galaxy 17 12 5 0 36 67 70.59% 100.00% 53.73% 20
2 Seattle Sounders FC 17 9 4 4 31 63 52.94% 76.47% 49.21% 19
3 Real Salt Lake 17 10 4 3 33 53 58.82% 82.35% 62.26% 8
4 FC Dallas 17 9 3 5 32 52 52.94% 70.59% 61.54% 3
5 Colorado Rapids 17 6 9 2 20 49 35.29% 88.24% 40.82% 3
6 Portland Timbers 17 9 3 5 32 42 52.94% 70.59% 76.19% -8

Team Below 40 Points

2011
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
7 DC United 17 4 8 5 20 39 23.53% 70.59% 51.28%
8 Toronto FC 17 5 8 4 23 33 29.41% 76.47% 69.70%
9 New England 17 4 6 7 18 28 23.53% 58.82% 64.29%
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
7 San Jose 17 5 8 4 23 38 29.41% 76.47% 60.53%
8 Chivas USA 17 5 5 7 20 36 29.41% 58.82% 55.56%
9 Vancouver Whitecaps 17 6 5 6 23 28 35.29% 64.71% 82.14%

Team Above 40 Points

2012
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
1 Kansas City 17 10 4 3 22 12 33 63 58.82% 82.35% 52.38% 10 15
2 DC United 17 12 4 1 37 17 37 58 70.59% 94.12% 63.79% 20 10
3 New York Red Bulls 17 11 4 2 34 18 35 57 64.71% 88.24% 61.40% 16 11
4 Chicago 17 11 3 3 27 18 36 57 64.71% 82.35% 63.16% 9 5
5 Houston 17 11 6 0 31 12 33 53 64.71% 100.00% 62.26% 19 7
6 Columbus 17 11 3 3 28 21 36 52 64.71% 82.35% 69.23% 7 0
7 Montreal Impact 17 10 3 4 31 19 34 42 58.82% 76.47% 80.95% 12 -6
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
1 San Jose 17 10 6 1 43 22 31 66 58.82% 94.12% 46.97% 21 29
2 Real Salt Lake 17 11 2 4 27 15 37 57 64.71% 76.47% 64.91% 12 11
3 Seattle Sounders FC 17 11 2 4 27 11 37 56 64.71% 76.47% 66.07% 16 18
4 Los Angeles 17 10 1 6 31 20 36 54 58.82% 64.71% 66.67% 11 12
5 Vancouver Whitecaps 17 8 6 3 25 17 27 43 47.06% 82.35% 62.79% 8 -6

Team Below 40 Points

2012
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
8 Philadelphia Union 17 7 2 8 22 20 23 36 41.18% 52.94% 63.89% 2 -8
9 New England 17 7 6 4 23 15 27 35 41.18% 76.47% 77.14% 8 -5
10 Toronto FC 17 3 5 9 15 25 14 23 17.65% 47.06% 60.87% -10 -26
POS TEAM GP W D L F A Home Points Total Points %home won %home points %total of points Home- GD GD
6 FC Dallas 17 6 8 3 21 16 26 39 35.29% 82.35% 66.67% 5 -5
7 Colorado 17 8 3 6 29 19 27 37 47.06% 64.71% 72.97% 10 -6
8 Portland Timbers 17 7 6 4 24 21 27 34 41.18% 76.47% 79.41% 3 -22
9 Chivas USA 17 3 3 11 9 30 12 30 17.65% 35.29% 40.00% -21 -34

2008-2012 average, Teams above 40 points

avg HP avg TP avg %HW avg %HP avg %of TP
25 41 46.10% 68.59% 50.56%

2008-2012 average, Teams below 40 points

avg HP avg TP avg %HW avg %HP avg %of TP
24 35 37.67% 67.26% 71.09%

There are a few things that I think you could really take from this. A) There is so much parity in this league that it doesn't matter if you are a good team or a bad one; a game at home should give you a lot of confidence. B) Good teams, playoff teams, teams that want a chance at the Supporters' Shield ... they win road games. In the five-year averages, the teams above 40 points and the teams below 40 points are separated mainly by their percentage of total points won at home. Teams above 40 points took only about 50 percent of their total points from home games, against roughly 71 percent for teams below 40; the good teams won on the road, too. Teams that struggled, meanwhile, depended on those home games to make them suck less.

Good teams win at home, but so do bad teams... unless you are DC United. Sorry, Drew.
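For anyone who wants to check the percentage columns, here is a minimal sketch of how they appear to be calculated. The column meanings are inferred from the numbers rather than stated anywhere: "%home won" looks like home wins over home games, "%home points" looks like the share of home games that produced at least a point, and "%total of points" looks like home points over total points. The function name `home_share` is made up for this example.

```python
def home_share(home_wins, home_draws, home_games, home_points, total_points):
    """Return (%home won, %home points, %total of points) for one table row.

    Assumed definitions, inferred from the tables rather than confirmed:
      %home won        = home wins / home games
      %home points     = home games with at least a point / home games
      %total of points = home points / total points
    """
    pct_home_won = 100 * home_wins / home_games
    pct_home_with_points = 100 * (home_wins + home_draws) / home_games
    pct_total_from_home = 100 * home_points / total_points
    return (
        round(pct_home_won, 2),
        round(pct_home_with_points, 2),
        round(pct_total_from_home, 2),
    )


# Chicago's 2012 row: 17 home games, 11 wins, 3 draws, 36 home points, 57 total points
print(home_share(11, 3, 17, 36, 57))  # (64.71, 82.35, 63.16)
```

The last number is the one that separates the two groups: the lower it is, the more of a team's points came on the road.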

Field Dimension, Turf and Home Field Advantage in MLS

During last weekend’s podcast, we discussed home field advantage and where it might come from. There is much literature to suggest that home field advantage comes largely from rowdy home crowds—crowds that both encourage the home team to be more aggressive and encourage the referees to be more biased—but you probably already presumed that. We went on to talk about “home specialists,” or teams that play especially well at home in a given season. An article on the site The Power of Goals theoretically explains why home specialists from any single season tend to be products of statistical noise rather than signal. That’s not to say there aren’t home specialists out there, only that it is nearly impossible to identify them statistically in a single season.

Picking out the teams that have performed markedly better at home, and then retroactively seeking explanations to match the traits of those teams, is known as cherry picking, and it’s likely to lead to false conclusions. (On the podcast, I recounted an example from the book Naked Statistics by Charles Wheelan as to why this can lead to trouble.) Instead, identifying traits of teams and stadia first, and then checking for measurable differences in home performance based on those traits, is a sounder approach.

We have mentioned around here before that Houston’s narrow home pitch might have helped the Dynamo to one of the best home records since BBVA Compass Stadium was built in preparation for the 2012 season. Indeed, Houston’s field is the narrowest in the league at 70 yards, and the Dynamo’s goal differential is a whopping 1.33 goals better at home than on the road. However, the only reason we considered field dimensions was that Houston has performed so well at home.

We went like this:

Extreme split for Houston --> Field Dimensions must matter

But we should have thought like this:

Field Dimensions --> Extreme splits?

To advance the discussion, I gathered data going back to the 2010 season in order to look for explanatory patterns in two observable variables of stadia: field dimensions and surface. If teams are able to train on especially large or especially small fields, or on turf, such differences in the pitches may give home teams a leg up in matches played on those familiar pitches.
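As a rough sketch of the traits-first approach, the snippet below groups teams by a stadium trait chosen in advance and then compares the average home/road split in each group. It is not the analysis actually run for this post. The function names are invented for illustration, and the three teams and their numbers are hypothetical placeholders, not real MLS data.

```python
from statistics import mean


def home_road_split(home_gd, home_games, road_gd, road_games):
    """Per-game goal differential at home minus per-game goal differential on the road."""
    return home_gd / home_games - road_gd / road_games


def mean_split_by_trait(teams, trait):
    """Group teams by a trait picked before looking at results, then average the split."""
    groups = {}
    for team in teams:
        groups.setdefault(team[trait], []).append(team["split"])
    return {value: mean(splits) for value, splits in groups.items()}


# Hypothetical placeholder teams and numbers, for illustration only.
teams = [
    {"name": "Team A", "surface": "turf", "split": home_road_split(17, 17, 0, 17)},
    {"name": "Team B", "surface": "turf", "split": home_road_split(0, 17, -34, 17)},
    {"name": "Team C", "surface": "grass", "split": home_road_split(17, 17, -17, 17)},
]

print(mean_split_by_trait(teams, "surface"))
```

The point of the ordering is that the trait (here, surface) is fixed first, and only then do we look at whether the groups differ. With so few teams in some groups, a gap between the averages would still need to be weighed against ordinary season-to-season noise.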

It turns out there is not enough evidence that either turf surfaces or field dimensions have much to do with home success.

Surface vs. Home success

There are currently four teams that play on turf:* Portland, Seattle, Vancouver and New England. While the Timbers and Whitecaps have dominated at home, the Sounders and Revs have been subpar relative to the league in that department. Considering I didn’t account for the confounding variable that two of these teams play in front of some of the rowdiest fans in MLS, the “turf effect” may not even exist at all. It’s hard to say with only four teams playing on turf, three of which are not even in their adolescence as franchises.

Width vs. Home success

Field dimensions showed minimal effects, as well. Though Houston’s small, 70-by-115-yard pitch has correlated with its home success, that correlation does not hold for other small stadia. The next-smallest pitch can be found in Washington D.C.,** but DCU has actually performed a little worse at home relative to the typical league splits. Montreal has the widest pitch at 77 yards, and yet the Impact have also performed well at home. There is a chance that teams with extreme widths, whether extremely narrow or extremely wide, have some sort of advantage, but we’re going to have to wait for additional data from Houston and Montreal to be more definitive about that.

The vast majority of MLS pitches, 16-of-19 in fact, are either 74 or 75 yards wide. So even the two extremes in Houston and Montreal are not all that different. Houston could be a team built to play on a narrow pitch, but I’m skeptical that A) soccer analytics has come far enough for a general manager to sort that out and B) 4-5 yards would make such a big difference.

Though I can’t say for sure that the pitch effects are non-existent, I can say pretty confidently that they aren’t pronounced or noticeable in a single season. Right now, I would argue it’s more likely that Montreal and Houston have performed so well at home because of random variation in only two seasons of data. We will have to wait another few seasons to check on that one.

*Vancouver plays on Astroturf while the other three play on Field Turf.

**DCU’s field at RFK is a little wider at 72 yards, and actually a little shorter at 110 yards.