As a supplement to the stabilization analysis I did last week, I wanted to look at the self-predictive power of finishing rates, which are basically soccer’s shooting percentage. Team finishing rates can be found both on our MLS Tables and in our Shot Locations analysis, so it would be nice to know if we can trust them.
Last week I split the 2012 and 2013 seasons in half and assessed the simple linear relationships for various statistics between the two halves of each season across all 19 teams. Now I have 2011 data, and we can have even more fun. I included bivariate data from both 2011 and 2012 together, leaving out 2013 since it is not over yet. It is important to note that I am not looking across seasons, only within seasons. To the results!
| Stat | Correlation | P-value |
| --- | --- | --- |
| Points | 0.438 | 0.7% |
| Total Attempts | 0.397 | 1.5% |
| Blocked Shots | 0.372 | 2.3% |
| Shots on Goal | 0.297 | 7.4% |
| Goals | 0.261 | 11.9% |
| Shots off Goal | 0.144 | 39.5% |
| Finishing | 0.109 | 52.1% |
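For anyone who wants to run this kind of split-half check on their own numbers, here is a minimal sketch of the calculation behind each row: a Pearson correlation between first-half and second-half values, plus the t-statistic used to test whether that correlation differs from zero. The function names and the sample numbers are made up for illustration; they are not the data behind the table.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def t_statistic(r, n):
    """t-statistic for testing r == 0 with n pairs (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Made-up first-half and second-half values for five hypothetical team-seasons.
first_half = [20, 25, 18, 30, 22]
second_half = [22, 21, 19, 27, 24]

r = pearson_r(first_half, second_half)
print(round(r, 3), round(t_statistic(r, len(first_half)), 3))
```

The p-value then comes from comparing the t-statistic to a t distribution with n - 2 degrees of freedom, which a statistics package will do for you.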
Surprisingly, to me at least, a team’s points earned has been the most stable statistic in MLS (by my linear definition of stability). Not so surprising to me was that total attempts is also one of the most stable. Look down at the very bottom, and you’ll find finishing rates. Check out the graph below:
Some teams finish really well early in the season, then flop. Others finish poorly, then turn it on. But there’s no obvious pattern that would allow us to predict second-half finishing rates. In fact, the best prediction for any given team is that it will regress to the league average, which is exactly what our Luck Table does. It regresses all teams’ finishing rates in each zone back to league averages, then calculates an expected goal differential.
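To make the regression idea concrete, here is a rough sketch of an expected goal differential that credits every attempt with the league-average finishing rate for its zone. The zone names, rates, and attempt counts are hypothetical, and the real Luck Table may differ in its details; this only illustrates the principle described above.

```python
def expected_goals(attempts_by_zone, league_rate_by_zone):
    """Goals a team would score if it finished at the league-average rate in every zone."""
    return sum(
        attempts * league_rate_by_zone[zone]
        for zone, attempts in attempts_by_zone.items()
    )

def expected_goal_differential(attempts_for, attempts_against, league_rate_by_zone):
    """Expected goals scored minus expected goals allowed, both at league-average finishing."""
    return (
        expected_goals(attempts_for, league_rate_by_zone)
        - expected_goals(attempts_against, league_rate_by_zone)
    )

# Hypothetical zones, league-average finishing rates, and attempt counts.
league_rates = {"zone_1": 0.30, "zone_2": 0.10, "zone_3": 0.05}
attempts_for = {"zone_1": 10, "zone_2": 40, "zone_3": 20}
attempts_against = {"zone_1": 8, "zone_2": 50, "zone_3": 10}

print(round(expected_goal_differential(attempts_for, attempts_against, league_rates), 2))
```

Because a team's own finishing rate never enters the calculation, a hot or cold shooting streak has no effect on the result. Only where and how often a team shoots, and allows shots, matters.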
On a side note, you might be asking yourself why I don't just use points to predict points. Here is why: while the correlation between first-half and second-half points is about 0.438, the correlation between first-half attempts ratios and second-half points is slightly stronger at 0.480. Also, in a multiple regression model where I let both first-half attempts ratio and first-half points duke it out, first-half attempts ratio edges out points for the predictor trophy.
| | Estimate | Std. Error | T-stat | P-value |
| --- | --- | --- | --- | --- |
| Intercept | 1.7019 | 5.97 | 0.285 | 77.7% |
| AttRatio | 13.7067 | 6.32 | 2.17 | 3.7% |
| Points | 0.3262 | 0.19 | 1.691 | 10.0% |
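As a quick illustration of how the fitted model would be used, the sketch below plugs the estimates from the table into a prediction of second-half points. It assumes the response variable is second-half points and that the attempts ratio is on the same scale used in the fit; the example team is invented.

```python
# Estimates copied from the regression table above.
INTERCEPT = 1.7019
COEF_ATT_RATIO = 13.7067
COEF_POINTS = 0.3262

def predict_second_half_points(att_ratio, first_half_points):
    """Linear prediction from first-half attempts ratio and first-half points."""
    return INTERCEPT + COEF_ATT_RATIO * att_ratio + COEF_POINTS * first_half_points

# Hypothetical team: attempts ratio of 1.1 and 25 points in the first half.
print(round(predict_second_half_points(1.1, 25), 1))
```

With standard errors this large, any single prediction comes with a wide margin of error, so treat the output as a rough guide rather than a forecast.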
And since this is a post about finishing rates...
| | Estimate | Std. Error | T-stat | P-value |
| --- | --- | --- | --- | --- |
| Intercept | -2.243 | 7.75 | -0.29 | 77.4% |
| AttRatio | 18.570 | 5.71 | 3.26 | 0.3% |
| Finishing% | 63.743 | 50.08 | 1.27 | 21.2% |
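If the T-stat column looks mysterious, it is just each estimate divided by its standard error. The short sketch below recomputes it from the rounded values in the table, so small differences from the published figures are expected.

```python
# (estimate, standard error) pairs copied from the table above.
rows = {
    "Intercept": (-2.243, 7.75),
    "AttRatio": (18.570, 5.71),
    "Finishing%": (63.743, 50.08),
}

def t_stat(estimate, std_error):
    """t-statistic for a regression coefficient: estimate over its standard error."""
    return estimate / std_error

for name, (estimate, std_error) in rows.items():
    print(name, round(t_stat(estimate, std_error), 2))
```

The finishing estimate is only a little larger than its own standard error, which is why its p-value is so high.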
A good prediction model (on which we are working) will include more than just a team's attempts ratio, but for now, it is king of the team statistics.