Using k-means to learn what soccer passing tells us about playing styles
By Cheuk Hei Ho (@tacticsplatform) and Eliot McKinley (@etmckinley), colloquially known as CheuKinley
When you talk about a soccer team, you almost always talk about its style: high-pressing, possession-heavy, parking-the-bus, etc. A team’s style not only signifies how it plays on the field but also reflects its coaching. Because there are no guidelines for how a team's style should be defined, everyone uses their own rules, and we can't directly compare one another's descriptions.
An accurate quantitative description of style is needed. It can help one properly analyze not only an opponent but also one's own team. With an accurate method to describe style, one can scientifically evaluate whether a training exercise serves its purpose. We previously used a dimension reduction technique, t-SNE, to find MLS teams with similar styles based on the spatial distribution of activities and pass networks. This time we use a different method, k-means clustering of pass types, to quantitatively measure the style, tactical specialization, and the influence of coaching on a team’s system.
K-means clustering of passes
We used k-means clustering of pass types to quantify the styles of the teams in MLS. K-means clustering is a machine learning algorithm that separates data points into a user-selected number (k) of clusters based upon their similarities. If you think that two clusters define the groups you want, you choose k=2. If you think it is 10, you choose k=10. In our case, after using the elbow method and visual inspection, we chose to classify passes into 64 different groups based upon how and where passes were made. We want to note that k-means clustering has been used many times before to describe passing behavior in soccer (and we have used it, in part, to classify player positions). We extended previous work by using z-scores to standardize the quantification of each pass group. Then, by filtering pass clusters based on z-scores, we can find characteristic pass patterns for every team.
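To make the clustering step concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not our production code: the four features per pass (start and end coordinates on a 100x100 pitch), the toy passes, and the deterministic farthest-first seeding are all assumptions chosen so the example is small and repeatable. The inertia (within-cluster sum of squares) it returns is the quantity one would plot against k for the elbow method.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption for this sketch): start at
    points[0], then keep adding the point farthest from chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (centroids, labels, inertia)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each pass joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Hypothetical passes as (start_x, start_y, end_x, end_y).
passes = [
    (20, 30, 25, 60), (22, 35, 24, 65), (18, 28, 21, 58),  # sideways, own half
    (70, 50, 90, 50), (72, 48, 92, 52), (68, 52, 88, 47),  # forward, final third
]

centroids, labels, inertia = kmeans(passes, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]

# Elbow method: look for the k after which inertia stops dropping sharply.
inertia_by_k = {k: kmeans(passes, k)[2] for k in range(1, 5)}
print(inertia_by_k)
```

On real data the same idea applies with many more passes and a much larger k; the choice of features and of k remains a judgment call that the elbow plot only informs.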
This visualization combines the features of both a pass network and a touch heatmap. It shows which areas a team utilizes the most and what types of passes it uses to access them. For example, last season, Atlanta used long horizontal passes to stretch the opponent, while Kansas City camped outside the opponent’s box with its possession dominance. By plotting distinctive pass types this way, we can also see how a team evolves under a coach. For instance, Tata Martino had clearly instructed Atlanta on how to play out from the back, but it was a work in progress in the first year. They got the build-up part right but had trouble transitioning into the attack. With another full season of practice, they exploded into one of the best offensive teams in MLS history in their second season.
By varying the z-score used to filter the data, you can also look at the under-represented pass types and choose the degree of representation. In 2018, Columbus did not often use long passes out of the back, LAFC was less likely to cross from the flanks, and Portland didn’t pass from central locations back towards their own goal.
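The z-score filtering described above can be sketched as follows. This is an illustration under stated assumptions, not our exact pipeline: we assume each team's usage of a cluster is expressed as a share of its passes, that z-scores are computed per cluster across teams with the population standard deviation, and the team names, numbers, and the threshold of 1.2 are all made up.

```python
from statistics import mean, pstdev


def cluster_zscores(shares):
    """shares: {team: [share of passes per cluster]}.
    Returns {team: [z-score per cluster]}, standardized across teams."""
    teams = list(shares)
    n_clusters = len(shares[teams[0]])
    z = {t: [] for t in teams}
    for c in range(n_clusters):
        column = [shares[t][c] for t in teams]
        mu, sigma = mean(column), pstdev(column)
        for t in teams:
            # If every team uses the cluster equally, nobody is distinctive.
            z[t].append((shares[t][c] - mu) / sigma if sigma > 0 else 0.0)
    return z


def distinctive_clusters(z_row, threshold=1.0):
    """Split cluster indices into over- and under-represented lists."""
    over = [c for c, value in enumerate(z_row) if value >= threshold]
    under = [c for c, value in enumerate(z_row) if value <= -threshold]
    return over, under


# Hypothetical shares for four teams and three pass clusters.
shares = {
    "Team A": [0.50, 0.30, 0.20],
    "Team B": [0.30, 0.30, 0.40],
    "Team C": [0.30, 0.40, 0.30],
    "Team D": [0.30, 0.40, 0.30],
}

z = cluster_zscores(shares)
over, under = distinctive_clusters(z["Team A"], threshold=1.2)
print(over, under)  # [0] [2]
```

Raising the threshold keeps only the most extreme clusters; lowering it shows a broader, noisier picture of a team's tendencies.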
Tactical specialization
Using z-scores not only gives us a standardized score to evaluate the degree of representation of each pass cluster but also a quantitative measure of a team’s tactical specialization. Each z-score measures how different a team is from everyone else in its use of one type of pass. If we take the median of the absolute values of the z-scores per team (since both over- and under-representation equate to specialization; thanks for the idea, Dummy Run), we approximate how different a team is from everyone else.
Specialization does not necessarily mean a team is good or bad. There is only a weak, but significant, correlation between specialization and expected goal difference (R = 0.24, p = 0.007). In fact, two of the most specialized teams (>99th percentile) in the last seven years are New York City FC in 2016 and Colorado in 2018. Their most over-represented pass types are those that couldn’t get across the halfway line. They basically specialized in not passing forward. A non-ideal method of winning games, to say the least. The full table of specialization scores is at the bottom of this post.
The specialization scoring confirms some eye tests while refuting others. For example, the New York Red Bulls are believed to be the most distinctive franchise in MLS. The top five most distinctive teams from the last seven years include three Red Bulls teams, all under the supervision of Jesse Marsch (and Chris Armas last year). In contrast, many pundits believe that Columbus Crew under Gregg Berhalter played with a very unique style. However, their specialization scores suggest that they have been less specialized than most teams in the last four seasons. These are good examples of how an objective measure of style can help judge whether our subjective opinions hold up.
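As a small worked example of the median-of-absolute-z-scores idea, consider the sketch below. The two teams and their z-scores are hypothetical, and only four clusters are shown instead of 64.

```python
from statistics import median


def specialization(z_row):
    """Median of the absolute z-scores across a team's pass clusters."""
    return median(abs(value) for value in z_row)


# Hypothetical z-scores for four pass clusters.
z_by_team = {
    "Team A": [2.0, -1.5, 0.5, -0.1],  # strong over- and under-representation
    "Team B": [0.2, -0.3, 0.1, 0.4],   # close to league average everywhere
}

scores = {team: specialization(row) for team, row in z_by_team.items()}
print(scores)
```

Team A comes out as more specialized than Team B even though half of its large deviations are negative, which is the reason for taking absolute values before the median.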
Coaching influences tactical systems
The specialization score only tells us whether a team is different from everyone else, but it doesn’t tell us whether two teams are similar or not. Two teams can have very similar specialization scores but they can be specialized in different ways. Quantifying the way two teams play can tell us how a coaching change or player turnover affects a team's play style.
To quantify the similarity of the play styles of the same team in two consecutive seasons, we calculate the Euclidean distance between the team's z-scores for each cluster in the two seasons. We then compute another z-score to standardize the resulting distance and calculate a percentile to show how the change between two years compares to every other transition in the last seven seasons (note: above 50 is a greater-than-average difference, below 50 a less-than-average difference).
A coaching change seems to be the strongest driver in the evolution of play style. Even though the New York Red Bulls are the most distinctive franchise in MLS, their style has been consistent under Marsch since 2016. Large differences were seen in Columbus when Gregg Berhalter took over from Robert Warzycha (2013-2014), at NYCFC in the transition from Jason Kreis to Patrick Vieira (2015-2016), and in New England in Brad Friedel's first season (2017-2018) after years of below-average change under Jay Heaps. However, coaching changes don’t always bring change: Portland, San Jose, and LA Galaxy showed less-than-average change when moving to new coaches. Interestingly, since 2015, SKC has shown increased year-over-year differences under Peter Vermes, while Ben Olsen and Pablo Mastroeni showed wild swings from year to year during their respective tenures at DC United and Colorado.
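The season-to-season comparison can be sketched like this. The team names and z-score profiles are hypothetical, and converting the standardized distance to a percentile through the normal cumulative distribution is an assumption made here because it places an average change at exactly 50; an empirical rank would be another reasonable choice.

```python
from math import dist
from statistics import NormalDist, mean, pstdev


def transition_percentiles(distances):
    """distances: {transition: Euclidean distance between seasons}.
    Standardizes the distances, then maps each z-score to a 0-100 percentile."""
    values = list(distances.values())
    mu, sigma = mean(values), pstdev(values)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return {
        name: 100 * standard_normal.cdf((d - mu) / sigma if sigma > 0 else 0.0)
        for name, d in distances.items()
    }


# Hypothetical per-cluster z-scores for three teams in consecutive seasons.
profiles = {
    "Team A": ((1.0, -0.5, 0.0), (1.0, -0.5, 0.0)),  # identical style
    "Team B": ((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)),    # large change
    "Team C": ((0.0, 0.0, 0.0), (1.5, 2.0, 0.0)),    # moderate change
}

distances = {team: dist(prev, curr) for team, (prev, curr) in profiles.items()}
percentiles = transition_percentiles(distances)
print(distances)
print(percentiles)
```

In this toy set Team C's change equals the average distance, so it lands at the 50th percentile, with Team A below it and Team B above it.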
Conclusion
Our next steps will be to link our quantitative measurement of style to some form of performance index. For example, some teams may predominantly use a pass type, but at a low success rate. In that case, a coach may want to decide how important that cluster is for the team’s function. He or she may want to introduce a new training regimen to improve the performance of that pass type, use different players in those positions, or even alter the pass routes to bypass it. We can look at the outcome of the style by linking pass clustering with the pass chain concept and rating the chains with Expected Goal Chain. This way, we can find the groups of passes that produce the most damage for any team. Imagine three linked forward pass clusters in which the middle cluster is under-represented and sandwiched between two over-represented ones. Immediately you will know that the under-represented cluster is the weakest link; your team may use other actions such as dribbles or carries to move the ball through that area. The coach may want to instruct his/her players to pass more than they currently do. The opponent’s coach may want to hit that area or player.
Applications like these are just the tip of the iceberg in how this type of analysis can help coaching. Things like this can provide “actionable insights”, the holy grail of soccer analytics.
Below: Over- and under-represented pass clusters for every team in each MLS season since 2013.
Season | Team | Specialization Score | Rank
---|---|---|---
2013 | Chicago | 1.47 | 10
2013 | Colorado | 0.08 | 48
2013 | Columbus | 1.41 | 12
2013 | DC United | -0.99 | 107
2013 | FC Dallas | -0.35 | 71
2013 | Houston | -0.99 | 108
2013 | Kansas City | -0.77 | 98
2013 | L.A. Galaxy | 0.43 | 34
2013 | Montreal | 1.54 | 9
2013 | New England | 0.57 | 29
2013 | New York | 0.27 | 40
2013 | Philadelphia | -0.06 | 53
2013 | Portland | -0.35 | 70
2013 | Salt Lake | -0.74 | 96
2013 | San Jose | 0.53 | 31
2013 | Seattle | -0.22 | 65
2013 | Toronto | -0.30 | 68
2013 | Vancouver | 0.22 | 44
2013 | Chivas | 1.04 | 17
2014 | Chicago | -0.70 | 92
2014 | Colorado | -0.77 | 99
2014 | Columbus | -0.11 | 59
2014 | DC United | -0.46 | 80
2014 | FC Dallas | -0.71 | 93
2014 | Houston | -0.46 | 79
2014 | Kansas City | -1.18 | 112
2014 | L.A. Galaxy | 1.78 | 7
2014 | Montreal | -0.40 | 76
2014 | New England | 1.67 | 8
2014 | New York | 0.26 | 41
2014 | Philadelphia | 1.13 | 16
2014 | Portland | -0.57 | 84
2014 | Salt Lake | -1.41 | 121
2014 | San Jose | 0.05 | 49
2014 | Seattle | 0.39 | 36
2014 | Toronto | -0.68 | 91
2014 | Vancouver | -0.93 | 104
2014 | Chivas | -0.72 | 94
2015 | Chicago | -1.28 | 116
2015 | Colorado | -1.14 | 109
2015 | Columbus | 2.17 | 6
2015 | DC United | 0.12 | 47
2015 | FC Dallas | -0.53 | 83
2015 | Houston | -0.10 | 57
2015 | Kansas City | -1.43 | 123
2015 | L.A. Galaxy | -1.20 | 113
2015 | Montreal | -0.61 | 87
2015 | New England | 1.19 | 15
2015 | New York | 0.39 | 35
2015 | New York City FC | -0.39 | 75
2015 | Orlando City | -0.33 | 69
2015 | Philadelphia | -0.07 | 54
2015 | Portland | 0.63 | 27
2015 | Salt Lake | -0.46 | 81
2015 | San Jose | -0.50 | 82
2015 | Seattle | 0.74 | 23
2015 | Toronto | 0.29 | 37
2015 | Vancouver | -0.08 | 55
2016 | Chicago | 0.65 | 26
2016 | Colorado | -1.17 | 111
2016 | Columbus | -0.20 | 64
2016 | DC United | 0.29 | 38
2016 | FC Dallas | -0.58 | 85
2016 | Houston | 0.22 | 45
2016 | Kansas City | -0.72 | 95
2016 | L.A. Galaxy | -1.35 | 120
2016 | Montreal | -1.15 | 110
2016 | New England | 1.36 | 13
2016 | New York | 2.48 | 4
2016 | New York City FC | 3.14 | 2
2016 | Orlando City | -0.61 | 86
2016 | Philadelphia | -0.16 | 60
2016 | Portland | -0.42 | 77
2016 | Salt Lake | 0.17 | 46
2016 | San Jose | -0.80 | 100
2016 | Seattle | -0.81 | 102
2016 | Toronto | -1.24 | 114
2016 | Vancouver | -0.09 | 56
2017 | Chicago | 0.46 | 33
2017 | Colorado | 0.77 | 22
2017 | Columbus | -0.61 | 88
2017 | DC United | -0.95 | 105
2017 | FC Dallas | -0.38 | 72
2017 | Houston | 0.69 | 24
2017 | Kansas City | -0.10 | 58
2017 | L.A. Galaxy | -1.33 | 119
2017 | Montreal | 0.57 | 30
2017 | New England | -0.85 | 103
2017 | New York | 2.31 | 5
2017 | New York City FC | 0.65 | 25
2017 | Orlando City | -0.38 | 73
2017 | Philadelphia | 0.29 | 39
2017 | Portland | -1.30 | 118
2017 | Salt Lake | -0.81 | 101
2017 | San Jose | -0.96 | 106
2017 | Seattle | -0.75 | 97
2017 | Toronto | -0.38 | 74
2017 | Vancouver | -0.06 | 52
2017 | Atlanta United | 0.24 | 43
2017 | Minnesota United | -0.17 | 62
2018 | Chicago | 0.82 | 21
2018 | Colorado | 2.91 | 3
2018 | Columbus | -0.19 | 63
2018 | DC United | -0.05 | 51
2018 | FC Dallas | -1.43 | 122
2018 | Houston | -0.65 | 90
2018 | Kansas City | 0.82 | 20
2018 | L.A. Galaxy | -1.28 | 117
2018 | Montreal | 0.52 | 32
2018 | New England | -0.25 | 66
2018 | New York | 3.79 | 1
2018 | New York City FC | 1.43 | 11
2018 | Orlando City | -0.04 | 50
2018 | Philadelphia | 0.96 | 18
2018 | Portland | -1.27 | 115
2018 | Salt Lake | 0.58 | 28
2018 | San Jose | -0.30 | 67
2018 | Seattle | 1.35 | 14
2018 | Toronto | 0.25 | 42
2018 | Vancouver | -0.44 | 78
2018 | Atlanta United | 0.86 | 19
2018 | Minnesota United | -0.62 | 89