Introducing DAVIES: A Framework for Identifying Talent Across the Globe

by Mike Imburgio and Sam Goldberg

Introduction

In the world of sports, the search for an all-encompassing player evaluation metric is never-ending. Baseball was the first to develop its metric with Wins Above Replacement. Basketball followed suit with Player Efficiency Rating, and Hockey WAR has come into the fold within the past year. The US Soccer market has been searching for the first iteration of its widely applicable player value metric until very recently, when “Goals Added” was created by our brilliant colleagues at ASA.  

Goals Added measures a “player’s total on-ball contribution in attack and defense” by “calculating how much each touch changes their team’s chances of scoring and conceding across two possessions.” Through these calculations, Goals Added returns two numbers: how many total goals a player adds to their team over the course of the season, and how many goals they add above the average player at their position over the same span. Due to the nature of publicly available data, Goals Added is currently only available for MLS players. This is where DAVIES, the Determining Added Value of Individual Effectiveness including Style model, steps in. DAVIES utilizes publicly available data, accessible to anyone around the world, to estimate a player’s total Goals Added for any given season while accounting for their age and style of play.

This process ensures that DAVIES can be applied regardless of a team’s data contracts or funding. Furthermore, if a team wants to utilize this method but is already contracted with a data provider (Opta, StatsBomb, InStat, etc.), the metrics used to build DAVIES are similar enough across platforms to maintain accuracy. (Yes, we tested this too.) 

DAVIES accounts for both a player’s style of play and age group when predicting their Goals Added. This is vitally important, as teams no longer have to compare two players to one another strictly because they are listed in the same position group. This adjustment helps teams identify younger players for the future, in addition to measuring players accurately for impact in the present. DAVIES is also capable of estimating value independent of age and/or play style, allowing teams the flexibility to use the rating to best fit their needs.

A Deep Dive into DAVIES

DAVIES, named after MLS phenom and current Bayern Munich outside back Alphonso Davies, was inspired by its namesake’s rise to fame. We wanted to give teams around the globe the same chance as Bayern Munich to identify and sign the next iteration of Alphonso Davies, a player who could be anywhere in the world. So how did we do that?

The first step was to build a dataset of players with metrics that are made available by any data provider: xG, xA, Touches, Tackles Won, Passes Attempted, Successful Dribbles, and Clearances are a few examples of the types of metrics we included. The next step was to build a few different complex models and combine them into a single model. Known as ensemble modeling, this gave the modeling team a better idea of which metrics were actually important to use.
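One common way to read metric importance off an ensemble, though not necessarily the approach taken for DAVIES, is to put each model's importance scores on a common scale and average them. The sketch below assumes every model exposes a per-metric importance score; the three toy models and all of their values are hypothetical.

```python
from statistics import mean

# Hypothetical importance scores from three different models (toy values,
# not taken from the actual DAVIES models). Each model uses its own scale.
model_importances = [
    {"xG": 6.0, "xA": 3.0, "Clearances": 1.0},
    {"xG": 0.5, "xA": 0.3, "Clearances": 0.2},
    {"xG": 40.0, "xA": 40.0, "Clearances": 20.0},
]

def normalize(scores):
    """Rescale one model's scores so they sum to 1."""
    total = sum(scores.values())
    return {metric: value / total for metric, value in scores.items()}

def ensemble_importance(models):
    """Average the normalized importance of each metric across models."""
    normalized = [normalize(m) for m in models]
    return {metric: mean(n[metric] for n in normalized) for metric in normalized[0]}

combined = ensemble_importance(model_importances)
ranking = sorted(combined, key=combined.get, reverse=True)
print(ranking)
```

Normalizing first keeps a model with large raw scores from dominating the average.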

We identified the important metrics using data from a different source than the one our final model used. This allowed us to estimate our model’s accuracy across data sources and take advantage of the amount of MLS data we had available. Just as importantly, it helped avoid statistical biases that would harm the model’s real-world performance. At the end of this step, we had our final list of 17 variables to include in the final model. These metrics are:

Expected Goals, Expected Assists, Total Touches, Total Touches in the Attacking Penalty Area, Total Carries, Progressive Distance of Carries, Key Passes, Shots, Shots on Target, Passing Progressive Distance, Tackles Won, Clearances, Passes Attempted, Number of Times Fouled, Successful Dribbles, PKs Attempted, and PKs Won.

After this list was completed, we built and tested our final model to predict total Goals Added. While we could have stopped there, we felt it was important to take DAVIES a step further and account for a player’s style of play and age. This helps appraise the value of a player against their true counterparts while also allowing for future projectability. This ensures teams are not accidentally measuring Messi against a 19-year-old youth product with 10 games under his belt, but rather measuring Messi against attackers with his play style in his age group. (Spoiler alert: he is head and shoulders above everyone else.) 

So how does DAVIES measure style of play? First, we had to define distinct play-styles. Using a cluster analysis generated only from publicly available data, DAVIES broke down each player into one of four major play-style clusters:

  • Attacker

  • Central Progressor

  • Wide Progressor

  • Central Defender

Then, DAVIES re-clustered within each major play-style to identify more specific clusters. This process is the same as one used by a human player evaluator (generic to specific), but operates strictly through a mathematical process. Measuring players from generic to specific, and following the same process as a coach would, not only allows for a coach or front office to understand the mathematical process a little bit more clearly, but also to make changes where they see fit.
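The generic-to-specific idea can be sketched as clustering twice: once over all players, then again inside each major cluster. The article does not name the clustering algorithm, so the tiny k-means below is an assumption, and the two style features and their values are hypothetical toy data.

```python
import math

def kmeans(points, k, iterations=20):
    """Tiny k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels

def two_stage_clusters(points, k_major, k_minor):
    """Cluster all players, then re-cluster within each major cluster.

    Assumes every major cluster holds at least k_minor players.
    """
    major = kmeans(points, k_major)
    result = [None] * len(points)
    for m in range(k_major):
        idx = [i for i, lab in enumerate(major) if lab == m]
        minor = kmeans([points[i] for i in idx], k_minor)
        for i, sub in zip(idx, minor):
            result[i] = (m, sub)
    return result

# Hypothetical features per player: (shots per touch, share of defensive actions).
players = [
    (0.30, 0.05), (0.28, 0.06), (0.20, 0.08), (0.19, 0.09),
    (0.02, 0.60), (0.03, 0.62), (0.04, 0.45), (0.05, 0.44),
]
styles = two_stage_clusters(players, k_major=2, k_minor=2)
print(styles)
```

Each player ends up with a (major, specific) pair, which mirrors the way a scout might first say "attacker" and only then "finisher".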

Once we defined the play-styles that DAVIES would use, we built a predictive model to generate the probability that each player falls into each play-style. Then, each player was sorted into the play-style that they fit into with the highest probability. 
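The sorting step is simply a pick of the most probable style. A minimal sketch, assuming the predictive model returns one probability per play-style (the numbers below are hypothetical):

```python
def assign_style(probabilities):
    """Return the play-style with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)

# Hypothetical model output for one attacker (toy values that sum to 1).
player_probs = {"Finisher": 0.15, "Dribbler": 0.50, "Playmaker": 0.35}
style = assign_style(player_probs)
print(style)
```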

To home in on play-style rather than play quality, we focused less on how good a player was at a specific action, and more on the type of actions they completed. For example, the play-style aspect of DAVIES looked at how often a player shoots when they have the ball rather than total goals scored. The final styles within each generic group, with a description of each, are:

Central Defender Group

Possession Oriented Defender - A central defender whose on-ball actions often progress the ball and who often defends higher up the pitch, in the middle third.

Examples: Jan Vertonghen, Virgil van Dijk, Sergio Ramos, Kalidou Koulibaly

Low-Block Defender - A central defender who does a lot of the dirty work in their own third, while not spending as much time distributing or carrying the ball.

Examples: Davinson Sánchez, José Giménez, Sokratis, Nathan Ake

Wide Progressor Group

Offensive Wide Progressor - Wide players that contribute to both build up and deep defending, but are likely to get farther into the attacking third and play balls into the box.

Examples: Trent Alexander-Arnold, Juan Cuadrado, Marcelo, Alphonso Davies

Defensive Wide Progressor - Wide players that contribute to build up and play crosses into the box occasionally, but do more deep defending work in their own third than direct attacking work in the opponent’s third.

Examples: Benjamin Pavard, César Azpilicueta, Matteo Darmian

Central Progressor Group

Offensive Central Progressor - Players who play box-to-box, often carry the ball forward, play progressive passes and sometimes shoot or play balls into the box themselves.

Examples: Arturo Vidal, Sergej Milinkovic-Savic, Paul Pogba, Toni Kroos

Defensive Central Progressor - Players who play box-to-box, but defend more frequently than they attack and/or tend to stay deeper than offensive central progressors when their team is on the ball.

Examples: Sergio Busquets, Fabinho, N’Golo Kante, Miralem Pjanic

Attacker Group

Finisher - Players who operate mainly in the attacking third and frequently get into their opponent’s box to look for shots.

Examples: Robert Lewandowski, Ronaldo, Mo Salah, Kylian Mbappé, Harry Kane

Dribbler - Attacking players who get on the ball during build up, then usually look to take on defenders to create chances in the final third.

Examples: Memphis Depay, Jadon Sancho, Sadio Mané, Eden Hazard

Playmaker - Attackers who can drop into the middle third to aid build up, then usually look to play passes into teammates to create chances in the final third.

Examples: Messi, Neymar, Josip Ilicic, Kevin De Bruyne
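The style definitions above rest on how often a player does something rather than how much of it they do in total. One way to sketch that, assuming touches are used as the denominator (our reading of "when they have the ball") and using hypothetical season totals:

```python
def style_rates(totals):
    """Convert raw season totals into per-touch rates that describe style."""
    touches = totals["touches"]
    if touches == 0:
        return {}
    return {
        f"{action}_per_touch": count / touches
        for action, count in totals.items()
        if action != "touches"
    }

# Two hypothetical players with very different volume but the same habits.
high_volume = {"touches": 2000, "shots": 100, "dribbles": 60}
low_volume = {"touches": 500, "shots": 25, "dribbles": 15}

print(style_rates(high_volume))
print(style_rates(low_volume))
```

Both players come out with identical rates, so a style model fed these inputs would treat them as the same type of player regardless of minutes or output.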

The DAVIES play-style definition process is data-driven, but by no means exact or deterministic - if a coach or recruitment team envisions a player in a slightly different role than the results placed him, they should be able to change it or view the player as compared to other play-styles. Moving to a new team that plays differently might also dictate that a player adapt to a new style - a central defender moving from a team that usually defends deep in their own box to a team that has 60% possession will likely have to play in the middle third more often and progress the ball more frequently.

To help with this process, DAVIES allows users to see how similar each player is to a given play-style in the form of a percentage. These percentages can help guide decisions as to which style to put a player into. For simplicity, DAVIES will by default place a player into the play-style with the highest percentage. However, for a player like Jadon Sancho, who has some similarity to both the Dribbler and Playmaker styles, teams might want to examine a DAVIES score for both.
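A team could flag players like this automatically by listing every style that sits close to the top one. The sketch below assumes a cutoff of 15 percentage points, which is a hypothetical choice, and uses made-up percentages rather than any real player's output.

```python
def candidate_styles(probabilities, margin=0.15):
    """List styles whose probability is within `margin` of the top style."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[0][1]
    return [(style, f"{p:.0%}") for style, p in ranked if top - p <= margin]

# Hypothetical attacker with similarity to two styles.
two_way_attacker = {"Dribbler": 0.48, "Playmaker": 0.40, "Finisher": 0.12}
print(candidate_styles(two_way_attacker))
```

The first entry is the default style; any further entries are the alternatives worth scoring separately.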

DAVIES also provides a more structured approach to play-styles than the traditional “eye-test” based approach. If you ask a cohort of front office staff or coaches to describe a given player’s style of play, you are likely to get a number of different responses with varying degrees of similarity. As such, their grouping of the player into a single play-style will also likely differ. At the very least, even for a coach or front office that does not believe in the use of data, DAVIES gives a second opinion on the player’s style of play that can be considered and debated. Most importantly, this structure allows DAVIES to provide comparisons based on players that do similar things with the ball and that operate in similar areas of the pitch.

Age Group

Defining age groups for players is an important step in creating the overall DAVIES model. To start the process, we created a curve for DAVIES scores by age. From that, the players were placed into five distinct groups that reflect the current stage of their career: Youth, Rising to Prime, Prime, Falling from Prime, and Veteran. To make this more tangible, the curve and the resultant groups are visualized below.

image1.png
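Once the boundaries are read off the curve, placing a player into a group is a simple lookup. The exact boundaries are not stated in the text, so the cutoffs below are hypothetical placeholders; only the Youth cutoff is loosely informed by the "age 20 or below" youth list in the results.

```python
from bisect import bisect_right

AGE_GROUPS = ["Youth", "Rising to Prime", "Prime", "Falling from Prime", "Veteran"]

# Hypothetical lower bounds for every group after Youth. Replace these with
# the boundaries taken from the age curve.
CUTOFFS = [21, 24, 29, 32]

def age_group(age):
    """Map an age in years to one of the five career stages."""
    return AGE_GROUPS[bisect_right(CUTOFFS, age)]

for age in (19, 22, 26, 30, 33):
    print(age, age_group(age))
```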

This process also allows for malleability in measurement. If a team wants to sign a youth player, they can compare them to other youth players to evaluate how they are playing now, and measure them against prime age players to approximate their possible immediate impact. This can help with planning for when a player would break into the first team.  
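That kind of flexible comparison can be sketched as a ranking within whichever peer group a team chooses. The percentile-style measure, the field names, and the scores below are all assumptions for illustration, not the way DAVIES itself adjusts for age and style.

```python
def share_of_peers_matched(player, players, style=None, age_group=None):
    """Fraction of the chosen peer group whose score the player matches or beats."""
    peers = [
        p for p in players
        if (style is None or p["style"] == style)
        and (age_group is None or p["age_group"] == age_group)
    ]
    if not peers:
        return None
    return sum(1 for p in peers if p["score"] <= player["score"]) / len(peers)

# Hypothetical scores for five players.
pool = [
    {"name": "A", "style": "Dribbler", "age_group": "Youth", "score": 2.0},
    {"name": "B", "style": "Dribbler", "age_group": "Youth", "score": 1.0},
    {"name": "C", "style": "Dribbler", "age_group": "Prime", "score": 4.0},
    {"name": "D", "style": "Dribbler", "age_group": "Prime", "score": 1.5},
    {"name": "E", "style": "Finisher", "age_group": "Prime", "score": 6.0},
]
prospect = pool[0]

vs_youth = share_of_peers_matched(prospect, pool, style="Dribbler", age_group="Youth")
vs_prime = share_of_peers_matched(prospect, pool, style="Dribbler", age_group="Prime")
print(vs_youth, vs_prime)
```

Here the prospect tops the youth group but matches or beats only half of the prime-age group, which is the kind of gap a club would weigh when planning a first-team timeline.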

Results

When we started building DAVIES, we didn’t know how many applications it could actually have. While the main use of DAVIES is identifying undervalued players, it can also be used to measure team strength and to learn more about trends that occur year to year within leagues. So now for the fun part: the results.

As somewhat of a proof of concept, we will present three lists: the ten best players in our dataset, the top ten players from the most recent season (2019/2020), and the top ten youth players (age 20 or below) from the 2017/2018 season.

The first list, the top ten DAVIES players of all time:

image4.png

From this list, we can take away some themes that are fairly obvious, but worth mentioning. Messi is in a league of his own. In a dataset that only contains three seasons’ worth of DAVIES scores, Messi holds three of the top four spots on the list. His sweep of the top three was spoiled only by Kylian Mbappé breaking onto the world stage in his first season with PSG. Salah’s brilliant past two seasons, which saw him included in the Champions League squad of the season, win the Champions League with Liverpool, and more, earned him two spots on the top ten list. Two notable names who earned themselves big-money transfers the following season also appear on this list: Timo Werner and Nicolas Pepe. A notable player left off is Cristiano Ronaldo. His highest finish in DAVIES was 29th overall, with all three of his seasons within the top 51 of all players present in our dataset. While he is still a top-25 player in the world, his relatively low involvement in the box and low creativity compared to the top 5% of players in his cluster see him left out of the top 10. While the players above are ranked based on the entirety of the dataset, DAVIES can also analyze who had the best year in 2019/2020 as compared to similar players.

image2.png

Timo Werner killed it by all standards in 2019/2020 and, as mentioned previously, earned himself a big-time move to Chelsea. While there were no real surprises in the Top 5, expanding to the Top 10 brought somewhat of a late surprise in Edin Dzeko. At 33 years old, he is continuing to play at a top level, earning himself some interest from Juventus for a possible transfer. Finally, Kylian Mbappé had the 5th best DAVIES year of anyone in Europe’s Top 5 Leagues, despite playing fewer games because Ligue 1 ended early due to COVID-19. Who knows what level he could have reached with ten more games under his belt?

Finally, let’s identify the top players under 20 years old from the 2017/2018 season to see how the next two years of their career evolved.

image3.png

It’s safe to say that DAVIES is fairly accurate at measuring youth talent as well as prime-aged talent. Kai Havertz was just signed as Chelsea’s record-breaking transfer. Christian Pulisic is becoming a key member of one of the Premier League’s biggest clubs. Houssem Aouar just had a breakout year at Lyon. Dayot Upamecano just made his French National Team debut and has been linked with the likes of Real Madrid and Manchester United. Ideally, large clubs using DAVIES as a tangible player recruitment tool would identify similar players earlier on and sign them for less than record-breaking fees.

Applying DAVIES To Your Team and Workflow

In order to build DAVIES, we scraped and utilized the StatsBomb-powered data from FBRef, a public data source. As discussed previously, DAVIES only uses data that can also be found in other data streams, a decision that makes DAVIES applicable worldwide. As such, if your team contracts with StatsBomb or Opta, you will be able to recreate this workflow with reasonable similarity and cleanliness.

If your team pays for a different private data source, you are also in luck, as this process should still be fairly seamless. If you are a fan of the game and want to conduct your own DAVIES analysis, you’ll have to settle for Europe’s Top 5 leagues for now. When FBRef starts including advanced StatsBomb-powered metrics for leagues outside MLS and Europe’s Top 5 Leagues, you’ll be able to utilize DAVIES for those leagues as well.

Shortcomings of DAVIES

DAVIES, despite its usefulness and applicability, does have some shortcomings that are worth noting. The first is that it can only be applied using advanced data. Unfortunately, FBRef only provides advanced metrics for Europe’s Top 5 Leagues and MLS. We already have a Goals Added metric for MLS; as such, we were left with only Europe’s Top 5 Leagues for analysis and model building. More leagues mean not only more data and more accurate models, but also better results, as more players can be found in less famous leagues. As an example, Tottenham found Dele Alli at MK Dons in League One before he broke onto the world stage. With more data from non-major leagues, DAVIES would be able to find similar players.

A true Goals Added measure is a far more complex calculation than the underlying DAVIES model (a predicted Goals Added value). Goals Added is more likely than DAVIES to capture a given action’s value at a fine-grained level by taking into account more detailed aspects of play, such as possession velocity and exact pitch coordinates. For this reason, Goals Added might be better at valuing players whose contributions are less easily measured by widely available statistics. However, the simplicity of the underlying DAVIES model might also be seen as an advantage - it’s easy for users to know exactly how the rating is calculated. Furthermore, for the majority of players, the relatively simple DAVIES model approximates the true Goals Added value very well - on average, the raw DAVIES score was within a single Goal Added of the true number.
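One reading of that accuracy claim is a mean absolute error below one goal. A club with access to true Goals Added values could run the same check on its own predictions; the four pairs of values below are hypothetical and are not results from our model.

```python
def mean_absolute_error(predicted, actual):
    """Average size of the gap between predicted and true values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical raw DAVIES scores and true Goals Added for four players.
davies_scores = [1.5, -0.5, 3.0, 0.0]
goals_added = [2.0, 0.5, 2.5, -1.0]

error = mean_absolute_error(davies_scores, goals_added)
print(error)
```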

In a similar vein, the clustering used by DAVIES to define play-styles used only publicly available data in order to make the model as accessible as possible. However, more specific event data input would have likely allowed for the definition of other styles that might improve specific player comparisons. 

A simple addition that might improve comparisons would be player position. While the positions listed on FBRef were too general for this purpose, clubs that have access to a data stream with more specific positions (e.g. Winger/Central Forward) could incorporate a combination of position and style to help make more specific and accurate player comparisons.

Concluding Thoughts

DAVIES has taken on many forms throughout the design and implementation process (over 700 emails traded), but ultimately we felt it was most important to make Goals Added accessible to the world, without the need for complex event data. Most clubs can’t afford event data or expensive data streams, often picking one or two foreign leagues to look at. With DAVIES, they can now get a leg up in the leagues and countries they find to be valuable. DAVIES, however, is not meant to be the be-all and end-all, but rather one iteration in what will hopefully be a never-ending development of player value metrics. While the metrics will become more advanced and the data streams will change, we hope the ideals and methodology behind DAVIES will remain: measuring players against others of the same age bracket and style of play in order to determine the best overall value for a player.

The implications of DAVIES also go beyond using data to evaluate players. DAVIES can be used as a qualification tool to build shortlists and identify which players a front office needs to watch video on. This cuts down on time spent watching video of players whom DAVIES already predicts not to be at the level of the squad. Saving this time and energy is incredibly important in a modern-day workflow, as preparation for coming games often takes precedence over recruitment.
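As a rough sketch of that qualification step, a shortlist can be built by dropping every player below the level a squad requires and keeping the best of the rest. The threshold, the field names, and the scores here are hypothetical.

```python
def build_shortlist(players, squad_level, limit=3):
    """Keep players at or above the squad's level, best first."""
    qualified = [p for p in players if p["davies"] >= squad_level]
    return sorted(qualified, key=lambda p: p["davies"], reverse=True)[:limit]

# Hypothetical scouting pool.
candidates = [
    {"name": "Player A", "davies": 1.2},
    {"name": "Player B", "davies": 3.4},
    {"name": "Player C", "davies": 2.1},
    {"name": "Player D", "davies": 0.4},
]

to_watch = [p["name"] for p in build_shortlist(candidates, squad_level=1.0, limit=2)]
print(to_watch)
```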

We look forward to seeing how teams incorporate DAVIES into their workflow and what players they can find using the methodology outlined above.