I recently created a decent set of MLS possession data while working on another project, and I was curious whether the patterns of the famous Reep analysis would hold for MLS. So I attempted to replicate his result, and perhaps offer a couple of new perspectives on the data.
I was first introduced to the legacy of Charles Reep while reading The Numbers Game (by Chris Anderson & David Sally). Reep was an early advocate for applying statistics to soccer, and was famous for tracking game events by hand over many seasons. According to his data, most goals were scored from possessions with three passes or fewer. This was taken as empirical justification for playing directly: minimizing touches and favoring longer passes in order to improve results.
Although Reep’s status as a pioneer in the sport is secure, many still debate the results and their interpretation. Some critiques assert the underlying data was misinterpreted: highlighting a simple majority of goals may not be the best analysis when most possessions had three or fewer passes anyway. Others suggest the structure of the analysis confuses correlation with causation, leading to misapplication of the results. In short, one can’t tell whether the results were caused by the number of passes or whether other factors play a causal role. As I attempt to recreate the analysis, it’s worth stating that the same criticisms apply to this replication effort as well.
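To make the first critique concrete, here is a minimal Python sketch of the difference between "share of goals" and "goals per possession." The function name `summarize_by_pass_count`, the `(pass_count, scored)` tuple layout, and the toy numbers are all assumptions made up for illustration; they are not Reep's data, not MLS data, and not the method used in this replication.

```python
def summarize_by_pass_count(possessions, threshold=3):
    """Split possessions into short (<= threshold passes) and long groups.

    `possessions` is an iterable of (pass_count, scored) tuples.
    This layout is a hypothetical one chosen for the example.
    """
    totals = {"short": [0, 0], "long": [0, 0]}  # [possessions, goals]
    for passes, scored in possessions:
        key = "short" if passes <= threshold else "long"
        totals[key][0] += 1
        totals[key][1] += int(scored)

    total_goals = sum(goals for _, goals in totals.values())
    summary = {}
    for key, (count, goals) in totals.items():
        summary[key] = {
            "possessions": count,
            "goals": goals,
            # What a "most goals come from short moves" claim measures.
            "share_of_goals": goals / total_goals if total_goals else 0.0,
            # What the base-rate critique says should be compared instead.
            "goals_per_possession": goals / count if count else 0.0,
        }
    return summary


# Invented toy data: 800 short possessions with 8 goals,
# 200 long possessions with 4 goals.
toy = (
    [(2, True)] * 8 + [(2, False)] * 792
    + [(6, True)] * 4 + [(6, False)] * 196
)

for group, stats in summarize_by_pass_count(toy).items():
    print(group, stats)
```

In this invented example, short possessions account for two thirds of the goals yet score at half the rate of long ones, simply because there are four times as many of them. Real data may look nothing like this; the point is only that a majority share of goals says little without the underlying possession counts.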