Apr 07, 2016
Last Modified: Apr 12, 2016
One issue with looking at how Corsi changes across the season is that the sheer amount of data makes it difficult to get a concise visualization that accurately depicts the whole season. Historically, I've charted this with plots of aggregate information - namely R² correlation values - but doing so adds a level of abstraction that hides a lot of the data. What I've done here is an attempt to look at both aggregate and individual data points in an elegant manner.
This chart is essentially a three-dimensional scatter plot, with time as the third dimension. On the x axis we have a team's ESVA Corsi percentage in a given year. On the y axis we have that team's point percentage. On the z axis, or time in this case, we have the number of games played, starting at 1 and ending at 80. My usual disclaimer applies: I'm missing a few games, so a full season in this case is 81 games rather than 82. I've also added a trendline for each number of games, but note that this line is fit to all 7 seasons even if you hide data. Point percentage is used rather than raw points to normalize across the season; at the beginning of the season a team may have upwards of 160 points available to them, whereas at the end of the season they'll have only 2.
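To make the point-percentage normalization and the per-slice trendline concrete, here is a minimal sketch in Python. It assumes, based on the "160 points available early, 2 at the end" arithmetic with an 81-game season, that point percentage is measured over the games remaining after the games-played cutoff; that reading is my inference for illustration, and the function names (`points_available`, `remaining_point_pct`, `fit_trendline`) are hypothetical rather than anything used to build the chart.

```python
def points_available(games_played, season_length=81):
    """Points still on offer after `games_played` games (2 per game).

    season_length defaults to 81 to match the data set described above.
    """
    return 2 * (season_length - games_played)


def remaining_point_pct(game_points, games_played):
    """Share of available points earned in the games after the cutoff.

    game_points is a per-game list of 0, 1, or 2 for one team's season.
    """
    remaining = game_points[games_played:]
    available = 2 * len(remaining)
    if available == 0:
        raise ValueError("no games remain after the cutoff")
    return sum(remaining) / available


def fit_trendline(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R^2.

    Assumes xs and ys each contain at least two distinct values;
    otherwise the divisions below would fail.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    print(points_available(1))   # 160
    print(points_available(80))  # 2
```

For one games-played slice, `xs` would hold each team-season's Corsi percentage through that many games and `ys` the matching point percentage; pooling all seasons into one call mirrors the all-seasons trendline described above.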
Presenting the data this way makes a few things intuitive that might be missed when looking at the data in other ways. At the beginning of the season, points drift a lot horizontally - that is, as the number of games played changes, a team's Corsi percentage is likely to move around wildly. This helps explain why Corsi needs about 15 games before it starts being predictive - with too few games, a lot of the noise doesn't cancel out. Similarly, at the end of the season there is a lot of vertical stratification, which shows that at that point there isn't enough granularity in point percentage to be predictive.
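The vertical stratification can be illustrated by counting how many distinct point percentages are even possible at a given cutoff. This is a rough sketch under the same assumption that point percentage covers the games remaining in an 81-game season; `possible_point_pcts` is a hypothetical helper, not part of the original analysis.

```python
from fractions import Fraction


def possible_point_pcts(games_played, season_length=81):
    """Every point percentage a team could post over its remaining games.

    Each game awards 0, 1, or 2 points, so any whole-number total from
    0 up to the points available is reachable.
    """
    games_left = season_length - games_played
    if games_left <= 0:
        raise ValueError("no games remain after the cutoff")
    available = 2 * games_left
    return [Fraction(points, available) for points in range(available + 1)]


if __name__ == "__main__":
    early = possible_point_pcts(1)
    late = possible_point_pcts(80)
    print(len(early))                # 161 distinct values
    print([str(p) for p in late])    # ['0', '1/2', '1']
```

With only three possible y values at 80 games played, every team lands on one of three horizontal bands, which is the kind of stratification visible at the end of the season.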
Unfortunately, at least for me, this raises more questions than it answers. What is a normal amount of drift for a team's Corsi numbers? Can we glean more information about Corsi by looking at teams that drift too much or too little? What information can we gain by looking at teams that are consistently distant from the trend?