Oct 11, 2015
The adjustments in this article grew out of some discussion around this article. Some of the information here has already been verified there, so if you're just looking for new developments, jump to the "Additional Adjustments" section.
The first minor change is that we see a slight correlational increase by adding Down 3+ and Up 3+ categories, rather than stopping at Down 2+ and Up 2+. The sample sizes get increasingly small beyond 3 goals, though, so we don't see much of an increase past that. Furthermore, since the 2013 season this site has used team TOI numbers to weight its adjustments. It turns out that using the total number of Corsi events works just as well, and the formula simplifies (at least mathematically) with those weights. Rather than having:
$$C_{SA} = {\sum \limits_{n=down2}^{up2} {TOI_{n}({{F_{n}} \over {F_{n} + A_{n}}} - {{F_{avg_n}} \over {F_{avg_n} + A_{avg_n}}})} \over \sum \limits_{n=down2}^{up2} TOI_n} + 50\% $$
we get
$$C_{SA} = {\sum \limits_{n=down3}^{up3} {({F_{n} + A_{n}})({{F_{n}} \over {F_{n} + A_{n}}} - {{F_{avg_n}} \over {F_{avg_n} + A_{avg_n}}})} \over \sum \limits_{n=down3}^{up3} F_{n} + A_{n}} + 50\% $$
With a little bit of algebra that formula boils down to:
$$C_{SA} = {\sum \limits_{n=down3}^{up3} {{F_{n}A_{avg_n} - A_{n}F_{avg_n}} \over {F_{avg_n} + A_{avg_n}}} \over \sum \limits_{n=down3}^{up3} F_{n} + A_{n}} + 50\% $$
This is mostly just for housekeeping; the main point is that we no longer need to track TOI in each state to get an accurate adjustment. We do, however, see a minor increase in predictiveness of future points using these weights:
The data used for this chart and the remainder of this article is an average of the past 6 full seasons' even strength numbers - going back to the 2008-2009 season and excluding the 2012-2013 season (which was not a full season). You can find the data for individual seasons here. Since event counts are less data to keep track of and cost us nothing in effectiveness, this site is switching away from TOI weights and using event counts to weight its score adjustment terms.
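To make the event-count-weighted formula above concrete, here's a minimal Python sketch. The dictionary inputs (per-state team counts and league-average counts) are hypothetical names, not anything the site exposes:

```python
# Minimal sketch of event-count-weighted score adjustment (Corsi SA).
# Inputs are hypothetical: per-score-state Corsi-for/against counts for
# one team, plus league-average counts in the same states.

SCORE_STATES = ["down3", "down2", "down1", "tied", "up1", "up2", "up3"]

def corsi_sa(team_for, team_against, league_for, league_against):
    """Score-adjusted Corsi, weighted by event counts rather than TOI."""
    numerator = 0.0
    total_events = 0
    for state in SCORE_STATES:
        f, a = team_for[state], team_against[state]
        f_avg, a_avg = league_for[state], league_against[state]
        # (F * A_avg - A * F_avg) / (F_avg + A_avg), summed over states
        numerator += (f * a_avg - a * f_avg) / (f_avg + a_avg)
        total_events += f + a
    # Expressed as a proportion, so 0.50 plays the role of the 50% term.
    return numerator / total_events + 0.50
```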
Score, as it turns out, is only one of a few things we can adjust for. The Hockey-Graphs article by Micah Blake McCurdy also adjusts for venue and time, but finds that time adjustments aren't very effective. Hence I've added venue adjustments to this site, but not time adjustments.
I also attempted to look at what I'm calling event adjustments. Corsi is calculated from four different event types - goals, shots, misses, and blocks - but it weights them all the same, meaning, for example, that a block is considered as predictive as a miss. This should strike us as a major shortcoming of the statistic. On a basic level, each event type gives us different information, much in the same way that a shot taken while a team is up 2 goals gives us different information than a shot taken while the team is down. This kind of adjustment may sound familiar if you've heard of Tango, which, as this article by Nick Mercadante points out, isn't any more predictive than score adjustment. There are many issues with Tango, but apart from its limited analysis the main one is that it focuses too much on goals, which are not very relevant to Corsi/Fenwick as a whole. It also defines its weights rather arbitrarily - apart from a goal being "more predictive," it never explains why every other event is considered 1/5th of a goal. We can overcome both of these shortcomings by weighting each event type by its own rarity, though in a slightly different manner than our other adjustments.
There is some variation in event occurrences across score and venue states. Splitting by score difference and venue, we see the following event occurrences as a percentage of all events in that state:
On average, when a team scores a goal in any state, we see about 6 blocks, 12 shots, and 5 misses. These rarities give us a baseline for how to weight our events. Goals do indeed end up very heavily weighted, as Tango intends, but the weight is no longer arbitrary, it is not goal-centric, and, more importantly, we are not lumping the other three event types into a single weight. Weighting this way also creates a situation that seems very counterintuitive: being the most common event, shots on goal are weighted less than both blocks and misses, making the latter two categories more predictive. This is striking because, taken on their own, misses and blocks are less correlated with success than shots are.
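As a rough illustration of rarity-based weighting, using the approximate per-goal mix quoted above (round numbers for the sketch, not the site's exact averages), each event type's weight comes out inversely proportional to how common it is:

```python
# Sketch of rarity-based event weights from the rough per-goal mix above.
event_counts = {"goal": 1, "shot": 12, "block": 6, "miss": 5}
total = sum(event_counts.values())  # 24 events for every goal scored

# Weight each event type by its inverse share of all events,
# so rarer events count for more.
weights = {event: total / count for event, count in event_counts.items()}
print(weights)
# goals ~24x, blocks ~4x, misses ~4.8x, shots only ~2x
```

This is why shots on goal, the most common event, end up with the smallest weight while goals get by far the largest.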
Now, all this weighting is rather pointless unless we can show that it does indeed increase predictiveness. Correlating event-, venue-, and score-adjusted Corsi to future points in the season yields the following R² values:
Corsi ESVA here adjusts for Events, Score, and Venue; Corsi SVA for Score and Venue; Corsi SA for Score alone. We can see that Corsi ESVA does substantially better than Corsi SVA or Corsi SA at almost all points throughout the season. Data for individual seasons here.
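For reference, the kind of check behind these numbers can be sketched as below, assuming hypothetical arrays with one entry per team: the adjusted Corsi value computed through game n, and points earned over the rest of the season.

```python
# Sketch of the predictiveness check: correlate an adjusted Corsi value
# through game n with points over the remainder of the season, report R^2.
import numpy as np

def r_squared(adjusted_corsi, future_points):
    corsi = np.asarray(adjusted_corsi, dtype=float)
    points = np.asarray(future_points, dtype=float)
    r = np.corrcoef(corsi, points)[0, 1]  # Pearson correlation
    return r ** 2

# e.g. r_squared(corsi_esva_through_game_20, points_games_21_to_82)
```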
I feel a word of caution is needed at this point. The increase in predictiveness that we see comes from an event's rarity relative to the whole data set, not from that event's own correlation to success. What this means is that any particular event, in and of itself, is still relatively useless, but it helps paint the bigger picture when combined with the whole. In fact, we lose predictive value if we formulate event adjustment the same way we formulate venue or score adjustment - formulas which treat the value being adjusted as predictive in its own right. And, in regards to misses and blocks being more predictive than shots on goal, I feel compelled to mention Goodhart's Law, which states, "When a measure becomes a target, it ceases to be a good measure." This adjustment is by no means saying blocks or misses are more predictive of success on their own, merely that they are more rare. If players or teams were to intentionally miss to increase their standing in this metric, it would self-adjust because those events would become more common - not to mention they'd never score a goal, meaning they'd never record an event in the most heavily weighted category.
The formula for ESVA is monstrous. The data is split 7 ways for score, 2 ways for venue, and 4 ways for events, totalling 56 separate buckets any individual event can fall into. It's so grotesque that I don't know how to fit it into one equation, so here it is in three:
$$CF_{nv} = (S_{avg_{nv}}+B_{avg_{nv}}+M_{avg_{nv}}+G_{avg_{nv}})({SF_{nv} \over S_{avg_{nv}}} + {BF_{nv} \over B_{avg_{nv}}} + {MF_{nv} \over M_{avg_{nv}}} + {GF_{nv} \over G_{avg_{nv}}}) $$
$$CA_{nv} = (S_{avg_{nv}}+B_{avg_{nv}}+M_{avg_{nv}}+G_{avg_{nv}})({SA_{nv} \over S_{avg_{nv}}} + {BA_{nv} \over B_{avg_{nv}}} + {MA_{nv} \over M_{avg_{nv}}} + {GA_{nv} \over G_{avg_{nv}}}) $$
$$C_{ESVA} = {\sum \limits_{v=home}^{away} \sum \limits_{n=down3}^{up3} {{CF_{nv}CA_{avg_{nv}} - CA_{nv}CF_{avg_{nv}}} \over {CF_{avg_{nv}} + CA_{avg_{nv}}}} \over \sum \limits_{v=home}^{away} \sum \limits_{n=down3}^{up3} CF_{nv} + CA_{nv}} + 50\% $$
For a bit of nomenclature: C is Corsi, S is Shots, B is Blocks, M is Misses, G is Goals, F is For, A is Against, and any avg term is the league average. Any term without an F or an A is F + A.
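Putting the three equations together, a minimal sketch might look like the following. The nested dictionaries of per-venue, per-state event counts for the team, the league-average event counts, and the league-average weighted CF/CA terms are all hypothetical names chosen for the example:

```python
# Sketch of Corsi ESVA from the three equations above.
# counts[venue][state] holds team events ("SF", "BF", "MF", "GF", "SA", ...),
# avg[venue][state] holds league-average event counts ("S", "B", "M", "G"),
# and avg_cf / avg_ca hold the league-average weighted CF and CA values.

SCORE_STATES = ["down3", "down2", "down1", "tied", "up1", "up2", "up3"]
VENUES = ["home", "away"]
EVENTS = ["S", "B", "M", "G"]

def weighted_corsi(counts, avg, side):
    """Event-weighted Corsi for ('F') or against ('A') in one state/venue."""
    scale = sum(avg[e] for e in EVENTS)
    return scale * sum(counts[e + side] / avg[e] for e in EVENTS)

def corsi_esva(counts, avg, avg_cf, avg_ca):
    numerator = 0.0
    total = 0.0
    for v in VENUES:
        for n in SCORE_STATES:
            cf = weighted_corsi(counts[v][n], avg[v][n], "F")
            ca = weighted_corsi(counts[v][n], avg[v][n], "A")
            cf_avg, ca_avg = avg_cf[v][n], avg_ca[v][n]
            numerator += (cf * ca_avg - ca * cf_avg) / (cf_avg + ca_avg)
            total += cf + ca
    return numerator / total + 0.50  # expressed as a proportion
```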
The combinations here are also so vast that my server can no longer calculate them on the fly; as such, I'm using precalculated weights for all the averaged event weight terms (SFavgnv, BFavgnv, and the like), built from all data across the 2008 to 2014 seasons. This lets me keep partial on-the-fly calculations, giving users the same amount of control they've had historically. It turns out we get more predictive results that way anyway, so this really isn't an issue.
The Split Half Reliability of Corsi ESVA is its shortcoming. Comparing Corsi SA, Corsi SVA, and Corsi ESVA from 2008-2014, we get .67, .66, and .58, respectively, meaning that Corsi ESVA varies more than our other metrics. I'm unsure as to why this occurs. Randomized trials show a similar dip, which would imply these changes aren't due to intraseasonal effects like the trade deadline. The variability is most likely just due to the increased number of data splits (consistent with Corsi SVA also being less reliable than Corsi SA), but I don't know how to confirm this.
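For context, one common way to measure split-half reliability is to split each team's games into two halves (for example even- versus odd-numbered games), compute the metric on each half, and correlate the halves across teams. A minimal sketch with hypothetical stand-ins for the game data and metric function:

```python
# Sketch of a split-half reliability check. `games_by_team` maps a team
# to its ordered list of games, and `compute_metric` is whatever adjusted
# Corsi calculation is being tested; both are hypothetical stand-ins.
import numpy as np

def split_half_reliability(games_by_team, compute_metric):
    even_half, odd_half = [], []
    for games in games_by_team.values():
        even_half.append(compute_metric(games[0::2]))
        odd_half.append(compute_metric(games[1::2]))
    # Correlation between halves across all teams; higher means the
    # metric is more stable within a season.
    return np.corrcoef(even_half, odd_half)[0, 1]
```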
This site isn't equipped for direct discussion, so if you would like to discuss this article you can contact me via the information found on the about page.