Great Expectations: dCorsi

One of the best tools to come out of the hockey analytics community recently is dCorsi. dCorsi is the difference between how a player performed and how a player was expected to perform.

dCorsi = Actual - Expected

The expected value comes from a regression built by @SteveBurtch on a set of variables: position, TOI/60, quality of teammates and competition, zone starts, and team. Burtch’s write-up on the concept covers the details.

Knowing what to expect from a player is very important when judging how a player performed. If a player drove possession, but played the easiest minutes on the team, it’s possible that the player is not achieving what he should.  On the other hand, if a player doesn’t stand out as a possession driver, but he plays the hardest minutes on the team, he could be outperforming expectations.
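
To make the two scenarios above concrete, here is a minimal Python sketch of the subtraction. The numbers are invented for illustration (they are not real player data and do not come from Burtch's regression), and the function name dcorsi is just a label chosen for this example.

```python
# Hypothetical shot-attempt rates for illustration only; not real player data.
def dcorsi(actual: float, expected: float) -> float:
    """Return actual minus expected, per the definition above."""
    return actual - expected

# Player A: strong raw Corsi For, but easy minutes raise the expectation.
sheltered = dcorsi(actual=58.0, expected=61.0)
# Player B: modest raw Corsi For, but hard minutes lower the expectation.
tough_minutes = dcorsi(actual=52.0, expected=48.5)

print(f"Sheltered player dCorsi For: {sheltered:+.1f}")
print(f"Tough-minutes player dCorsi For: {tough_minutes:+.1f}")
```

In this made-up case the player with the better raw number falls short of expectations, while the player with the weaker raw number beats them.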

dCorsi can be a bit of a challenge to understand, since it can go against what people generally believe about players.  Shea Weber, Ryan Suter, and Anton Stralman are examples of players for whom the data conflicts with popular opinion.

I put together the visualization below to try to show how dCorsi can be used in player evaluation.

A few notes:

  • The black bar indicates the expectation for the player
  • The colored bars indicate how the player actually performed
  • In Corsi For, if the player’s bar is above the black bar, the player exceeded expectations
  • In Corsi Against, if the player’s bar is below the black bar, the player exceeded expectations
  • Green is good, red is bad
  • The player bars are sized by the deviation from expectation, good or bad. Big bar = big impact. Small bar = small impact
  • You can use the third tab to add your own players to the list, and compare anyone you wish.
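
The reading rules in these notes can be sketched as a small Python function. This is only an illustration of the logic described in the list, not the code behind the visualization; the function name bar_style, the metric labels, and the sample numbers are all assumptions made for the example.

```python
def bar_style(metric: str, actual: float, expected: float) -> dict:
    """Describe a player's bar relative to the black expectation bar."""
    deviation = actual - expected
    if metric == "corsi_for":
        good = deviation > 0   # more shot attempts for than expected
    elif metric == "corsi_against":
        good = deviation < 0   # fewer shot attempts against than expected
    else:
        raise ValueError(f"unknown metric: {metric}")
    return {
        # A deviation of exactly zero is labeled red here; that tie-break
        # is an arbitrary choice for this sketch.
        "color": "green" if good else "red",
        "size": abs(deviation),            # big bar = big impact
        "above_black_bar": deviation > 0,
    }

# Hypothetical values: the player allows fewer attempts than expected.
print(bar_style("corsi_against", actual=50.0, expected=54.0))
```

The same deviation sign means opposite things in the two charts, which is why the metric has to be passed in alongside the numbers.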

The data used in this post is from War-On-Ice.

Boxes and Whiskers Part 1: Team Data

So much of what is written about hockey relies on the “average”, a term familiar to most people. The most common type of average is the mean (add up all your data and divide by the number of data points). The median is another type of average that can also be useful. The median is the middle value in an ordered set of data.

For example:

3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, [4.4], 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1

If you count in from the right and the left, you’ll find that the bracketed “4.4” is the middle value: there are 17 values, so the ninth one is the median.

The median can be useful because it is not hostage to outliers in the data.

The mean for that set of data is 4.5. If we add a 10 to the end of the data set, things change.

3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4, [4.45] , 4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1, 10

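With 18 values there is no single middle value, so the median is the midpoint of the two middle values, 4.4 and 4.5, which is 4.45 (shown in brackets above). The mean is now about 4.81 (86.5 divided by 18). The median barely moved, while the mean was pulled up by the outlier.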
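
The comparison above can be checked with a few lines of Python using the standard library's statistics module. The data is the list from this post; the variable names are just labels chosen for the example.

```python
from statistics import mean, median

# The 17 values from the example above.
values = [3.9, 4.1, 4.2, 4.3, 4.3, 4.4, 4.4, 4.4, 4.4,
          4.5, 4.5, 4.6, 4.7, 4.8, 4.9, 5.0, 5.1]
# The same data with a single outlier appended.
with_outlier = values + [10]

for label, data in (("original", values), ("with outlier", with_outlier)):
    print(f"{label}: n={len(data)}, "
          f"mean={mean(data):.2f}, median={median(data):.2f}")
```

Running it prints the count, mean, and median for each list, so you can see how far each measure moves when the outlier is added.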

The visualization below will be the first in a series that uses box and whisker plots to display score-adjusted data from War-On-Ice. Box and whisker plots display how data is distributed around the median. Khan Academy has a good explainer on these if you aren’t familiar.

Be sure to change what data is displayed by using the “Measure Selector” filter, and which team is displayed by using the “Select Team” filter (both at the bottom of the page).

The two things to watch for on these graphs are the height of the box plots, and the orange line marking the median.  If the boxes are short, that means the data is tightly distributed around the median.  Conversely, if the boxes are tall, the data varies widely from the median.
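
As a rough sketch of what box height means, the Python below computes the quartiles for two made-up data sets that share the same median. The numbers are hypothetical, not War-On-Ice data, and the quartile method is the default of Python's statistics.quantiles, which may differ slightly from the convention the charting tool uses.

```python
from statistics import quantiles

def box_summary(data):
    """Return the quartiles and the box height (interquartile range)."""
    # n=4 splits the data into quarters; default method is "exclusive".
    q1, q2, q3 = quantiles(data, n=4)
    return {"q1": q1, "median": q2, "q3": q3, "box_height": q3 - q1}

# Hypothetical per-game values: same median, very different spread.
tight = [2.3, 2.4, 2.4, 2.5, 2.5, 2.5, 2.6, 2.6, 2.7]
wide = [1.2, 1.8, 2.1, 2.5, 2.5, 2.5, 2.9, 3.3, 3.8]

for label, data in (("tight", tight), ("wide", wide)):
    s = box_summary(data)
    print(f"{label}: median={s['median']:.2f}, box height={s['box_height']:.2f}")
```

Both sets would show the median line in the same place, but the second would draw a much taller box.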

(Note: some of the data is bounded, meaning it has an upper or lower limit that cannot be exceeded.  For example, Goals For per 60 cannot have a negative value. Bounded data will skew how these charts appear, though the median will still be informative.)