Evaluating pure NHL hockey talent comes in many forms and is an ever-evolving field of work for teams and analysts alike. Here at TWC, we’ve worked on a model for comparing the 5v5 offensive contributions of all NHL players.
Heading into the new season, the TWC Player Offensive Evaluation Tool (POET) got a bit of a rework and this is its next evolution. Our first model was put together last year entirely on Google Sheets, which was familiar but lacked the necessary computing speed to make the model truly useful. This year, we redid the whole thing using R.
The model uses publicly available stats, courtesy of Natural Stat Trick, to compare offensive contributions from NHL-level players. It focuses on three years of game play, so this version includes 5v5 data from the 2017-18 season through the 2019-20 season.
Let’s see the POET output for a carefully* selected Calgary Flame.
*Most elite at making friends
You can see that Matthew Tkachuk is more than one standard deviation above average for all 5v5 offensive contributions according to POET. So how does the model work in computing these outputs? Here’s the full breakdown of the model, from data acquisition all the way to final POET chart outputs.
Two sets of data are retrieved per year. For each season, players with fewer than 100 minutes of 5v5 ice-time are filtered out. Players are labeled as forwards or defencemen, and are only compared to their positional peers.
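To make the filtering step concrete, here is a minimal Python sketch (the model itself is built in R). The player rows, field names, and position codes are made up for illustration and are not the actual Natural Stat Trick export format.

```python
# Hypothetical rows: names, fields, and numbers are illustrative only.
MIN_TOI_5V5 = 100.0  # minutes of 5v5 ice time needed to qualify in a season

players = [
    {"name": "Player A", "position": "C", "toi_5v5": 1150.0},
    {"name": "Player B", "position": "D", "toi_5v5": 85.5},
    {"name": "Player C", "position": "D", "toi_5v5": 1420.3},
    {"name": "Player D", "position": "L", "toi_5v5": 99.9},
]


def split_by_position(rows, min_toi=MIN_TOI_5V5):
    """Drop players under the ice-time cutoff, then group the rest by position."""
    groups = {"F": [], "D": []}
    for row in rows:
        if row["toi_5v5"] < min_toi:
            continue  # fewer than 100 minutes: filtered out
        # Assumption: anything that is not "D" counts as a forward.
        key = "D" if row["position"] == "D" else "F"
        groups[key].append(row["name"])
    return groups


groups = split_by_position(players)
print(groups)
```

Each group is then only ever compared against itself in the later steps.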
First, 5v5 score-and-venue adjusted on-ice percentages are obtained for each of the following stats:
- Low-danger Corsi for (LDCF%)
- Medium-danger Corsi for (MDCF%)
- High-danger Corsi for (HDCF%)
- Expected goals for (XGF%)
- Offensive zone starts (OZS%)
Individual Per 60 Rates
Similarly, individual per 60 rates are also grabbed, but these stats are not adjusted for score and venue. Some of these stats directly complement the on-ice percentages.
- Individual low-danger Corsi for (iLDCF)
- Individual medium-danger Corsi for (iMDCF)
- Individual high-danger Corsi for (iHDCF)
- Goals, primary assists, and secondary assists (G, A1, A2)
- Penalties taken and penalties drawn (PENT, PEND)
Lastly, one more stat that is neither a percentage nor per 60 stat is obtained:
- Time on ice per game played (TOIGP)
One initial calculation converts penalties taken and drawn per 60 into a player’s penalty differential per 60 (PEN). Overall, that makes 14 metrics that are weighted within the model.
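A quick sketch of the penalty differential, assuming it is penalties drawn minus penalties taken so that a positive number is good for the player. The sign convention and the sample rates are our assumptions for illustration.

```python
def penalty_differential(pend_per60, pent_per60):
    """Penalties drawn per 60 minus penalties taken per 60 (assumed sign convention)."""
    return pend_per60 - pent_per60


# Hypothetical rates, not real player data.
pen = penalty_differential(pend_per60=1.20, pent_per60=0.75)
print(round(pen, 2))
```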
Every player who skated within the past three seasons is sorted into one of seven groups, based on the years in which they played at least 100 minutes:
- Playing all three years (one group)
- Playing in two of the three years (three groups)
- Playing in one of the three years (three groups)
This gives us a basis to weight offensive contributions by year, giving more value to the most recent seasons, and less as we reach further back in time.
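The seven groups are simply every non-empty combination of the three seasons. A short Python sketch to enumerate them (the season labels are just strings chosen for this example):

```python
from itertools import combinations

SEASONS = ("2017-18", "2018-19", "2019-20")


def season_groups(seasons=SEASONS):
    """List every non-empty combination of seasons, largest groups first."""
    groups = []
    for size in range(len(seasons), 0, -1):
        groups.extend(combinations(seasons, size))
    return groups


groups = season_groups()
for group in groups:
    print(group)
```

This yields one three-season group, three two-season groups, and three single-season groups.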
For simplicity, the years are weighted linearly: for a player who skated in all three years, the most recent year carries triple weight, the second most recent year carries double weight, and the oldest year carries single weight.
Stats from players who played in two seasons are similarly weighted to give double the weight to the most recent year, and players who only appeared in one season have all the weight in that year. This is an important piece of context to keep in mind when looking at the outputs of players who have just started their NHL careers.
As an example, here are the formulas for xGF% for Matthew Tkachuk, who played all three seasons; versus Andrew Mangiapane, who did not reach the threshold in 2017-18; versus Juuso Valimaki, who only reached the threshold in 2018-19.
|Tkachuk||(1 / 6) * xGF_18 + (2 / 6) * xGF_19 + (3 / 6) * xGF_20|
|Mangiapane||(1 / 3) * xGF_19 + (2 / 3) * xGF_20|
The same method is used for the rest of the metrics. At this point, all stats used in the model are weighted and calculated based on each player’s seasonal appearances.
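Here is one way the linear season weighting could be written in Python. It assumes the weights depend only on the order of the seasons a player qualified in (oldest gets 1, next gets 2, and so on), and the xGF% inputs are invented numbers, not real player data.

```python
def season_weights(seasons_played):
    """Linear weights by recency: oldest qualifying season gets 1, the next 2, etc."""
    ordered = sorted(seasons_played)
    total = sum(range(1, len(ordered) + 1))
    return {season: rank / total for rank, season in enumerate(ordered, start=1)}


def weighted_stat(values_by_season):
    """Combine one stat across a player's qualifying seasons."""
    weights = season_weights(values_by_season.keys())
    return sum(weights[season] * value for season, value in values_by_season.items())


# Hypothetical xGF% values keyed by the year each season ended.
three_years = weighted_stat({2018: 51.0, 2019: 54.0, 2020: 57.0})
two_years = weighted_stat({2019: 48.0, 2020: 54.0})
one_year = weighted_stat({2019: 45.0})
print(three_years, two_years, one_year)
```

With three seasons the weights work out to 1/6, 2/6, and 3/6; with two seasons, 1/3 and 2/3; and a single season keeps all of the weight.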
Next, the weighting of each contributing metric needs to be determined. This portion involved a lot of trial and error, testing different players to see whether the values made sense based on both their stats and the eye test. A bit of subjectivity is present, but we tried to be as methodical with the weightings as possible and to assign weights that accurately conveyed each player’s offensive performance.
Goals per 60 is the main metric that all other metrics are compared against in this model. To make the math a little easier, the weighting for goals per 60 is set at 100. The rest of the coefficients are set as follows:
|G||100.00||The main metric to compare other coefficients to|
|A1||85.00||Primary assists are viewed as nearly as valuable as a goal|
|A2||15.00||Secondary assists are viewed as much less valuable|
|LDCF%||5.95||The calculated shooting percentage of all low-danger chances (past three seasons)|
|MDCF%||15.65||The calculated shooting percentage of all medium-danger chances (past three seasons)|
|HDCF%||22.78||The calculated shooting percentage of all high-danger chances (past three seasons)|
|XGF%||50.00||Giving an expected goal half the weight of an actual goal|
|PDO||15.00||Some luck is included, and weighted the same as an A2, but ranked in reverse|
|iLDCF||5.95||Same as LDCF%|
|iMDCF||15.65||Same as MDCF%|
|iHDCF||22.78||Same as HDCF%|
|TOI||20.00||Rewarding players with higher ice time|
|PEN||15.00||Rewarding positive penalty impacts|
|OZS||10.00||Rewarding players with harder deployment, but ranked in reverse|
For every metric in the model, normal distribution probabilities are used to place a skater relative to the entire population of either forwards or defencemen. To calculate these probabilities, the mean and standard deviation of every metric are computed for each positional group.
Here is a hopefully easy to understand example for the forward group using the mean and standard deviation of the goals per 60 metric:
|Goals per 60 statistic||Value|
|Matthew Tkachuk’s three-year weighted goals per 60||0.8233|
So the question to ask is then:
What percentage of forwards have a goals per 60 value of less than 0.8233 if the mean is 0.6011 and the standard deviation is 0.2875?
Computing the value in this example gives Tkachuk a score of 0.7802, meaning his goals per 60 is in the 78th percentile for forwards.
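The same calculation can be reproduced with a normal cumulative distribution function. This is a Python sketch using only the standard library; the helper name `normal_cdf` is ours.

```python
import math


def normal_cdf(x, mean, sd):
    """Probability that a normal(mean, sd) variable falls below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


# Goals per 60 for forwards, using the figures quoted above.
score = normal_cdf(0.8233, mean=0.6011, sd=0.2875)
print(round(score, 4))
```

Plugging in the numbers from the example gives a value of roughly 0.78, in line with the score quoted above.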
All metrics are computed this way to give each player a set of scores. For PDO and OZS, however, the comparison is reversed so that players with lower PDO values and lower OZS ratios are rewarded.
Using Tkachuk again, an example looking at his OZS:
|Matthew Tkachuk’s three-year weighted OZS%||52.20|
The question is reframed to:
What percentage of forwards have an OZS value of greater than 52.20 if the mean is 53.95 and the standard deviation is 10.81?
The answer gives Tkachuk a score of 0.5643, meaning he has fewer offensive zone starts than 56.43% of all forwards. Reframing the question rewards him instead of penalizing him for having tougher deployments.
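For the reversed metrics, the score is the upper tail of the same normal distribution, which is one minus the usual cumulative probability. A Python sketch with helper names of our own choosing:

```python
import math


def normal_cdf(x, mean, sd):
    """Probability that a normal(mean, sd) variable falls below x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def reversed_score(x, mean, sd):
    """Share of the population expected to sit above x (used for PDO and OZS)."""
    return 1.0 - normal_cdf(x, mean, sd)


# OZS% for forwards, using the figures quoted above.
ozs_score = reversed_score(52.20, mean=53.95, sd=10.81)
print(round(ozs_score, 4))
```

With the OZS figures from the example, this comes out at roughly 0.56, matching the score above.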
TWCScore and offensive contributions
Now that everything is computed, we can combine everything into an aggregate metric, which we coined TWCScore for skaters. This takes the weightings and the computed normal distribution scores for every player and adds it all up, giving us a novel metric to compare the offensive contributions of all forwards against each other and all defencemen against each other.
We also combined subsets of the metrics to create three additional components: Possession, individual shot generation, and scoring.
TWCScore includes all 14 statistics. Possession includes the three on-ice Corsi for percentages at the three danger levels, individual shot generation includes the individual Corsi for rates at the three danger levels, and scoring simply sums goals, primary assists, and secondary assists.
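As a rough illustration, the aggregation can be sketched in Python as a weighted sum of each metric's normal distribution score. The weights are the ones from the table above. The plain weighted sum with no further scaling, the use of the same weights inside each component, and the sample player are our assumptions for this sketch.

```python
# Weights copied from the coefficient table above.
WEIGHTS = {
    "G": 100.00, "A1": 85.00, "A2": 15.00,
    "LDCF%": 5.95, "MDCF%": 15.65, "HDCF%": 22.78,
    "XGF%": 50.00, "PDO": 15.00,
    "iLDCF": 5.95, "iMDCF": 15.65, "iHDCF": 22.78,
    "TOI": 20.00, "PEN": 15.00, "OZS": 10.00,
}

COMPONENTS = {
    "Possession": ("LDCF%", "MDCF%", "HDCF%"),
    "Shot generation": ("iLDCF", "iMDCF", "iHDCF"),
    "Scoring": ("G", "A1", "A2"),
}


def weighted_total(scores, metrics):
    """Sum of weight * score over the chosen metrics (scores are 0 to 1)."""
    return sum(WEIGHTS[m] * scores[m] for m in metrics)


def poet_components(scores):
    """TWCScore over all 14 metrics, plus the three component subsets."""
    result = {"TWCScore": weighted_total(scores, WEIGHTS)}
    for name, metrics in COMPONENTS.items():
        result[name] = weighted_total(scores, metrics)
    return result


# Hypothetical player: average everywhere except goals per 60.
scores = {metric: 0.5 for metric in WEIGHTS}
scores["G"] = 0.78
components = poet_components(scores)
print(components)
```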
So this gives us the entire basis of the POET and how its metrics are computed. For a sanity check, we can look at the list of top players by TWCScore to see if the names that show up make sense in terms of players that are known to have dominant 5v5 offence.
Top TWCScore Forwards
Top TWCScore Defencemen
The top listed players at either position fit the bill as the league’s best 5v5 skaters, so at the very least, this version of POET passes the sanity check.
The full POET output for 2021 can be downloaded here:
From here, we can create plots to quickly visualize and compare players. To do so, a perfectly average player is created to compare against, where all of their hypothetical stats fall right into the 50th percentile.
This helps us calculate the Z-score for individual players for TWCScore, possession, individual shot generation and scoring, which shows how many standard deviations a player is above or below average.
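One plausible way to turn this into code is sketched below in Python. It assumes the Z-score is the gap between a player and the perfectly average player, divided by the standard deviation across the positional group. That reading, the use of the population standard deviation, and the sample TWCScores are all assumptions for illustration, not a confirmed description of the model.

```python
import statistics


def z_scores(values_by_player, baseline):
    """Standard deviations above or below the baseline (average player) value."""
    sd = statistics.pstdev(list(values_by_player.values()))
    return {name: (value - baseline) / sd for name, value in values_by_player.items()}


# A player at the 50th percentile in every metric scores 0.5 on each one,
# so under a plain weighted sum their TWCScore is half the total weight.
TOTAL_WEIGHT = 398.76  # sum of the 14 weights in the coefficient table
average_twcscore = 0.5 * TOTAL_WEIGHT

# Hypothetical TWCScores, not real output.
twcscores = {"Player A": 259.38, "Player B": 199.38, "Player C": 139.38}
z = z_scores(twcscores, baseline=average_twcscore)
print({name: round(value, 2) for name, value in z.items()})
```

The same function could be applied to possession, individual shot generation, and scoring by swapping in those values and the matching baseline.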
This gives us our final product: 5v5 POET charts.
Here are the TWCScores for the Flames’ current roster (as of January 2021), followed by their POET charts. If you’d like to see charts for other players on different teams, please comment below or reach out to us on Twitter @wincolumnblog. We’d be happy to make more charts available.