Wednesday, June 29, 2011

Why Shooting Stats Are Better Than Goals

Let's say you are asked to rank the NHL teams halfway through the season. Which stats should you use to do this?

Before getting to that, we need to think carefully about what a ranking means. The best team in the NHL is the one that is the best at winning games, and so on down the line. Winning comes down to two things: scoring goals and preventing the opposition from doing the same. If someone says X "is the best team in the league", what they mean is that X is the best at outscoring its opponents. Similarly, if Y is the best player in the league, that means he is the best at the combination of generating goals for his team and preventing goals against it.

Success at scoring and preventing goals in hockey, as in every activity, is a combination of skill and luck. For some things, e.g. roulette, luck is the dominant factor. In others, like sprinting 100m, skill overwhelmingly wins the day. Hockey falls somewhere in the middle, perhaps closer to roulette than anyone would care to admit. Getting back to ranking the teams, that means figuring out which are the strongest at the skill part. Note that I'm using skill loosely here to refer to anything that helps a team score goals and prevent them, including qualities like grit and mental toughness that pundits love to talk about.

There are a few ways to tease out this skill component, all of which I will use in various articles in the future. Here I will compare stats for each team across two different groups of games: the first half of the season vs. the second, even-numbered games vs. odd-numbered games, etc. The idea behind this is that luck in the first half of the season and luck in the second half should be completely unrelated. Sometimes your team will get lucky in the first half and unlucky in the second, but the opposite is just as likely. Think of it like two coin tosses. If you win the first coin toss, you are no more likely to win the second than if you'd lost it. In contrast with the luck factor, your team should usually be about as skilled in the second half of the season as in the first. If there is no relationship (known as correlation) between luck in the first half and luck in the second, then any link between a team's results in the two halves will be due to skill.
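The split-half reasoning can be illustrated with a small simulation. This is only a sketch on made-up data: the function names, the normal distributions, and the skill and luck spreads are all assumptions chosen for illustration, not anything taken from real NHL results. Each simulated team has a fixed skill and gets a fresh, independent dose of luck in each half.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_split_half(n_teams, skill_sd, luck_sd, seed=0):
    """Correlate first-half and second-half results for simulated teams.

    Hypothetical model: result in a half = fixed skill + luck, where luck
    is drawn independently for each half.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_teams):
        skill = rng.gauss(0.0, skill_sd)  # same in both halves
        first.append(skill + rng.gauss(0.0, luck_sd))   # first-half luck
        second.append(skill + rng.gauss(0.0, luck_sd))  # fresh second-half luck
    return pearson(first, second)


# All luck, no skill: the two halves should be close to uncorrelated.
print(simulate_split_half(2000, skill_sd=0.0, luck_sd=1.0))
# Equal parts skill and luck: a clearly positive correlation appears.
print(simulate_split_half(2000, skill_sd=1.0, luck_sd=1.0))
```

Under this toy model the expected correlation between halves is the share of the variance that comes from skill, which is why a stat with a higher split-half correlation is carrying more skill and less luck.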

Mostly due to the availability of data, I restrict attention to 5-on-5 situations where both goalies are on the ice. For each of the past four years, I split the season in half and look at how goals and shooting stats in the first half relate to goals in the second half. Because we care about both scoring and allowing goals, I express this as a percentage: goals for divided by the sum of goals for and against (GF/(GF+GA)). The shooting stats are expressed the same way.
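As a quick illustration of the percentage, here is a minimal sketch. The function name `share` is my own label, and the inputs are the Devils' and Flyers' 5-on-5 goal totals quoted later in this article.

```python
def share(events_for, events_against):
    """Return for / (for + against): the share of events a team generated.

    Works for goals (GF, GA) and equally for Corsi shot attempts.
    """
    total = events_for + events_against
    if total == 0:
        raise ValueError("no events recorded")
    return events_for / total


# Devils, first half, 5-on-5: 45 goals for, 93 against.
print(round(100 * share(45, 93), 1))  # 32.6
# Devils, second half, 5-on-5: 76 for, 53 against.
print(round(100 * share(76, 53), 1))  # 58.9
# Flyers, second half, 5-on-5: 81 for, 81 against.
print(round(100 * share(81, 81), 1))  # 50.0
```

Using a share rather than a raw differential keeps the number on the same 0 to 100% scale for goals and for shots, which makes the two easy to compare.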

Here is a graph of the relationship between goal percentage in the first half of the season and the second. All data are from timeonice. See links on the right.


It looks rather weak. The numbers back that up - the correlation is just 0.13. This is not statistically significant. Even ignoring that, it's pretty clear that putting up good scoring numbers 5-on-5 with the goalies in net in the first half of the season doesn't mean much in the way of predicting performance in the second half.

The relationship between Corsi percentage in the first half of the season and goal percentage in the second half is far stronger. Corsi percentage is like goal percentage, but for all types of shots, including missed shots and blocked shots. Here is the scatterplot:


You can see a distinguishable up-and-right pattern, which indicates a stronger relationship between the two. The correlation is 0.36, which is statistically significant. Keep in mind that we're looking at how shooting ratios in the first half relate to goals in the second half.
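For readers who want to check the significance claims, here is a rough sketch. It assumes 120 data points (30 teams in each of four seasons, one point per team-season), which is my reading of the setup rather than a stated figure, and it uses the Fisher z-transformation with a normal approximation, which is not necessarily the test used for the numbers above.

```python
import math


def correlation_p_value(r, n):
    """Approximate two-sided p-value for a Pearson correlation r from n pairs.

    Uses the Fisher z-transformation: atanh(r) * sqrt(n - 3) is roughly
    standard normal when the true correlation is zero.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided tail probability of a standard normal beyond |z|.
    return math.erfc(abs(z) / math.sqrt(2))


N_TEAM_SEASONS = 120  # assumption: 30 teams x 4 seasons

print(correlation_p_value(0.13, N_TEAM_SEASONS))  # goal% vs. goal%
print(correlation_p_value(0.36, N_TEAM_SEASONS))  # Corsi% vs. goal%
```

Under those assumptions the goal% correlation of 0.13 gives a p-value of about 0.16, above the usual 0.05 cutoff, while the Corsi% correlation of 0.36 comes in well under 0.001.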

Let's look at the best and worst teams in the first half of this last season. The New Jersey Devils were an impressively bad 10-29-2 on January 8th, with an overall goal differential of -58 (72-130). At 5-on-5 with goalies in, their goal differential was -48 (45-93) and their goal percentage 32.6%. That is the worst goal percentage in either half for any team in any of the four seasons of data available at timeonice. In contrast, the Flyers looked like world beaters halfway through. Their record was 26-10-5, their goal differential +30 (137-107) and their 5-on-5 goal% a cool 60%. What happened in the second half? The Devils put up one of the best turnarounds in NHL history, nearly making the playoffs, and the Flyers' record was mediocre. The Devils went 28-10-3, the Flyers 21-13-7. The Devils had an overall goal differential of +23 (102-79), the Flyers +6 (122-116). At 5-on-5 with goalies in, New Jersey had a goal differential of +23 (76-53), or 58.9%, and Philly 0 (81-81), or 50%.

How could the worst team in the league in the first half have a better second half than the best team, and by such a large margin? The answer comes down to the luck factor I discussed above. In the first half, New Jersey took 52.6% of all the 5-on-5 Corsi shots in their games. Philadelphia was actually worse, just better than even at 50.6%. Despite that, the Devils got hugely outscored and the Flyers got far more goals than their opponents. While skill may be a factor in whether shots go in and whether your own goalie saves them (the topic of my next article), luck plays a massive role in scoring over just half a season. The Devils were clearly not getting the bounces and the Flyers were. In the second half of the season, Philadelphia's luck was about average and New Jersey actually caught the breaks.

You can see how much better Corsi stats handle luck by looking at the two teams in the graphs above. New Jersey is the red point and Philadelphia the orange one. The Devils are a huge outlier when you compare goals in the first half with goals in the second, but much less so when you compare Corsi in the first half with goals in the second, though even there you can see that they were fortunate. The goals graph is so scattered that the Flyers don't stand out much, but you can see that they dropped off a lot by how far they are from the top of the graph. On the Corsi graph they are right in the middle, so from that perspective their second-half performance should have been expected rather than surprising.

Other articles might stop there, but things get more interesting if you run a regression. Regression analysis is a tool I will use pretty frequently. It allows you to separate out different effects. In our case, we want to know how important goals in the first half are once you take Corsi into account, and vice-versa. The regression makes it very clear that Corsi% is a far, far better predictor of goal% in the second half than first-half goal%. Not only that, it appears that virtually all of the tiny amount of explanatory power you get from goal% comes from the fact that goals are a type of shot.

When the regression spits out a formula, the size of the coefficient tells you how big its effect is. When both first-half goal% and Corsi% are included, the goal% coefficient is a minuscule 0.007. For the stats nerds, the standard error is 0.087 so the p-value is an astonishing 0.936. This is about as statistically insignificant as it gets. For comparison, the coefficient for Corsi% is 0.550 (SE of 0.142, p < 0.001) which is very strongly significant. If you have a team that breaks even on goals in the first half of the season but Corsi outshoots its opponents 60-40 then they will average about 83.3 goals scored and 66.7 allowed in the second half of the season (assuming 150 total 5-on-5 goals, which is close to the league average). If instead you have a team that was even on shots but won the goal battle by that much then they will average 75.2 goals in the second half and concede 74.8.
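The two scenarios can be reproduced from the quoted coefficients. This is a sketch under one assumption the article does not state: that the fitted line predicts a 50% goal share for a team at exactly 50% on both first-half measures (the intercept itself is not given above). The function names are mine.

```python
GOAL_COEF = 0.007   # first-half goal% coefficient quoted above
CORSI_COEF = 0.550  # first-half Corsi% coefficient quoted above


def predicted_goal_share(first_half_goal_share, first_half_corsi_share):
    """Predicted second-half goal share as a fraction between 0 and 1.

    Assumption: an average team (0.5 on both inputs) is predicted at 0.5.
    """
    return (0.5
            + GOAL_COEF * (first_half_goal_share - 0.5)
            + CORSI_COEF * (first_half_corsi_share - 0.5))


def expected_goals(goal_share, total_goals=150):
    """Split a total number of 5-on-5 goals into (for, against)."""
    goals_for = goal_share * total_goals
    return goals_for, total_goals - goals_for


# Even on goals, 60-40 on Corsi in the first half.
print(expected_goals(predicted_goal_share(0.5, 0.6)))
# 60-40 on goals, even on Corsi in the first half.
print(expected_goals(predicted_goal_share(0.6, 0.5)))
```

With these rounded coefficients the first case comes out at roughly 83.3 goals for and 66.7 against, matching the figures above. The second comes out at roughly 75.1 to 74.9, essentially the same near-even split as quoted; the small gap presumably reflects rounding in the published coefficients or the unstated intercept.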

Once Corsi is taken into account, goals do not at all predict future success.


Topics left for future articles:
- What about score effects?
- What about Fenwick?
- What about special teams?
- Is shooting all luck, then?

2 comments:

  1. Patrick D (SnarkSD), November 12, 2011 at 2:40 AM

    Nice article. I had often thought about running a regression on the individual components of CORSI to see which type of shot carries the most weight. Have you seen any data on this?

  2. Thanks.

    I'm not sure about type of shot, but Gabe Desjardins has written a bunch of stuff on shot quality in general. It's difficult to look at shot quality because of scorekeeper differences and also the differences tend to be small with luck winning the day.
