Ranking NBA Refs, pt. 1 - Modeling Bad Calls

Jackson Curtis
Jun 22, 2021
Updated: Jul 9, 2021

Since the 2021 All-Star break, I've been running a Twitter bot that posts before each Jazz game with the names of the game's referees and how those referees have done in previous Last Two Minute (L2M) reports, in which the NBA assesses correct and incorrect calls during the final two minutes of close games. These L2M reports provide some of the only transparency the league gives into the performance of its referees. Currently, my bot ranks the referees by taking a simple average of the number of bad calls made for each minute they were on the court. This is less than ideal for two reasons:

Differences in the amount of time refs were assessed. If one ref was assessed just once and happened to do well, they will be ranked highly despite little evidence that they are a good ref. This is not a hypothetical concern: the minutes assessed range from 4 (e.g., JT Orr) to 56 (Eric Lewis).
The average does not adjust for who you are reffing with. The simple average makes no attempt to distinguish who's to blame for bad calls, so if a good ref is paired with a bad ref, and the bad ref makes frequent bad calls, the good ref will be ranked artificially poorly.
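To make the two problems concrete, here is a minimal sketch of a simple-average ranking. It assumes every bad call in a report is charged to all three refs on the crew, which is one reading of "no attempt to distinguish who's to blame"; the ref names, the data, and the function name are hypothetical and not taken from the bot's code.

```python
from collections import defaultdict

# Hypothetical toy data: (crew, minutes assessed, bad calls in the report).
reports = [
    (("Ref A", "Ref B", "Ref C"), 2.0, 1),
    (("Ref A", "Ref D", "Ref E"), 2.0, 0),
    (("Ref B", "Ref C", "Ref D"), 2.0, 2),
    (("Ref D", "Ref E", "Ref F"), 2.0, 0),
]


def simple_average_rates(reports):
    """Bad calls per assessed minute, charging every call to the whole crew."""
    minutes = defaultdict(float)
    bad_calls = defaultdict(int)
    for crew, mins, calls in reports:
        for ref in crew:
            minutes[ref] += mins
            bad_calls[ref] += calls
    return {ref: bad_calls[ref] / minutes[ref] for ref in minutes}


rates = simple_average_rates(reports)
ranking = sorted(rates, key=rates.get)  # lowest bad-call rate first

for ref in ranking:
    print(f"{ref}: {rates[ref]:.2f} bad calls per minute")
```

In this toy data, Ref F ties for first place on the strength of a single clean report, and Ref D's rate is inflated by one report shared with Ref B and Ref C. Those are the two weaknesses listed above.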

We can do better by building a predictive model of each ref's bad call rate. To do this, I built a Bayesian Poisson regression model with the goal of partitioning blame for bad calls among specific referees. Next week's post will do an in-depth walkthrough of the assumptions and code for that model, but for now the results are shown above. By extrapolating the bad call rate in the Last Two Minute reports, we can normalize the rate to a 48-minute period (a typical game). The results suggest that the best refs in the league (Tre Maddox, Mousa Dagher, Jacyn Goble) are making about 3 bad calls a game while the worst refs in the league (Curtis Blair, Ken Mauer) are making 17-20. There are two big caveats with that conclusion: (1) It assumes the Last Two Minute reports are a random sample of reffing performance. While close endings to games are definitely not random samples, it's not obvious to me whether we would expect the bad call rate to be higher or lower in less stressful situations. (2) The confidence intervals on these estimates are massive. A future blog post will demonstrate just how much uncertainty is in these estimates, which really shows how little transparency the NBA provides about its reffing.
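The model itself is left for the follow-up post, so what follows is only a minimal sketch of one way blame could be partitioned, not the model behind these rankings. It assumes the bad calls in a report are Poisson with a mean equal to the minutes assessed times the sum of the crew's individual per-minute rates, puts a Gamma prior on each rate, and finds the posterior mode with an EM loop instead of full Bayesian sampling. The function name, the prior values, and the toy data are all hypothetical.

```python
# Hypothetical toy data: (crew, minutes assessed, bad calls in the report).
reports = [
    (("Ref A", "Ref B", "Ref C"), 2.0, 1),
    (("Ref A", "Ref D", "Ref E"), 2.0, 0),
    (("Ref B", "Ref C", "Ref D"), 2.0, 2),
    (("Ref D", "Ref E", "Ref F"), 2.0, 0),
]


def fit_additive_poisson(reports, prior_shape=2.0, prior_rate=10.0, n_iter=500):
    """MAP per-minute bad-call rates under an assumed additive Poisson model.

    Assumed model: calls ~ Poisson(minutes * sum of the crew's rates),
    with each rate ~ Gamma(prior_shape, prior_rate).
    """
    if prior_shape <= 1.0:
        raise ValueError("prior_shape must exceed 1 so every rate stays positive")

    refs = sorted({ref for crew, _, _ in reports for ref in crew})
    exposure = {ref: 0.0 for ref in refs}
    for crew, mins, _ in reports:
        for ref in crew:
            exposure[ref] += mins

    rate = {ref: prior_shape / prior_rate for ref in refs}  # start at the prior mean
    for _ in range(n_iter):
        # E-step: split each report's calls across the crew in proportion
        # to the current rate estimates.
        blamed = {ref: 0.0 for ref in refs}
        for crew, _, calls in reports:
            crew_total = sum(rate[ref] for ref in crew)
            for ref in crew:
                blamed[ref] += calls * rate[ref] / crew_total
        # M-step: posterior mode of a Gamma given the blamed calls and exposure.
        rate = {
            ref: (prior_shape - 1.0 + blamed[ref]) / (prior_rate + exposure[ref])
            for ref in refs
        }
    return rate


per_minute = fit_additive_poisson(reports)
per_game = {ref: 48.0 * r for ref, r in per_minute.items()}  # 48-minute game

for ref in sorted(per_game, key=per_game.get):
    print(f"{ref}: {per_game[ref]:.1f} bad calls per 48 minutes")
```

Multiplying by 48 is the same extrapolation described above, so it inherits the same caveat: it treats the final two minutes of close games as representative of the whole game.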

The graph below compares our old rankings to our new rankings. Reassuringly, there is high correlation between the two ranks. We can use this graph to assess whether our model is making the adjustments we wanted. The two small (size corresponds to minutes assessed) green points demonstrate the adjustment being made for those with small sample sizes. JT Orr (bottom left) was ranked 1st overall in the simple average rankings, but because he was only assessed for four minutes, our Bayesian model used its prior to penalize his ranking so that in the new ranking he's only 24th overall. Bill Spooner's rank improved for the same reason.
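The small-sample adjustment can be illustrated with a much simpler model than the one used for the rankings. The sketch below ignores crews entirely and assumes a single ref's bad calls are Poisson with a Gamma prior on the per-minute rate, so the posterior mean has a closed form. The prior values and the bad-call counts are invented for illustration; only the 4 and 56 minutes echo the range mentioned earlier.

```python
def posterior_mean_rate(bad_calls, minutes, prior_shape=2.0, prior_rate=10.0):
    """Posterior mean of a Poisson rate under a Gamma(prior_shape, prior_rate) prior."""
    return (prior_shape + bad_calls) / (prior_rate + minutes)


# Hypothetical refs: (bad calls, minutes assessed).
little_data = (0, 4.0)   # spotless, but only four minutes of evidence
lots_of_data = (5, 56.0)  # a few bad calls over many minutes

for label, (calls, mins) in [("4 minutes", little_data), ("56 minutes", lots_of_data)]:
    raw = 48.0 * calls / mins
    shrunk = 48.0 * posterior_mean_rate(calls, mins)
    print(f"{label}: raw {raw:.1f}, shrunk {shrunk:.1f} bad calls per 48 minutes")
```

With these made-up numbers the four-minute ref goes from a raw rate of zero to about 6.9 bad calls per game, while the 56-minute ref lands near 5.1, so the two swap places. The prior dominates when there is little data and fades as minutes accumulate.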

The three green points in the middle of the graph demonstrate the other problem the model was built to address. Although ranked close to each other under the simple average, they sit at opposite ends of the rankings under the Poisson model. Pat Fraher, the ref at the top who was sorely penalized under the new model, was reffing with refs who averaged only 7.4 bad calls per game. Right below him, Nate Green was reffing with refs who averaged 8.1 bad calls per game. Finally, one of the people who benefited most from the new model, Leon Wood, was reffing with refs averaging 10.2 bad calls per game. Because he spent the most time with poor refs, he received less blame for the bad calls made on his watch, and the model ranked him highly.
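A crew-mate average like the ones quoted above could be computed along these lines. This is a sketch under assumptions: it weights each crew-mate's per-game rate by the minutes shared with them, which may not be how the figures in this post were produced, and the rates, reports, and function name are hypothetical.

```python
def crewmate_average(ref, reports, per_game_rate):
    """Minutes-weighted average per-game bad-call rate of a ref's crew-mates."""
    weighted_sum = 0.0
    total_minutes = 0.0
    for crew, mins, _ in reports:
        if ref not in crew:
            continue
        for mate in crew:
            if mate != ref:
                weighted_sum += mins * per_game_rate[mate]
                total_minutes += mins
    if total_minutes == 0.0:
        raise ValueError(f"{ref} does not appear in any report")
    return weighted_sum / total_minutes


# Hypothetical estimated rates (bad calls per 48 minutes) and reports.
per_game_rate = {"Ref A": 4.0, "Ref B": 12.0, "Ref C": 8.0, "Ref D": 6.0}
reports = [
    (("Ref A", "Ref B", "Ref C"), 2.0, 1),
    (("Ref A", "Ref B", "Ref D"), 4.0, 2),
]

print(f"Ref A's crew-mates average {crewmate_average('Ref A', reports, per_game_rate):.1f}")
```

A ref whose crew-mates have a high average is the one the model tends to excuse, since more of the crew's bad calls can be explained by the other two refs.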

As always, you can check out my code on my GitHub. Check back next week for a detailed walkthrough of the code there.
