Thursday, March 12, 2009

(Abuse of) Statistics 101

At Bracketology 101, a commenter has marshaled some supposedly relevant historical statistics to argue that Creighton should get an at-large bid. He notes that over the past 10 years, 133 teams in the RPI top 50 won at least 25 games, and every one of them went to the Tournament. Because Creighton has an RPI of 39 (now 40) and 26 wins, the commenter concludes that they should Dance.

At first glance, perhaps this sounds compelling. After all, 133 teams! 10 years! No one with this profile has ever been denied!

In fact, this is a great opportunity to explore the abuse of statistics in sports. Let’s look at why.

(1) To use historical comparisons to assess Creighton's chances, you have to compare Creighton to teams that were in a similar situation. It’s completely meaningless to compare this year’s Creighton to teams that were clear locks and nowhere close to the bubble. But that’s exactly what the commenter does by including all 133 teams with top 50 RPIs and 25 wins. This includes, for example, all of the top 10 RPI teams last year (e.g., Kansas, Memphis, UNC, UCLA). Does the fact that these teams made the Tournament tell you anything about Creighton’s chances this year? Of course not.

To make a meaningful comparison, let’s limit it to the 20 teams ranked 31–50 in the RPI each year (roughly 10 spots above and below where Creighton’s RPI is right now). Because there are 34 at-large bids in total, this tends to be the “bubble range” where the rubber hits the road. Over the same 10 years, only 18 teams in that range won 25 or more games. The incredibly compelling “133 teams in 10 years” figure starts to lose its allure.

(2) Still, you might think that 18 out of 18 is a pretty solid indicator. If these 18 teams all made it, doesn’t that bode well for Creighton’s chances? Well, the problem is that not all 18 of these teams received at-large bids. In fact, only 3 of them did! The other 15 received automatic bids by winning their conference tournaments, so we have no idea whether the Selection Committee would have given them an at-large bid. So instead of 133 teams in 10 years, we are now down to 3 similarly situated teams in 10 years. Sure, all three made the Tournament, but at this point the sample size is so small that the comparison isn't very meaningful. Still, let’s take a look at these three teams and see what they tell us about this year’s Creighton team. Here they are (RPI in parentheses):

  • 2006 George Washington (37) - This team had a clearly superior resume to Creighton's. They went 26-1 during the regular season before being upset in the A-10 tournament, went 3-0 against top 50 teams, and had no bad losses. Creighton went 25-6 during this year's regular season, was 2-2 against top 50 teams, and lost twice to teams outside the RPI top 150. This is not a useful comparison.
  • 2002 Southern Illinois (50) - This is actually a very good comparison. Southern Illinois, another Valley team, finished with exactly the record Creighton has now (26-7), had 3 top 50 wins (including one over Indiana that I remember listening to on the internet), and a handful of bad losses (more than Creighton has).
  • 2001 St. Joe’s (35) - This is also a sound comparison. St. Joe’s finished 25-6, had 3 top 50 wins, and a couple of bad losses.

These last two comparisons are actually instructive and bode well for Creighton's chances.
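To put a rough number on the sample-size problem in (2): even when every one of n comparable teams got in, the true rate for teams like them could still be well short of 100%. One standard way to quantify this (my choice of method, not something the commenter used) is the one-sided 95% exact binomial (Clopper-Pearson) lower bound, which for n out of n successes reduces to 0.05^(1/n). The sketch below applies it to the three sample sizes discussed above. It assumes each team is an independent draw with the same chance of getting in, which is generous, and it does nothing about point (1): a big sample of the wrong teams is still the wrong sample. The helper name is hypothetical.

```python
# Hypothetical helper: one-sided exact (Clopper-Pearson) lower bound on a
# success rate when all n observed cases succeeded. With n out of n
# successes, P(all n succeed | p) = p**n, so the bound solves p**n = alpha.
def lower_bound_all_successes(n: int, alpha: float = 0.05) -> float:
    """Smallest success rate still consistent (at level alpha) with n/n successes."""
    if n <= 0:
        raise ValueError("n must be positive")
    return alpha ** (1.0 / n)


# Sample sizes from the post: all top-50 RPI teams with 25+ wins,
# those ranked 31-50, and those in that range that needed an at-large bid.
for label, n in [("133 teams", 133), ("18 teams", 18), ("3 at-large teams", 3)]:
    low = lower_bound_all_successes(n)
    print(f"{label}: {n}/{n} made it; rate could plausibly be as low as {low:.2f}")
```

Run as-is, it prints lower bounds of about 0.98, 0.85, and 0.37. In other words, three for three is consistent with teams in Creighton's spot getting in barely more than a third of the time.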

On the other hand, what happens if we adjust the parameters of the commenter’s comparison ever so slightly? What if we look at top 50 RPI teams that had 24 wins instead of 25? Well, that gives us the 2006 Hofstra team, which did not make the Tournament despite a better RPI (30) and a resume very similar to Creighton’s now. They were 24-6, 3-2 against the top 50, and had one loss outside the top 100. They also played in a tougher conference, which sent two teams to the Tournament, one of which reached the Final Four. And yet, they were denied.

Consider also the 2006 Missouri State squad, another Valley team, which was ranked 21st in the RPI (!). They played five fewer games than Creighton has this year, so they won only 20, but they had 4 top 50 wins and not a single loss outside the top 50. And that year’s Valley was much tougher, producing four Tourney teams and two Sweet 16 runs. Yet Missouri State wasn't invited to the Dance.

The bottom line is that historical information can be helpful, but in Creighton's case the evidence is limited and mixed. Unfortunately, the commenter’s statistical abuse is just a microcosm of the way sports commentators assign predictive value to so-called historical trends without any regard for the relevant variables or sample size.

1 comment:

  1. Thanks for following up on that comment. It's always difficult to compare one year to the next because they are all different, but when you take a big sample it can be useful. And despite what some may say (Palm in particular) the bubble is a little bit stronger this year than in the past few years.