We’ve been pretty excited about March Madness here at Minitab. Kevin Rudy’s been busy creating his regression model and predicting the winners for the 2015 NCAA Men’s Basketball Tournament. But we’re not the only ones. Lots of folks are doing their best analysis to help you plan out your bracket now that the tip-offs for the round of 64 are just a day away. As you ponder your last-minute changes, I’ll compare some models to see where they agree and disagree. Here are some highlights from Five Thirty Eight, Microsoft Bing (in its bracketology debut), PlayoffStatus.com, Ed Feng's Power Rank, and our very own Kevin Rudy’s model based on the Sagarin rankings.
Who’s going to win it all?
There’s no bigger question than who’s going to win it all, and for some analysts, there’s never been more certainty. Here’s how the 1 and 2 seeds from each region measure up. I included Kansas by virtue of its seeding, but for the 5 mathematical models here, Kansas has a lower mean probability of winning the tournament than 3 seeds Iowa State and Notre Dame. Being in the same region as Kentucky will do that to you.
One interesting point is that while Kentucky is the favorite for every model, there’s a big difference in how sure the models are about it. The model from Microsoft Bing has Kentucky at 18% and second-place Duke at 14%. For PlayoffStatus.com, the prediction is 17% for Kentucky and 12% for Villanova. None of the other models have anyone within 20% of Kentucky. Not every model gives game-by-game predictions, but I suspect that some of the difference comes from the projected ease of victory in the final game of the tournament. Five Thirty Eight, who announced that college basketball parity is over, use their model to say that if Kentucky gets to the championship game, they should win 76.8% of the time against most likely opponents Villanova, Virginia, and Gonzaga. Microsoft Bing gives Kentucky a 55% chance of defeating most likely opponent Duke.
Upsets the models agree on
While not necessarily shocking upsets, the average win probability among the 5 models is higher for the lower seed in two games in the first round. The biggest difference in seeding is number 11 Texas defeating number 6 Baylor.
While 11 seeds historically win about 36% of the time in the first round, 3 of the models have Texas as a favorite heading into the game and all of them say that Texas has a better chance to win than a randomly selected number 11 seed.
The second upset is number 10 Ohio State defeating number 7 Virginia Commonwealth University.
This game is the Kevin Rudy special, a bold prediction of nearly 71% to win even though 10 seeds historically win only about 28% of the time. While Microsoft Bing and PlayoffStatus.com don’t have Ohio State as the favorite as the other 3 models do, everyone agrees that Ohio State is much more likely to win than a randomly selected 10 seed.
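For readers who want to run the same comparison on their own set of predictions, here is a minimal Python sketch of the idea: average each lower seed's win probability across the models, flag the game as a consensus upset when that average tops 50%, and count how many models beat the historical rate for that seed. The historical rates are the ones quoted above; the per-model probabilities and the `summarize` helper are placeholders invented for illustration, not the published predictions.

```python
from statistics import mean

# Historical first-round win rates for the lower seed, as quoted in the article.
HISTORICAL_RATE = {11: 0.36, 10: 0.28}

# Placeholder probabilities that the lower-seeded team wins, one per model.
# These values are made up for illustration; they are NOT the real predictions.
lower_seed_probs = {
    ("Texas", 11): [0.55, 0.52, 0.58, 0.45, 0.48],
    ("Ohio State", 10): [0.71, 0.56, 0.54, 0.47, 0.46],
}

def summarize(probs_by_game, baseline):
    """Return one summary row per game (hypothetical helper)."""
    rows = []
    for (team, seed), probs in probs_by_game.items():
        avg = mean(probs)
        rows.append({
            "team": team,
            "seed": seed,
            "mean_prob": avg,
            # Consensus upset: the average model favors the lower seed.
            "consensus_upset": avg > 0.5,
            # How many models rate the team above a typical team with this seed.
            "models_above_baseline": sum(p > baseline[seed] for p in probs),
        })
    return rows

summary = summarize(lower_seed_probs, HISTORICAL_RATE)
for row in summary:
    print(row)
```

Averaging treats every model as equally credible, which is an assumption of this sketch; a weighted average would be a reasonable alternative if you trust some models more than others.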
Bing gets bold
As you’re making your last-minute tweaks, it can be heartening to know that the experts don’t always agree. Here’s a graph of the standard deviations among the first round probabilities for the 5 models. The games with the most disagreement may not be the ones you expect:
The disagreement for Oklahoma State vs. Oregon exists because the team at Microsoft Bing has made the boldest prediction of the tournament.
In a game that’s supposed to be hard to call between an 8 and 9 seed, where in the history of the tournament 8 seeds have won 47% of the time, Microsoft Bing predicts a 91% win probability for Oregon. This prediction is particularly surprising because Microsoft Bing isn’t 91% sure about very many games. Kentucky, Villanova, Notre Dame, Arizona, and Iowa State are the only other teams that get over a 90% probability to win their first game. Even Microsoft Bing’s second-most-likely participant in the title game, Duke, only gets an 87% chance to finish off North Florida (Microsoft Bing’s projected winner of tonight’s game against Robert Morris).
The disagreement about the game between Eastern Washington and Georgetown is also because of the prediction from Microsoft Bing.
In a game where 4 seeds have historically won nearly 81% of the time, Microsoft Bing projects Georgetown as an underdog to advance. That’s encouraging news for Tyler Harvey and the rest of the Eagles. But since this is Microsoft Bing’s debut in predictions and traditionally good models strongly disagree, it’s tempting to speculate that the extreme Bing results are flaws instead of insights. We’ll know by Thursday.
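The disagreement measure used in this section is simple to reproduce: take each game's win probabilities from the different models, compute their standard deviation, and sort. The Python sketch below assumes the sample standard deviation (`statistics.stdev`); the article doesn't say whether the sample or population version was used. The game labels and probabilities are placeholders, not the published predictions, and `rank_by_disagreement` is a hypothetical helper name.

```python
from statistics import stdev

# Placeholder win probabilities for the better-seeded team, one per model.
# Only the structure matters; these are NOT the published predictions.
first_round = {
    "Game A": [0.52, 0.49, 0.55, 0.09, 0.51],  # one model far from the rest
    "Game B": [0.80, 0.78, 0.83, 0.79, 0.81],  # models largely agree
    "Game C": [0.60, 0.72, 0.55, 0.66, 0.58],  # moderate spread
}

def rank_by_disagreement(probs_by_game):
    """Sort games from most to least disagreement among the models."""
    # Sample standard deviation across models (an assumption of this sketch).
    spread = {game: stdev(probs) for game, probs in probs_by_game.items()}
    return sorted(spread.items(), key=lambda item: item[1], reverse=True)

ranking = rank_by_disagreement(first_round)
for game, sd in ranking:
    print(f"{game}: {sd:.3f}")
```

With only 5 models per game, a single outlying prediction dominates the standard deviation, which is the pattern the two games above show: one model's unusual number is enough to push a game to the top of the list.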
Simulations, advanced statistics, machine learning, and plenty of linear algebra go into coming up with models that describe how likely a team is to win a basketball game. It’s convenient when the models agree, but that won’t always be the case as long as some amount of variability comes from factors we can’t measure. In such cases, comparing different models is a sensible practice. Here’s to your bracket success. May it do at least as well as the bracket from Pete Thamel, the first person I found with a published bracket without Kentucky in the national championship game.