# Chaos at the Kentucky Derby? Bet on It!

If betting wasn't allowed on horse racing, the Kentucky Derby would likely be a little-known event of interest only to a small group of horse racing enthusiasts. But like the Tour de France, the World Cup, and the Masters Tournament, even those with little or no knowledge of the sport in general seem drawn to the excitement over its premier event—the mint juleps, the hats...and of course, the betting.

As most of you probably already know, a big part of betting is the odds placed on each horse: a winning bet on the favorite pays out significantly less than a winning bet on a big underdog. It stands to reason, then, that the horses with the best chances of winning would tend to win more often and those with the worst chances would win less frequently.

Odds are typically listed as something like 10/1, which indicates a \$1 bet would win \$10. Now, this is the opposite of what we typically think of as odds, since a 10/1 horse is actually estimated as having roughly a 1 in 10 chance of winning (strictly 1 in 11, counting ten losses for every win). So I'm going to call 10 the "inverse odds" of a horse. Therefore a horse with low inverse odds would be considered a favorite to win (for example, a 2/1 horse has inverse odds of 2 and roughly a one-in-three chance of winning based on betting odds), while a horse with high inverse odds would be considered unlikely to win (for example, a 50/1 horse has an estimated 2% chance of winning).
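To make the conversion concrete, here is a minimal Python sketch. The function names are my own (hypothetical, not from Minitab or any betting library). It shows both the rough "one over the inverse odds" reading and the stricter conversion that counts the returned stake, stake / (stake + profit).

```python
from fractions import Fraction


def inverse_odds(fractional: str) -> Fraction:
    """Parse odds such as "10/1" into the "inverse odds" (profit per $1 staked)."""
    profit, stake = fractional.split("/")
    return Fraction(int(profit), int(stake))


def rule_of_thumb_chance(fractional: str) -> Fraction:
    """Rough reading: chance of winning is about 1 / inverse odds."""
    return 1 / inverse_odds(fractional)


def implied_probability(fractional: str) -> Fraction:
    """Stricter conversion: stake / (stake + profit) = 1 / (inverse odds + 1)."""
    return 1 / (inverse_odds(fractional) + 1)


for odds in ("2/1", "10/1", "50/1"):
    rough = float(rule_of_thumb_chance(odds))
    strict = float(implied_probability(odds))
    print(f"{odds}: rough {rough:.3f}, strict {strict:.3f}")
```

The two readings nearly agree for long shots and drift apart for favorites, which is worth keeping in mind when reading the low end of the inverse odds scale.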

So a simple graph showing the inverse odds of every horse from 2007-2013 against its finishing position should easily confirm that horses with low inverse odds perform better, and horses given high inverse odds perform worse, right?  Let's see what a scatterplot made with Minitab Statistical Software reveals.

You might convince yourself there is a little bit of a cluster there at the bottom left, but simple regression demonstrates that the relationship, while statistically significant, is extremely weak (R-sq(pred) = 4.69%). I expected to see a much stronger relationship. Instead, these data look pretty chaotic.

Sometimes when data appear chaotic, all it takes is some deeper digging into variables of interest to clarify things. So I used Minitab's Ordinal Logistic Regression tool and analyzed the same dataset to predict the odds of winning based on these factors:

• Inverse Odds
• Post (the starting position of the horse in the gate, with 1 being on the inside of the turns)
• Track (either "Fast" or "Sloppy" conditions)

I was able to get a better model that contained the following factors after eliminating those that were not significant:

• Post
• Track
• Inverse Odds
• Track*Post
• Inverse Odds²
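As a rough illustration of what those terms look like for a single horse, the sketch below builds one row of predictors in Python. The 0/1 coding of Track (0 for "Fast", 1 for "Sloppy") and the function name are assumptions of mine; Minitab's internal coding may differ, and no fitted coefficients are shown because none are reported here.

```python
def model_terms(post: int, track: str, inverse_odds: float) -> dict:
    """Build the predictor values for one horse.

    Assumption: Track is coded 0 for "Fast" and 1 for "Sloppy".
    """
    if track not in ("Fast", "Sloppy"):
        raise ValueError('track must be "Fast" or "Sloppy"')
    sloppy = 1.0 if track == "Sloppy" else 0.0
    return {
        "Post": float(post),
        "Track": sloppy,
        "Inverse Odds": float(inverse_odds),
        # Interaction: lets the effect of post position differ by track condition.
        "Track*Post": sloppy * post,
        # Quadratic term: lets the effect of inverse odds curve rather than stay linear.
        "Inverse Odds^2": float(inverse_odds) ** 2,
    }


# Hypothetical horse: post 3, sloppy track, listed at 10/1.
print(model_terms(post=3, track="Sloppy", inverse_odds=10))
```

The interaction term is what allows post position to matter on a sloppy track while contributing nothing extra on a fast one.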

For any given track conditions, post position, and inverse odds, I now have the estimated chances that a particular horse will win. First let's take a look at the "Winning Odds" versus the "Inverse Odds":

We see that, taking other factors into account, horses with lower inverse odds do in fact have higher odds of winning the race. The quadratic fit appears decent as well, although, as is more obvious on the blue line, horses with very low inverse odds tend to do a little better than the fit expects and those with really high inverse odds tend to do a little worse. It is also obvious that sloppy track conditions (shown in red) tended to yield much more chaotic results. To explain that, we need to look at the odds of winning versus post position, which is not accounted for in the graph above:

What we learn here is that when track conditions are fast, a horse's odds of winning are pretty much the same regardless of which post position it starts in. But when conditions get sloppy (typically due to race-day rain), there is a very large advantage in having a low post position, toward the inside of the track.

However, there's one thing I mentioned before that has big implications for where one might place a bet...the inverse odds correspond to the payout for picking correctly. So to really learn something, we need to multiply the estimated odds of winning from our model by the payout if we were to win (per dollar bet). A payout of less than \$1 indicates that over the long run we would expect to lose money placing that bet. Similarly, values over \$1 indicate we would expect to win money in the long run on that bet.
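The arithmetic is simple enough to sketch in a few lines of Python. The function names and the win probabilities in the example are hypothetical, chosen only to illustrate the calculation; they are not output from the model. By default the payout is taken to be the inverse odds, as described above. The `include_stake` flag is my own addition for the case where the returned \$1 stake is counted as part of the payout.

```python
def expected_payout(p_win: float, inverse_odds: float,
                    include_stake: bool = False) -> float:
    """Expected payout per $1 bet: estimated win probability times the payout."""
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability between 0 and 1")
    payout_if_win = inverse_odds + 1.0 if include_stake else inverse_odds
    return p_win * payout_if_win


def is_long_term_winner(p_win: float, inverse_odds: float,
                        include_stake: bool = False) -> bool:
    """True when the expected payout per $1 bet exceeds $1."""
    return expected_payout(p_win, inverse_odds, include_stake) > 1.0


# Hypothetical inputs, not model output.
examples = [
    ("3/1 horse, 25% estimated chance", 0.25, 3.0),
    ("50/1 horse, 5% estimated chance", 0.05, 50.0),
]
for label, p_win, odds in examples:
    value = expected_payout(p_win, odds)
    verdict = "winner" if is_long_term_winner(p_win, odds) else "loser"
    print(f"{label}: ${value:.2f} per $1 bet, long-term {verdict}")
```

A bet only looks attractive when the model's estimated chance of winning is higher than the chance the betting odds imply, which is why long shots that the public underrates can come out above \$1.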

To demonstrate the expected payout, I'll use a 3D Scatterplot in order to display all relevant variables in a single graph:

To really explore a 3D graph you need to interact with it and rotate in multiple directions. This is easy to do in Minitab, but difficult to convey on the blog, so I'll share with you the takeaways:

1. In fast track conditions, almost all bets are long-term losers (the "house" takes a cut of every winning bet), but horses with long odds (50/1 or higher) would be expected to gain money.
2. In sloppy conditions, horses in the first few post positions are long-term winners almost regardless of inverse odds, with horses across the range of inverse odds expected to earn roughly the same.

Armed with this information, let me save you some time and provide you with some links you might find interesting:

Given what currently looks like sunny weather, I think I'll skip watching the post draw and take my chances with a long-shot like Harrys Holiday, Pablo Del Monte, or Vinceremos!

Kentucky Derby image courtesy: kentuckytourism.com