But as a stat nerd, the question I have is how crazy has June really been? What are the odds of throwing a perfect game and a no-hitter? (Don't worry, it doesn't take a Six Sigma Black Belt to figure that out, either!) But before we start, we have an important question to answer.
There have been 22 perfect games, with the first two both happening in 1880. But in 1880 pitchers threw underhand, it took 8 balls to draw a walk, and a batter was not awarded first base if he was hit by a pitch. In other words, the odds of a pitcher throwing a perfect game in 1880 were vastly different from what they are today. To account for this, I'm going to start collecting data at 1900, as people seem to agree that this is when the modern era of Major League Baseball began. Since 1900, there have been 20 perfect games and 235 no-hitters.
I went to baseball-reference.com and recorded the total number of games played, including any postseason games, for the last 113 seasons (I included all games played through June 13th, 2012). I also recorded the league average for on-base percentage. If you want to follow along, you can get the data here.
Since 1900, there have been 181,921 major league baseball games. But in each game, there are two pitchers. So to get the number of opportunities for a perfect game, we need to double that number. That means since 1900, there have been 363,842 opportunities for a perfect game. And only 20 of them have occurred! What are the odds?
Odds of throwing a perfect game = 20 / 363,842 = 0.000055 = approx 1 in 18,192
Yeah, that's pretty low. Giants fans who were in attendance at AT&T Park Wednesday night should consider themselves extremely lucky. What about Mets and Mariners fans? How lucky should they consider themselves?
Odds of throwing a no-hitter = 235 / 363,842 = 0.000646 = approx 1 in 1,548
That's still quite lucky, but not nearly as lucky as seeing a perfect game. So, sorry Mets and Mariners fans, we're going to focus on the perfect game from here on out. It's just more interesting. Why? Well...you'll see.
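If you'd rather check the two observed rates above yourself, here is a minimal Python sketch. It uses only the counts quoted in this post; the helper name `observed_rate` is made up for illustration.

```python
# Counts quoted in the post (games through June 13, 2012).
GAMES_SINCE_1900 = 181_921
PERFECT_GAMES = 20
NO_HITTERS = 235

# Each game has two starting pitchers, so there are two chances per game.
opportunities = 2 * GAMES_SINCE_1900


def observed_rate(events, chances):
    """Return (probability, N), where the event happened about 1 in N times."""
    return events / chances, chances / events


pg_prob, pg_one_in = observed_rate(PERFECT_GAMES, opportunities)
nh_prob, nh_one_in = observed_rate(NO_HITTERS, opportunities)

print(f"Opportunities: {opportunities:,}")
print(f"Perfect game:  {pg_prob:.6f} (about 1 in {pg_one_in:,.0f})")
print(f"No-hitter:     {nh_prob:.6f} (about 1 in {nh_one_in:,.0f})")
```

Swapping in updated counts for a later season only requires changing the three constants at the top.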
The odds above are just what we've observed in the last 113 years. But let's stop for a minute and think about what we would expect. To pitch a perfect game, no runner can reach base. That means you have to get 27 batters out in a row. So the probability of throwing a perfect game is equal to the probability of getting 27 batters out in a row.
Remember when I said for each season I collected the league average for on-base percentage (OBP)? Well, OBP is the percent of the time that a batter reaches base (either by a hit, a walk, or getting hit by a pitch). That means the probability of getting a batter out is 1 minus the on-base percentage. I'll have Minitab calculate the average OBP since 1900.
That works out to an average OBP of 0.32856 for all of major league baseball over the last 113 years. So the probability of a pitcher getting a batter out is:
1 - 0.32856 = 0.67144 = 67.1%
This means that over the past 113 years, batters have made an out 67.1% of the time. This number isn't constant, as it changes slightly depending on the batter and the pitcher. But I can't break down every plate appearance since 1900, so we're going to stick with this number. Now let's calculate the odds!
Odds of throwing a perfect game = 0.67144^27 = 0.00002134 = approx 1 in 46,800
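Here is the same calculation as a small Python sketch. It assumes, as the calculation above does, that every plate appearance is independent and ends in an out with the same probability. The function name `perfect_game_probability` is just an illustration, not anything from Minitab.

```python
def perfect_game_probability(obp, batters=27):
    """Chance of retiring `batters` hitters in a row.

    Assumes each plate appearance is independent and ends in an out
    with probability 1 - obp (a simplification, as in the post).
    """
    return (1.0 - obp) ** batters


LEAGUE_OBP = 0.32856  # league-average OBP since 1900, as quoted in the post

p = perfect_game_probability(LEAGUE_OBP)
print(f"Probability of getting one batter out: {1 - LEAGUE_OBP:.5f}")
print(f"Probability of a perfect game:         {p:.8f}")
print(f"Roughly 1 in {1 / p:,.0f}")
```

Run as-is, this lands very close to the 1 in 46,800 figure above. Passing a different `obp` shows how quickly the result moves when the out rate changes even a little.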
Again, the true odds depend on the pitcher and the team he's pitching against. Some games will have odds slightly better, and some will have odds slightly worse. But they should even out, making our odds of 1 in 46,800 a good estimate for the average game. So using a probability of 0.00002134, how many perfect games would we expect to see in 363,842 opportunities?
Expected number of perfect games = 0.00002134 * 363,842 = 7.8 perfect games
So there have been more than twice as many perfect games as we would expect! But of course, the 7.8 number is just the average. Certainly we could get other outcomes. After all, if you flip a coin 100 times, you're not always going to get 50 heads. We can use a probability distribution plot to visualize the other possibilities. We use a binomial distribution with 363,842 trials and an event probability of 0.00002134.
We see that any number of perfect games between 4 and 11 wouldn't be that uncommon. But wait, there have been 20 perfect games. I don't see any gray bars even close to 20! In fact, according to Minitab's cumulative distribution function, the probability that we would see at least 20 perfect games since 1900 is 1 in 5,780. That's very uncommon!
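You don't need Minitab to get a feel for these numbers. The sketch below is my own stand-in, not Minitab's method: it builds the binomial probabilities by hand with Python's standard library, using the 363,842 trials and the event probability of 0.00002134 from above. The function name `binomial_pmf` is hypothetical.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p).

    Works in log space so the enormous binomial coefficient
    for n = 363,842 never has to be computed directly.
    """
    log_coeff = sum(math.log(n - i) - math.log(i + 1) for i in range(k))
    return math.exp(log_coeff + k * math.log(p) + (n - k) * math.log1p(-p))


N_OPPORTUNITIES = 363_842   # two pitchers per game since 1900
P_PERFECT = 0.00002134      # 0.67144^27, from the post

expected = N_OPPORTUNITIES * P_PERFECT

# Probabilities of seeing exactly 0, 1, ..., 19 perfect games.
pmf = [binomial_pmf(k, N_OPPORTUNITIES, P_PERFECT) for k in range(20)]

prob_4_to_11 = sum(pmf[4:12])        # the "not that uncommon" range
prob_at_least_20 = 1.0 - sum(pmf)    # upper tail: 20 or more

print(f"Expected perfect games: {expected:.1f}")
print(f"P(4 to 11 perfect games): {prob_4_to_11:.3f}")
print(f"P(at least 20): {prob_at_least_20:.6f} (about 1 in {1 / prob_at_least_20:,.0f})")
```

Most of the probability sits in the 4-to-11 range, and the chance of 20 or more comes out at a small fraction of a percent, in line with the Minitab result.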
It could be. But think of it this way. Imagine we take the 181,921 games played since 1900, and say they are just one sample. Then we take another sample of 181,921 games. And then another. And another, until we have 5,780 samples (it would take over 650,000 years). In just one of those samples, we would expect to have at least 20 perfect games. So are we just "lucky" enough to have that sample be the very first one we took? I'm thinking not.
Then something has to be wrong with the expected value, right? I guess so, but I'm not sure what it is. And then I found some numbers that really made my head spin. Let's take the fact that there have been 20 perfect games and work backwards:
In a league where there have been 20 perfect games in 363,842 opportunities, we would expect the average OBP of the league to be 0.305. Why did this make my head spin? Consider these stats:
So in a league made up of nothing but clones of Nolan Ryan, CC Sabathia, and the last 10 pitchers to throw a perfect game (that includes Randy Johnson), you still wouldn't have a league where the average batter gets out 69.5% of the time. Mind = Blown.
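For anyone who wants to check the 0.305 figure, working backwards just means undoing the "27 outs in a row" calculation: take the 27th root of the observed perfect-game rate to get the out rate, then subtract from 1. A minimal sketch under the same independence assumption (variable names are mine):

```python
PERFECT_GAMES = 20
OPPORTUNITIES = 363_842
BATTERS = 27

# Observed chance of a perfect game per pitcher per game.
observed_prob = PERFECT_GAMES / OPPORTUNITIES

# If observed_prob = out_rate ** 27, then out_rate is the 27th root.
implied_out_rate = observed_prob ** (1 / BATTERS)
implied_obp = 1 - implied_out_rate

print(f"Implied out rate: {implied_out_rate:.3f}")
print(f"Implied OBP:      {implied_obp:.3f}")
```

This gives an out rate of about 69.5% and an OBP of about 0.305, the numbers used above.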
Well, I can confidently say that the odds of a perfect game are low: somewhere between 1 in 46,800 and 1 in 18,192. But for the life of me, I can't figure out why these two numbers are so different. If anybody has any theories, I'd love to hear them! In the meantime, I'll finish with some things that definitely have better odds of happening than a perfect game.
Photo by Art Siegel, used under Creative Commons 2.0 license.