If you like baseball pitching statistics, then you've *loved* the month of June. On the first of the month, Johan Santana pitched the first no-hitter in Mets history. Then a week later, the Seattle Mariners used 6 different pitchers to do the same thing. That tied the MLB record for most pitchers used in a no-hitter. And finally, 5 days after that, Matt Cain pitched the 22nd perfect game in major league history. And we're only halfway through June! It doesn't take a Six Sigma Black Belt to realize it's been a crazy month.

But as a stat nerd, the question I have is *how crazy* has June really been? What are the odds of throwing a perfect game and a no-hitter? (Don't worry, it doesn't take a Six Sigma Black Belt to figure that out, either!) But before we start, we have an important question to answer.

# What Year's Data Should We Start With?

There have been 22 perfect games, with the first two both happening in 1880. But in 1880 pitchers threw underhand, it took 8 balls to draw a walk, and a batter was not awarded first base if he was hit by a pitch. In other words, the odds of a pitcher throwing a perfect game in 1880 were vastly different from what they are today. To account for this, I'm going to start collecting data at 1900, since that is widely considered the start of the modern era of Major League Baseball. Since 1900, there have been 20 perfect games and 235 no-hitters.

# How Do We Calculate the Odds of a Perfect Game?

I went to baseball-reference.com and recorded the total number of games played, including any postseason games, for the last 113 seasons (I included all games played through June 13th, 2012). I also recorded the league average for on-base percentage.

Since 1900, there have been 181,921 major league baseball games. But in each game, there are two pitchers. So to get the number of *opportunities* for a perfect game, we need to double that number. That means since 1900, there have been 363,842 opportunities for a perfect game. And only 20 of them have occurred! What are the odds?

**Odds of throwing a perfect game** = 20 / 363,842 = 0.000055 = approx **1 in 18,192**
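If you'd rather check the arithmetic in code than in Minitab, here's a minimal Python sketch of the observed-odds calculation. It only uses the totals quoted above; the helper name `observed_odds` is made up for illustration. The key step is that each game counts as two opportunities, one for each team's pitching.

```python
# Observed odds of a perfect game since 1900, using the totals quoted above.
GAMES_SINCE_1900 = 181_921   # MLB games, 1900 through June 13, 2012
PERFECT_GAMES = 20           # perfect games in that span

def observed_odds(events: int, games: int) -> tuple[float, float]:
    """Return (probability per opportunity, N in '1 in N').

    Each game gives two opportunities, one for each team's pitching.
    """
    opportunities = 2 * games
    p = events / opportunities
    return p, 1 / p

p, one_in = observed_odds(PERFECT_GAMES, GAMES_SINCE_1900)
print(f"p = {p:.6f}, about 1 in {one_in:,.0f}")  # p = 0.000055, about 1 in 18,192
```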

Yeah, that's pretty low. Giants fans who were in attendance at AT&T Park Wednesday night should consider themselves extremely lucky. What about Mets and Mariners fans? How lucky should they consider themselves?

**Odds of throwing a no-hitter** = 235 / 363,842 = 0.000646 = approx **1 in 1,548**

That's still quite lucky, but not nearly as lucky as the perfect game. So, sorry Mets and Mariners fans, we're going to focus on the perfect game from here on out. It's just more interesting. Why? Well...you'll see.

# Is This What We Would Expect to Happen?

The odds above are just what we've *observed* in the last 113 years. But let's stop for a minute and think about what we would *expect*. To pitch a perfect game, no runner can reach base. That means you have to get 27 batters out in a row. So the probability of throwing a perfect game is equal to the probability of getting 27 batters out in a row.

Remember when I said for each season I collected the league average for on-base percentage (OBP)? Well, OBP is the percent of the time that a batter reaches base (either by a hit, a walk, or getting hit by a pitch). That means the probability of getting a batter out is 1 minus the on-base percentage. I'll have Minitab calculate the average OBP since 1900.

Minitab gives an average OBP of 0.32856 for all of major league baseball over the last 113 years. So the probability of a pitcher getting a batter out is:

1 - 0.32856 = 0.67144 = **67.1%**

This means that over the past 113 years, batters get out 67.1% of the time. Now, this number isn't constant, as it changes slightly depending on the batter and the pitcher. But I can't break down every plate appearance since 1900, so we're going to stick with this number. Now let's calculate the odds!

**Odds of throwing a perfect game** = 0.67144^27 = 0.00002134 = approx **1 in 46,800**
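Here's the same "27 outs in a row" calculation as a short Python sketch. It leans on the same simplifying assumption as the text: every plate appearance is independent and has the same chance of producing an out (1 minus the league-average OBP of 0.32856). Real plate appearances vary by batter and pitcher, so treat this as a rough model, not an exact figure.

```python
AVG_OBP = 0.32856   # league-average OBP since 1900 (value quoted above)
OUTS_NEEDED = 27    # nine innings x three outs, with no baserunners allowed

# Assumption: every plate appearance is independent with the same out probability.
p_out = 1 - AVG_OBP
p_perfect = p_out ** OUTS_NEEDED

print(f"p_out = {p_out:.5f}")
print(f"p_perfect = {p_perfect:.8f}, about 1 in {1 / p_perfect:,.0f}")
```

The printed `p_perfect` should line up with the 0.00002134 above.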

Again, the true odds depend on the pitcher and the team he's pitching against. Some games will have odds slightly better, and some will have odds slightly worse. But they should even out, making our odds of 1 in 46,800 a good estimate for the average game. So using a probability of 0.00002134, how many perfect games would we expect to see in 363,842 opportunities?

**Expected number of perfect games** = 0.00002134 * 363,842 = **7.8 Perfect Games**
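The expected count is just the number of trials times the per-trial probability (the mean of a binomial distribution is n × p). A quick Python check, using the numbers from the text:

```python
P_PERFECT = 0.00002134         # theoretical per-opportunity probability from above
OPPORTUNITIES = 2 * 181_921    # two pitching opportunities per game = 363,842

# For n independent trials with success probability p, the expected count is n * p.
expected = OPPORTUNITIES * P_PERFECT
print(f"Expected perfect games: {expected:.1f}")  # Expected perfect games: 7.8
```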

So there have been more than twice as many perfect games as we would expect! But of course, the 7.8 number is just the average. Certainly we could get other outcomes. After all, if you flip a coin 100 times, you're not always going to get 50 heads. We can use a probability distribution plot to visualize the other possibilities. We use a binomial distribution with 363,842 trials and an event probability of 0.00002134.

We see that any number of perfect games between 4 and 11 wouldn't be that uncommon. But wait, there have been *20 perfect games*. I don't even see any gray bars even close to 20! In fact, using Minitab's cumulative distribution function, the probability that we would see at least 20 perfect games since 1900 is about 1 in 5,780. That's very uncommon!
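For anyone without Minitab handy, here's a rough Python equivalent of that cumulative probability, a sketch using only the standard library rather than Minitab's own routine. It computes the binomial upper tail P(X ≥ 20) as 1 minus P(X ≤ 19), with n = 363,842 and p = 0.00002134. The helper names `binom_pmf` and `prob_at_least` are hypothetical; the probabilities are computed in log space because (1 − p) raised to roughly 360,000 is awkward to handle directly.

```python
import math

N = 363_842       # opportunities since 1900
P = 0.00002134    # per-opportunity probability of a perfect game
OBSERVED = 20     # perfect games actually thrown

def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial P(X = k), computed in log space to avoid underflow."""
    log_pmf = (math.log(math.comb(n, k))
               + k * math.log(p)
               + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

def prob_at_least(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k) = 1 - P(X <= k - 1)."""
    return 1.0 - sum(binom_pmf(i, n, p) for i in range(k))

tail = prob_at_least(OBSERVED, N, P)
print(f"P(X >= {OBSERVED}) = {tail:.3e}, about 1 in {1 / tail:,.0f}")
```

The result should come out close to the 1 in 5,780 Minitab reports.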

# Is It Just Random Variation?

It could be. But think of it this way. Imagine we take the 181,921 games played since 1900 and treat them as just one sample. Then we take another sample of 181,921 games. And then another. And another, until we have 5,780 samples (it would take over 650,000 years). In just *one* of those samples would we expect to have at least 20 perfect games. So are we just "lucky" enough to have that sample be the very first one we took? I'm thinking not.
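The "over 650,000 years" figure comes from treating each 113-season stretch as one sample and asking how long 5,780 of them would take to play out:

```python
SAMPLES = 5_780          # samples needed to expect one with >= 20 perfect games
YEARS_PER_SAMPLE = 113   # seasons 1900-2012 make up one sample

years = SAMPLES * YEARS_PER_SAMPLE
print(f"{years:,} years")  # 653,140 years
```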

Then something has to be wrong with the expected value, right? I guess so, but I'm not sure what it is. And then I found some numbers that really made my head spin. Let's take the fact that there have been 20 perfect games and work backwards:

- 20 perfect games / 363,842 opportunities = a probability of **0.000055** of getting 27 batters out in a row
- 0.000055^(1/27) = a probability of **69.5%** of getting *one* batter out
- 1 - 0.695 = an average OBP of **0.305**
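Here's that backwards calculation as a small Python sketch. It assumes the same independent, identical plate-appearance model used earlier, so taking the 27th root of the observed perfect-game rate recovers the per-batter out rate that model would need.

```python
PERFECT_GAMES = 20
OPPORTUNITIES = 363_842
OUTS_NEEDED = 27

p_perfect = PERFECT_GAMES / OPPORTUNITIES   # observed rate of 27 straight outs
p_out = p_perfect ** (1 / OUTS_NEEDED)      # per-batter out rate that would produce it
implied_obp = 1 - p_out                     # league OBP consistent with 20 perfect games

print(f"p_perfect = {p_perfect:.6f}")       # p_perfect = 0.000055
print(f"p_out = {p_out:.3f}")               # p_out = 0.695
print(f"implied OBP = {implied_obp:.3f}")   # implied OBP = 0.305
```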

In a league where there have been 20 perfect games in 363,842 opportunities, we would expect the average OBP of the league to be 0.305. Why did this make my head spin? Consider these stats:

- Batters who have faced Hall of Famer Nolan Ryan had an OBP of 0.307
- Batters who have faced Yankees ace CC Sabathia have an average OBP of 0.306
- Batters who have faced the last 10 pitchers to throw a perfect game have an average OBP of 0.310

So in a league made up of nothing but clones of Nolan Ryan, CC Sabathia, and the last 10 pitchers to throw a perfect game (that includes Randy Johnson), you *still* wouldn't have a league where the average batter gets out 69.5% of the time. Mind = Blown.
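To put numbers on the "league of clones" idea, here's a sketch that feeds each opponent OBP from the list above into the same 27-outs model and asks how many perfect games we'd expect in 363,842 opportunities. The model assumption is the same as before (every plate appearance independent, with a fixed OBP), and the helper name `expected_perfect_games` is hypothetical.

```python
OUTS_NEEDED = 27
OPPORTUNITIES = 363_842   # two per game since 1900

def expected_perfect_games(obp: float) -> float:
    """Expected perfect games if every batter reached base with probability `obp`."""
    p_perfect = (1 - obp) ** OUTS_NEEDED
    return p_perfect * OPPORTUNITIES

# Opponent OBP figures quoted above, plus the 0.305 implied by 20 perfect games.
scenarios = {
    "implied by 20 perfect games": 0.305,
    "league of Nolan Ryans": 0.307,
    "league of CC Sabathias": 0.306,
    "league of the last 10 perfect-game pitchers": 0.310,
}

for label, obp in scenarios.items():
    print(f"{label}: {expected_perfect_games(obp):.1f} perfect games expected")
```

Even the all-clone leagues should come in below 20 expected perfect games, which is the point above.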

# So, What *Are* the Odds of a Perfect Game?

Well, I can confidently say that they are low: somewhere between 1 in 46,800 (what we'd expect from the league-average OBP) and 1 in 18,192 (what we've actually observed). But for the life of me, I can't figure out why these two numbers are so different. If anybody has any theories, I'd love to hear them! In the meantime, I'll finish with some things that definitely have better odds of happening than a perfect game.

- Winning $400 on a Pirates or Phillies Pennsylvania Lottery scratch-off ticket (1 in 12,000)
- Having a randomly picked clover be a four-leaf clover (1 in 10,000)
- Getting four of a kind in a 5-card poker hand (1 in 4,165)
- Successfully navigating an asteroid field (1 in 3,720... at least according to C-3PO)
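The poker figure is easy to check by counting hands: there are 13 ranks for the four of a kind and 48 choices for the fifth card, out of C(52, 5) possible hands. That gives a probability of 1 in 4,165 (equivalently, odds of 4,164 to 1 against). A quick Python check, compared against the observed perfect-game odds from earlier:

```python
from math import comb

# 13 ranks for the four of a kind, times 48 choices for the fifth card.
four_of_a_kind_hands = 13 * 48
all_hands = comb(52, 5)   # 2,598,960 five-card hands

one_in = all_hands / four_of_a_kind_hands
print(f"Four of a kind: 1 in {one_in:,.0f}")  # Four of a kind: 1 in 4,165

perfect_game_one_in = 18_192   # observed perfect-game odds since 1900
print(f"A perfect game is about {perfect_game_one_in / one_in:.1f}x rarer")
```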

*Photo by Art Siegel, used under Creative Commons 2.0 license.*