*Me with my Dad and Uncle before game 7 of the 1992 NLCS. I think I fell asleep 15 minutes after this picture was taken.*

On October 14^{th}, 1992, the Pittsburgh Pirates and Atlanta Braves were set to play game 7 of the National League Championship Series. The winner went to the World Series, the loser went home.

I was a mere 8 years old, and as much of a Pirates fan as an 8-year-old can be. I also had a friend who was a Braves fan, and in the school bus line we argued over who was going to win the entire series. I had yet to hear of statistics or Minitab, so my arguments consisted of little more than “Braves Suck! Pirates are going to win!” For an 8-year-old, I think my data analysis was spot on.

The Pirates held a 2-0 lead going into the bottom of the 9^{th}, and I had long since fallen asleep on the couch. The Braves cut the lead to 2-1 and had the bases loaded with 2 outs. Then Francisco Cabrera hit a single to left field, scoring two runs as Barry Bonds just missed throwing out Sid Bream at the plate. Perhaps if he had started taking steroids earlier he would have made the out, sending the Pirates to the World Series. Anyway, the good news was that I slept through the entire disastrous inning. The bad news was that I had to wake up sometime. And when I did wake the next morning, I cried and insisted that I wasn’t going to school. My mom made me go anyway. And the Pirates haven’t had a winning season since.

Fast forward 20 years. We’re more than halfway through the MLB season, and the Pirates are 11 games over 0.500. Earlier in the year, I used a regression analysis in an attempt to predict whether the Pirates would finally finish the season with a winning record. I found that it was just too early in the season to be confident in any such predictions. But now that there have been almost twice as many games played, I’m going to see if I can do better.

For each of the last 3 seasons, I took each baseball team's winning percentage on July 20^{th} and its winning percentage at the end of the season. You can get the data here and follow along if you like.

So here is my first question. **How often does a team that is above 0.500 on July 20^{th} finish with a losing record?** We can use Minitab’s Tally command to find out.

We see that over the last 3 years, 51 teams have had a winning percentage over 0.500 on July 20^{th}. The 39 observations with missing values are the teams that were at or below 0.500. So of those 51 teams, only 10 were not able to finish the season with a winning record. Of course, the 2011 Pirates were one of those teams. I blame Jerry Meals. But the 2012 Pirates have a winning percentage of 0.560. **Have any teams with a winning percentage that high finished below 0.500?**
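If you'd rather double-check the tally arithmetic outside of Minitab, here is a minimal Python sketch. The variable names are mine, and the counts are simply the ones reported above (30 teams over 3 seasons gives 90 team-seasons); it doesn't read the actual data file.

```python
# Counts taken from the tally described above; the names are illustrative.
teams_total = 90                      # 30 teams x 3 seasons
teams_above_500_on_july_20 = 51
teams_that_lost_winning_record = 10   # above 0.500 on July 20, not at season's end

# The "missing" observations are the teams at or below 0.500 on July 20.
teams_at_or_below_500 = teams_total - teams_above_500_on_july_20

# Teams that were above 0.500 on July 20 and stayed there.
teams_that_held_on = teams_above_500_on_july_20 - teams_that_lost_winning_record
share_that_held_on = teams_that_held_on / teams_above_500_on_july_20

print(teams_at_or_below_500)        # 39
print(f"{share_that_held_on:.1%}")  # 80.4%
```

So about 4 out of every 5 teams with a winning record on July 20^{th} kept it.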

No, they haven’t. Of the 10 teams that blew their winning record after July 20^{th}, the highest current winning percentage was 0.531. The Pirates are well above that number, so things look good. However, do you know who the team was that went 0.531 until July 20^{th} then went on to have a losing record? Yep, the 2011 Pittsburgh Pirates. Have I mentioned that I blame Jerry Meals? The Pirates went 19-42 after that blown call last year and finished the year with a winning percentage of 0.444. So how much can we really take from a team’s current winning percentage? That is, **what is the correlation between a team’s current winning percentage and their final winning percentage?** We can use a regression analysis to find out.

The regression analysis tells us that 76.1% of the variation in a team’s final winning percentage can be explained by their winning percentage on July 20^{th}. That’s even more good news for the Pirates because their current winning percentage is pretty high. We can use this model to predict their winning percentage at the end of the season.
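For anyone curious what a simple regression like this computes, here is a rough pure-Python sketch. The function name `fit_line` and the five sample values are made up for illustration; they are not the real data set and won't reproduce the 76.1%. The last two lines use a fact about simple (one-predictor) regression: R-squared is the square of the correlation coefficient, so the reported 76.1% implies a correlation of roughly 0.87, assuming the slope is positive.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x, plus R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)  # share of variation in y explained by x
    return slope, intercept, r_squared


# MADE-UP values for illustration only (not the real July 20 data).
july_pct = [0.400, 0.450, 0.500, 0.550, 0.600]
final_pct = [0.420, 0.440, 0.510, 0.530, 0.580]

slope, intercept, r_squared = fit_line(july_pct, final_pct)
print(f"slope={slope:.3f}  intercept={intercept:.3f}  R-sq={r_squared:.1%}")

# Correlation implied by the R-squared reported above (assumes a positive slope).
implied_r = sqrt(0.761)
print(round(implied_r, 3))  # 0.872
```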

The “Fit” value tells us that the model predicts the Pirates to end the season with a winning percentage of 0.549. But remember, our model accounts for only 76.1% of the variation in the final winning percentage. So we have to look at the intervals to determine how confident we can be in that prediction.

But wait; there are two intervals, a confidence interval and a prediction interval. Which one do we use? The 95% confidence interval provides a range of values likely to include the *mean* response for a given predictor value. That is, if we took multiple teams with a current winning percentage of 0.560, we would expect the *average* final winning percentage of all the teams to be between 0.540 and 0.558. But we’re not interested in the average final winning percentage of multiple teams. We’re interested in a *single* observation, the 2012 Pittsburgh Pirates.
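The textbook formulas for simple linear regression show where the difference comes from. The symbols here are my own notation, not something from the Minitab output: $x_0$ is the July 20^{th} winning percentage we plug in, $\hat{y}_0$ is the fit, $s$ is the standard error of the regression, $n$ is the number of observations, and $t^{*}$ is the critical value for the chosen confidence level.

```latex
\text{Confidence interval (mean response):}\quad
\hat{y}_0 \pm t^{*}\, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i}(x_i - \bar{x})^2}}

\text{Prediction interval (single observation):}\quad
\hat{y}_0 \pm t^{*}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i}(x_i - \bar{x})^2}}
```

The extra 1 under the square root is the scatter of an individual team around the regression line, which is why the prediction interval is always the wider of the two.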

For that we have to examine the prediction interval. The prediction interval tells us we can be 95% confident that the 2012 Pittsburgh Pirates will finish the year with a winning percentage between 0.481 and 0.617. Ah, we’re so close to having that entire interval be above 0.500! But we can lower the confidence level until we find an interval that is above 0.500. It turns out that a confidence level of 84% does the trick.
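Here is one way to sanity-check that 84% figure without rerunning the regression. It's only a sketch under two assumptions of mine: it backs the standard error out of the rounded 95% prediction interval above, and it uses a normal approximation in place of the t-distribution that Minitab would use. The variable names are illustrative.

```python
from statistics import NormalDist

# Values reported above (rounded to three decimals).
fit = 0.549
pi_low, pi_high = 0.481, 0.617
target = 0.500  # the lower bound we want the interval to clear

std_normal = NormalDist()  # ASSUMPTION: normal approximation to the t-distribution

# Back out the standard error of prediction from the 95% interval's half-width.
z_95 = std_normal.inv_cdf(0.975)
se_pred = (pi_high - pi_low) / 2 / z_95

# How many standard errors below the fit is 0.500?
z_needed = (fit - target) / se_pred

# Confidence level of the symmetric interval whose lower bound sits at 0.500.
two_sided_level = 2 * std_normal.cdf(z_needed) - 1

# Probability mass above 0.500 alone, under the same approximation.
one_sided_level = std_normal.cdf(z_needed)

print(f"two-sided level: {two_sided_level:.1%}")
print(f"one-sided level: {one_sided_level:.1%}")
```

Under these assumptions the two-sided level lands near 84%, in line with the number above. The one-sided figure comes out higher because a symmetric interval also gives up the upper tail, and finishing too far above 0.500 isn't something a Pirates fan would count as a miss.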

So we can say that we’re 84% confident that the Pirates will finish the season above 0.500. Previously we also saw that 80.4% of teams with a winning record on July 20^{th} end the season with a winning record. I like the Buccos' chances.

So is it a lock that Pittsburgh is going to finish above 0.500? No, it isn’t. Both the 2011 Pirates and the 2011 Boston Red Sox have shown that crazy things can happen late in the season. But those teams are outliers, and we’ve shown the 2012 Pirates *definitely* have the odds in their favor.

And if things keep up, the topic is going to shift from the Pirates finishing with a winning record to the Pirates making the playoffs. If the season were to end today, Pittsburgh would be in a Wild Card Showdown game against Atlanta. Winner moves on to the Division Series, loser goes home. Personally, I say bring on the Braves and let Jerry Meals be the umpire. There are some demons that need exorcised, and this could be the year to get rid of them all!