It’s early June, and the Pittsburgh Pirates aren’t mathematically eliminated from the playoffs yet! But seriously, the Pirates are above 0.500, which is a big deal because they’ve had a record 19 consecutive losing seasons.

A winning season would finally allow them to break their embarrassing losing streak. But there is something else interesting about the Pirates’ season this year. They’ve actually been outscored by 22 runs! In their 56 games, they've scored 179 runs and given up 201. You would expect a team to have a losing record when they’ve given up more runs than they’ve scored, not the other way around! In fact, of the 11 other teams in baseball that have been outscored by 10 runs or more, only two (Cleveland and Miami) have winning records.

So the question is, should Pirates fans be encouraged that their team has a winning record or concerned that they are being outscored? That is, which stat is a better predictor of future success? Hmmmmm, that sounds like a job for Minitab’s Regression Analysis.

For each of the last 3 seasons, I recorded each baseball team's winning percentage on June 8th. I also recorded their run differential (total runs scored in the season minus total runs allowed) on June 8th. Then I took their winning percentage in games played after June 8th until the end of the regular season. In total, there are 90 observations (30 teams times 3 years). You can get the data here and follow along if you like. We want to see which is more strongly correlated with future winning percentage: a team’s current winning percentage or their run differential. We’ll start by using Minitab to create a Fitted Line Plot for each situation. Here is the plot that uses a team’s current winning percentage.

We see that a team’s current winning percentage explains only 24.6% of the variation in the team’s future winning percentage. That’s not very much. If you want to determine how a baseball team is going to perform the rest of the season, you shouldn’t be looking at just their current record. So will the run differential tell us more?

This is a little better, but not by much. Run differential explains 26.8% of the variation in the team’s future winning percentage. So it looks like we can’t simply look at the run differential either. What if we use a regression analysis and put both run differential and current winning percentage in the same model?
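If you'd rather follow along outside Minitab, each fitted line plot boils down to a one-predictor least-squares fit and its R-squared. Here is a minimal Python sketch of that calculation. The function name `fit_line` and the toy numbers are my own placeholders, not the article's data set, so don't expect them to reproduce the 24.6% or 26.8% figures.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # With a single predictor, R-squared is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared

# Placeholder values for illustration only (NOT the real 90 observations).
toy_run_diff = [-30, -10, 0, 15, 40]
toy_future_pct = [0.450, 0.510, 0.480, 0.530, 0.540]

intercept, slope, r2 = fit_line(toy_run_diff, toy_future_pct)
print(f"future win % = {intercept:.3f} + {slope:.6f} * run differential")
print(f"R-squared = {r2:.1%}")
```

To compare the two candidate predictors, you would call the same function twice on the real data: once with current winning percentage as `x` and once with run differential.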

The first thing we'll look at in this regression analysis is the R-squared value. We see that it is only 27.35%, barely better than the fitted line plots with one predictor. And on top of that, the p-values for both of our predictors are above 0.05! This model isn’t going to work at all! We’ll remove Current Win % from the model because it has the higher p-value. This brings us back to the model with Run Differential as the only predictor.
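One plausible reading of that result is that the two predictors carry largely overlapping information, so adding the second one buys very little extra R-squared. As a rough sketch of what a two-predictor fit computes, here is a plain-Python version that solves the centered normal equations. The function name `fit_two_predictors` and the toy numbers are hypothetical, and the sketch reports only coefficients and R-squared, not the p-values Minitab shows.

```python
from statistics import mean

def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2.

    Returns (b0, b1, b2, r_squared).
    """
    m1, m2, my = mean(x1), mean(x2), mean(y)
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    # Sums of squares and cross-products of the centered variables.
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * t for a, t in zip(c1, cy))
    s2y = sum(b * t for b, t in zip(c2, cy))
    syy = sum(t * t for t in cy)
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    # Explained sum of squares divided by total sum of squares.
    r_squared = (b1 * s1y + b2 * s2y) / syy
    return b0, b1, b2, r_squared

# Placeholder values for illustration only (NOT the real 90 observations).
toy_win_pct = [0.420, 0.480, 0.500, 0.540, 0.600]
toy_run_diff = [-30, -10, 0, 15, 40]
toy_future_pct = [0.450, 0.510, 0.480, 0.530, 0.540]

b0, b1, b2, r2 = fit_two_predictors(toy_win_pct, toy_run_diff, toy_future_pct)
print(f"R-squared with both predictors = {r2:.1%}")
```

When the two predictors are strongly correlated, `det` gets close to zero and the individual coefficients become unstable, which is one way both p-values can end up above 0.05 even though either predictor is useful on its own.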

Well, at least the p-value for Run Differential shows us that the predictor is significant. So what happens if we plug the Pirates’ current run differential into that equation?

Pirates Winning % the Rest of the Season = .500 + .000976*(-22) = 0.479
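The same substitution can be written as a tiny Python function, which makes it easy to try other run differentials. The intercept and slope are the ones from the equation above; the function and constant names are just my own labels.

```python
# Coefficients taken from the fitted equation in the article.
INTERCEPT = 0.500
SLOPE = 0.000976

def predict_rest_of_season_pct(run_diff):
    """Predicted winning percentage for the rest of the season."""
    return INTERCEPT + SLOPE * run_diff

pirates_pct = predict_rest_of_season_pct(-22)
print(round(pirates_pct, 3))  # 0.479
```

A team with a run differential of zero is predicted to play exactly .500 ball, and every additional run of differential moves the prediction by a little under a tenth of a percentage point.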

So if the Pirates win 47.9% of their remaining games, they’ll finish the season with a record of 79-83. That’s another losing season! But remember, our R-squared value was very low, so we can’t be very confident in this number. There are many other factors that will affect the number of games Pittsburgh (or any baseball team) wins between now and October. Players get traded, players get injured, and players return from injury. Our model doesn’t account for any of that. Plus there is just the random variation that goes along with sports. The better team doesn’t always win! The truth is, it’s just too early in the season to accurately predict if the Pirates can keep up their winning ways. So I’ll come back a little later in the season to see if I can do any better!

Photo by daveynin, used under Creative Commons 2.0 license.