Analyzing Baseball Park Factors: Don't Settle for Easy Answers
In an earlier post, I used AT&T Park to illustrate that a single number is rarely a good way to summarize data. Even the mean and median have their limitations.
The time series plot shows how the park factor dropped below 0.80 after the 2010 season, when it had been around 1 previously. I left you with a question about why the park effect appeared to change so drastically at AT&T Park.
Let’s take a look at a few fun theories.
"It’s El Niño."
Well, not really El Niño. The last El Niño event in California was 2009-2010. But you can find a relationship between weather in San Francisco and the park factor. Look at what happens when you use the mean of the average daily temperature and the mean sea level pressure in San Francisco between April 3rd and October 5th to predict the park factor at AT&T Park.
Park Factor = -24562.3 + 399.168 Mean temp + 819.055 Mean pressure - 13.3102 Mean temp*Mean pressure
The model that includes the interaction between temperature and pressure has an R² value of 91.90% and a predicted R² value of 73.20%. Taken at face value, those statistics say the model accounts for most of the variation in the park factor.
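If you want to try this kind of fit outside Minitab, here is a minimal Python sketch. It assumes ordinary least squares with an interaction term and computes predicted R² from PRESS, the leave-one-out residual sum of squares. The function name `fit_interaction_model` is my own, and the data are synthetic placeholders, not the San Francisco measurements.

```python
import numpy as np

def fit_interaction_model(temp, pressure, park_factor):
    """Fit park_factor ~ temp + pressure + temp*pressure by least squares.

    Returns the coefficients (intercept, temp, pressure, interaction),
    R-squared, and predicted R-squared.
    """
    temp = np.asarray(temp, dtype=float)
    pressure = np.asarray(pressure, dtype=float)
    y = np.asarray(park_factor, dtype=float)

    # Design matrix: intercept, two main effects, and their product.
    X = np.column_stack([np.ones_like(temp), temp, pressure, temp * pressure])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    resid = y - X @ coef
    sst = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - np.sum(resid ** 2) / sst

    # Leverages are the diagonal of the hat matrix, taken from a QR factorization.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q ** 2, axis=1)

    # PRESS: squared residuals you would get if each point were left out in turn.
    press = np.sum((resid / (1.0 - leverage)) ** 2)
    pred_r2 = 1.0 - press / sst
    return coef, r2, pred_r2

# Synthetic placeholder data (NOT real weather or park factors).
rng = np.random.default_rng(0)
temp = rng.uniform(58.0, 64.0, size=12)
pressure = rng.uniform(29.8, 30.2, size=12)
park_factor = 1.0 + 0.02 * (temp - 61.0) + rng.normal(0.0, 0.03, size=12)

coef, r2, pred_r2 = fit_interaction_model(temp, pressure, park_factor)
print("coefficients:", coef)
print(f"R-sq = {r2:.2%}, predicted R-sq = {pred_r2:.2%}")
```

Predicted R² can never exceed R², and a large gap between the two is a hint that the model leans heavily on a few observations.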
Late in 2010, the San Francisco Giants opened a new seating area in right field. Initially sponsored by Coors Light and called the “Coors Light Cold Zone,” it was just the latest of many changes to seating arrangements that have taken place since AT&T Park opened (with a different name) in 2000. Adding, removing, or changing seats can have all kinds of effects on a ballpark. Maybe walls moved or changed in height, or maybe the winds coming in off San Francisco Bay were affected. Either way, those seats came in right before the park factor dropped dramatically.
"It’s not San Francisco."
One interesting possibility is that the change in the park factor doesn’t have to do with changes in San Francisco. If scoring stays about the same at AT&T Park, but games involving the Giants as the road team have more scoring, then AT&T Park would look more like a pitchers’ park.
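To see the arithmetic behind that idea, here is a small Python sketch. It assumes the common runs-based definition of park factor: runs per game by both teams in a club's home games, divided by runs per game by both teams in its road games. The post itself doesn't spell out the formula, so treat that definition, the function name `park_factor`, and the totals below as assumptions for illustration.

```python
def park_factor(home_runs, home_games, road_runs, road_games):
    """Runs per game (both teams) at home divided by runs per game on the road."""
    return (home_runs / home_games) / (road_runs / road_games)

# Hypothetical season totals, not real Giants data.
# Home scoring is identical in both cases; only road scoring rises.
before = park_factor(home_runs=648, home_games=81, road_runs=648, road_games=81)
after = park_factor(home_runs=648, home_games=81, road_runs=810, road_games=81)

print(f"before: {before:.2f}, after: {after:.2f}")  # prints before: 1.00, after: 0.80
```

Nothing about the home park changed in this example, yet its park factor fell from 1.00 to 0.80 because the denominator grew.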
Enter a familiar suspect: Chase Field. According to an article by correspondent Jeff Summers, preparations for the 2011 All-Star Game at Chase Field included the addition of LED boards on each side of the large dbTV scoreboard in center field. Also, mosaics on the panels that open to provide ventilation to the dbTV were replaced by high-definition photos. Take a look at the time series plot of park factors at AT&T Park when we add Chase Field. A dramatic increase in the park factors at Chase Field happens at the same time that those at AT&T Park drop.
What do I think?
So what do I think after working through the data? I think there’s a cause, but it’s too complicated to be explained by anything simple. If the weather really had a strong, causal effect on park factors, then you would expect to see similar effects at other baseball parks. But if you use the same model at other ballparks, you get somewhat ridiculous predictions. For example, here's Minitab's fitted line plot of the predictions from the AT&T Park model against the real park factors at Coors Field:
Forget about that less-than-stellar R² value, and notice that the real park factors range between 1.1 and 1.6 while the predictions range between 0.5 and 4. The statistics are also pretty awful if we keep the same predictors but estimate new coefficients. Expand the data set to include Chase Field, Petco Park, and Safeco Field, and the R² drops to 36.18%.
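One way to see why the San Francisco equation travels so badly is to look at its shape. The Python sketch below uses only the four coefficients printed earlier in this post. The function names are my own, and the offsets in the loop are hypothetical, chosen only to show how fast the prediction moves. It finds the point where both slopes are zero and then steps away from it.

```python
# Coefficients copied from the fitted equation earlier in this post.
B0, B_TEMP, B_PRES, B_INT = -24562.3, 399.168, 819.055, -13.3102

def predict_park_factor(temp, pressure):
    """Evaluate the fitted equation at a given mean temperature and pressure."""
    return B0 + B_TEMP * temp + B_PRES * pressure + B_INT * temp * pressure

def marginal_slopes(temp, pressure):
    """Partial derivatives of the prediction with respect to temp and pressure."""
    d_temp = B_TEMP + B_INT * pressure
    d_pres = B_PRES + B_INT * temp
    return d_temp, d_pres

# The single point where both slopes are zero (a saddle point of the surface).
temp_star = -B_PRES / B_INT
pres_star = -B_TEMP / B_INT
base = predict_park_factor(temp_star, pres_star)
print(f"flat point: temp = {temp_star:.2f}, pressure = {pres_star:.3f}")

# Hypothetical offsets from the flat point, in the same units as the inputs.
for d_temp, d_pres in [(1.0, 0.05), (3.0, -0.1), (-3.0, -0.1)]:
    change = predict_park_factor(temp_star + d_temp, pres_star + d_pres) - base
    print(f"offset ({d_temp:+.1f}, {d_pres:+.2f}) changes the prediction by {change:+.2f}")
```

Moving away from the flat point changes the prediction by the interaction coefficient times the product of the two offsets. With an interaction coefficient of about -13.3, a season only slightly unlike the San Francisco seasons can push the prediction well outside any realistic park factor.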
While an association between weather and park factor might exist, the relationship is not as simple as the strong fit statistics for the San Francisco data suggest.
As for the seating theory, sure, there’s a physical change. But there’s no record that the renovations involved moving walls or anything else that would explain a relationship with park factor.
The corresponding change that shows up with the renovations in Chase Field is also most likely a coincidence. If the park factor for Chase Field really changes, then the park factor for AT&T Park could be affected, because the Giants play more games against division rivals than against other teams. But if the change to Chase Field affected the Giants, then I would expect it to affect the other division rivals too. Notice that I left those off my first graph. Here’s a time-series plot that shows all of the National League West teams:
When Chase Field changes between 2010 and 2011, the only corresponding effect is at AT&T Park. That's not the plot we would expect if the change at Chase Field were affecting the park factors of visiting teams that often play at Chase Field.
No Easy Answers
The amount of change at AT&T Park from 2010 to 2011 is unusual, at least from the perspective of how we would expect moving ranges to behave in a stable process.
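Here is a rough Python sketch of that moving-range check. It assumes the usual individuals and moving-range chart rule, where the upper control limit is 3.267 times the average moving range of consecutive points. The function name `moving_range_signals` is my own, and the series is made up to resemble the story in this post, not copied from the real park factors.

```python
import numpy as np

D4_FOR_PAIRS = 3.267  # standard control chart constant for ranges of 2 points

def moving_range_signals(values):
    """Return moving ranges, their upper control limit, and flagged positions.

    A flagged position i means the jump from values[i - 1] to values[i]
    is larger than the upper control limit.
    """
    x = np.asarray(values, dtype=float)
    moving_ranges = np.abs(np.diff(x))
    ucl = D4_FOR_PAIRS * moving_ranges.mean()
    flagged = [int(i) + 1 for i in np.nonzero(moving_ranges > ucl)[0]]
    return moving_ranges, ucl, flagged

# Made-up park factors: stable near 1, then a sharp drop in the last season.
hypothetical = [1.01, 0.98, 1.03, 0.99, 1.02, 0.97, 1.00, 0.74]
moving_ranges, ucl, flagged = moving_range_signals(hypothetical)

print("moving ranges:", np.round(moving_ranges, 2))
print(f"upper control limit: {ucl:.3f}")
print("positions with an unusual jump:", flagged)
```

A jump above the limit says only that the change is bigger than a stable process would usually produce. It says nothing about what caused it.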
But as we've seen, it’s important to be careful about accepting easy answers about the cause of the change.