Analyzing Baseball Park Factors: Home of the San Francisco Giants
Because I didn't trust the numbers on the ESPN web site, I calculated my own park factors using their formula. There are a lot of interesting ways to look at the numbers, but the first thing I want to do is focus on a classic statistics lesson:
One statistic rarely tells the whole story.
Let’s focus on AT&T Park, home of the San Francisco Giants. ParkFactors.com notes that “Overall, AT&T Park plays as a neutral park, with summer days favoring hitters but the damp nighttime air being particularly helpful to pitchers. As an open-air park by the Bay, the park is also quite subject to variable winds.” The variable winds and the day-night difference might help explain why AT&T Park has had an unusual property over the last 7 years. On average, it’s a pitcher’s park, with a mean park factor of about 0.97. Typically, it’s a hitter’s park, with a median park factor of 1.03.
Does that sound a little odd to you? I hope so. Unless we use Minitab to get a more thorough picture of the data, it almost seems like nonsense.
We can see what's going on by looking at the data using individual value plots from Minitab. Here’s the plot of the AT&T park factors from 2006-2012 with the mean shown by the square. A mean below 1.00 indicates a park that favors pitchers.
Now, here’s the same graph with the median shown by a square. Because the median is larger than 1.00, we would say that AT&T Park typically favors hitters.
So what’s happening? In 4 of the 7 seasons, the park factor favors hitters, so the median favors hitters. But in 2 of the 3 seasons that favor pitchers, the factors are much further from neutral than in any of the years that favor hitters. These low park factors have a great influence on the mean, pulling it below 1.00, so the mean indicates that the park favors pitchers.
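To make the arithmetic concrete, here is a minimal Python sketch of how a couple of extreme seasons can drag the mean below 1.00 while the median stays above it. The seven values are placeholders I made up to mimic the pattern described above (four seasons a little above 1.00, one just below, two far below). They are not the actual AT&T Park factors.

```python
from statistics import mean, median

# Illustrative placeholder values only, NOT the real AT&T Park factors.
# Four seasons slightly favor hitters, one is nearly neutral,
# and two favor pitchers by a wide margin.
park_factors = [1.03, 1.05, 1.04, 1.08, 0.99, 0.80, 0.80]

avg = mean(park_factors)      # sensitive to the two extreme low seasons
mid = median(park_factors)    # depends only on the middle-ranked season
hitter_seasons = sum(pf > 1.00 for pf in park_factors)

print(f"mean   = {avg:.2f}")   # mean   = 0.97
print(f"median = {mid:.2f}")   # median = 1.03
print(f"seasons favoring hitters: {hitter_seasons} of {len(park_factors)}")
```

With these made-up numbers, the two seasons at 0.80 sit 0.20 away from neutral, while the best hitter's season is only 0.08 away. That imbalance is enough to move the mean to the other side of 1.00 without touching the median.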
In fact, it's hard to say that either the mean or the median, both of which are close to neutral, really gives a good picture of what happens at AT&T Park. The Minitab graphs, which show how non-neutral the park can be, are essential to understanding it.
We have more data than just the park factors, though. We also have time. While the main description on ParkFactors.com classifies AT&T Park as neutral, the classification based on the most recent 3-year average indicates that AT&T Park is an extreme pitcher's park. A time-series plot of the data by year puts what happened in 2011 and 2012 in stark relief.
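The gap between a 7-year summary and a 3-year average is easy to sketch in Python as well. Again, the season-by-season values below are hypothetical placeholders, arranged only so that the two lowest seasons fall in 2011 and 2012. The helper name trailing_mean is my own, not something from ParkFactors.com or Minitab.

```python
from statistics import mean

# Hypothetical values keyed by season, NOT the real AT&T Park factors.
# The two lowest values are placed in 2011 and 2012 to match the text.
factors_by_year = {
    2006: 1.03, 2007: 1.05, 2008: 0.99, 2009: 1.04,
    2010: 1.08, 2011: 0.80, 2012: 0.80,
}

def trailing_mean(data, last_year, window=3):
    """Average the `window` seasons ending with `last_year`."""
    years = range(last_year - window + 1, last_year + 1)
    return mean([data[y] for y in years])

overall = mean(list(factors_by_year.values()))  # all 7 seasons
recent = trailing_mean(factors_by_year, 2012)   # 2010 through 2012 only

print(f"7-year mean: {overall:.2f}")
print(f"3-year mean: {recent:.2f}")
```

Because the short window contains both extreme seasons and only one ordinary one, its average lands much farther from neutral than the 7-year figure does.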
When you use statistical software such as Minitab to explore the data more deeply, you get a lot more information than if you rely on any single measure. Graphing the data over time prompts us to ask: What’s happening in 2011 and 2012? Did something change about the team? Did something change about the park?
I’m going to explore those questions more deeply next time.