3 Things Baseball Can Teach Us About Control Charts
Control charts are some of the most useful tools in statistical science. They track process statistics over time and detect when the mean or standard deviation changes from its historical level. The signals that control charts send about special causes can help you zero in on the fastest ways to improve any process, whether you’re making tires or turbines or trying to improve patient care.
I’ve mentioned before that I’m a baseball fan. For the past several years, I’ve been noticing articles about the Year of the Pitcher in Major League Baseball (2010, 2011, 2012, 2013, 2014). That repetition suggests a shift to me, and I thought “What a great way to illustrate some neat things that you can do with control charts!”
Here are a few things to remember about control charts that you can illustrate with Major League Baseball data from 1969 to 2013, courtesy of numbers from www.baseball-reference.com.
Use meaningful units in control charts
We'll start in 1969 because that’s when new rules decreased the height of the pitcher’s mound in Major League Baseball parks. However, there have been some other notable changes in the game over the years that mean that we have to be sensible about the data that we plot. For example, if we make an I-MR chart of hits, we see some special causes right away:
Four points are out of control on the MR chart because of strike-shortened seasons in 1981 and 1994. One technique when you know the reason for an out-of-control point is to exclude those samples from calculating the control limits. That way, the control limits represent expected process variation. But in statistics we like to use as much of our data as possible. If you want to keep the data from those years, an alternative to throwing them out would be to plot a different variable. I used Minitab’s Calculator to create a column that contains the number of hits per at bat.
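To see why switching units helps, here is a minimal Python sketch of the I-MR arithmetic. It is not Minitab's implementation, only the textbook formulas for moving ranges of length 2. The helper names `imr_limits` and `mr_signals` are made up for this example, and the season totals are invented placeholders, not figures from baseball-reference.com. The fourth value mimics a shortened season.

```python
from statistics import mean

def imr_limits(values):
    """Center lines and 3-sigma limits for an I-MR chart (moving range of length 2)."""
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = mean(values)
    mr_bar = mean(moving_ranges)
    # 2.66 (= 3 / d2) and 3.267 (= D4) are the standard constants for n = 2.
    return {
        "i_lcl": x_bar - 2.66 * mr_bar,
        "i_center": x_bar,
        "i_ucl": x_bar + 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_ucl": 3.267 * mr_bar,
        "moving_ranges": moving_ranges,
    }

def mr_signals(values):
    """1-based positions whose moving range exceeds the MR chart's upper limit."""
    limits = imr_limits(values)
    # Moving range i compares observations i and i + 1 (0-based), so report
    # the position of the later observation.
    return [i + 2 for i, mr in enumerate(limits["moving_ranges"])
            if mr > limits["mr_ucl"]]

# Invented placeholder data (NOT real MLB totals); season 4 is "shortened".
hits = [1000, 1020, 980, 600, 1010, 990, 1030, 970, 1000, 1020]
at_bats = [4000, 4050, 3950, 2400, 4020, 3980, 4080, 3900, 4000, 4060]

# Same seasons, expressed in a unit that does not depend on season length.
hits_per_at_bat = [h / ab for h, ab in zip(hits, at_bats)]

raw_signals = mr_signals(hits)               # drop into and jump out of season 4
rate_signals = mr_signals(hits_per_at_bat)   # no signals once the unit is a rate
```

In this toy data, a single short season produces two signals on the MR chart of raw hits (the drop into the season and the jump back out), which is the same pattern that gives four points for two shortened seasons. Plotting hits per at bat removes both signals without discarding any data.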
Set the baseline
The control chart above shows the number of hits per at bat. You still see some out-of-control points on the chart, but they no longer correspond to the strike-shortened seasons. The first out-of-control point is 1972. Not coincidentally, the American League instituted the designated hitter in 1973, so a corresponding increase in hits per at bat makes sense. The next out-of-control signal comes in 1994. The most popular explanation, given that 4 of the next 7 points are out of control, is that this marks the beginning of the steroid era in baseball. The steroid theory holds that, beginning in 1994, the use of performance-enhancing drugs reached a tipping point where its effects became statistically visible in the game. Another explanation is that 1993 is when baseball began playing games in the thin air of Colorado, where Mile High Stadium was a hitter-friendly precursor to Coors Field.
In cases like this, you have to decide whether it’s fair to compare all of this variation on one chart. If you know that there has been a change in the rules, then you would expect to see corresponding out-of-control points. In fact, because the control limits here are calculated from all of the years together, the chart may flag too few points to show precisely when the changes happened.
The same logic applies to any process: typically, you want to calculate control limits from a stable baseline. For example, if you calculate the control limits using the years 1973-1993, then 3 of the 4 years without the designated hitter are out of control and 6 of the 15 years 1994 to 2007 are unusual. The out-of-control points show when the process was different from the baseline years 1973-1993:
If you calculate the control limits using the years 1994-2013, then the MR chart shows precipitous changes in 1973 and 1993:
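The idea of setting a baseline can be sketched in a few lines of Python: compute the individuals-chart limits from the baseline periods only, then judge every period against those limits. This is an illustration under stated assumptions, not Minitab's code. The function names `baseline_limits` and `out_of_control` are hypothetical, the periods are numbered 1 to 12 rather than labeled with real years, and the values are invented placeholders, not MLB statistics.

```python
from statistics import mean

def baseline_limits(periods, values, start, end):
    """Individuals-chart limits computed only from periods in [start, end]."""
    base = [v for p, v in zip(periods, values) if start <= p <= end]
    moving_ranges = [abs(b - a) for a, b in zip(base, base[1:])]
    center = mean(base)
    half_width = 2.66 * mean(moving_ranges)  # 2.66 = 3 / d2 for n = 2
    return center - half_width, center + half_width

def out_of_control(periods, values, lcl, ucl):
    """Periods whose value falls outside the supplied limits."""
    return [p for p, v in zip(periods, values) if v < lcl or v > ucl]

# Invented placeholder data with three regimes (NOT real batting averages).
periods = list(range(1, 13))
values = [0.240, 0.242, 0.241,                       # earlier regime
          0.255, 0.257, 0.254, 0.256, 0.255, 0.257,  # middle regime
          0.268, 0.270, 0.269]                       # later regime

# Baseline on the middle regime: both neighboring regimes are flagged.
lcl_mid, ucl_mid = baseline_limits(periods, values, 4, 9)
flagged_vs_middle = out_of_control(periods, values, lcl_mid, ucl_mid)

# Baseline on the later regime: everything before it is flagged instead.
lcl_late, ucl_late = baseline_limits(periods, values, 10, 12)
flagged_vs_late = out_of_control(periods, values, lcl_late, ucl_late)
```

Changing the baseline changes which points count as unusual, so the choice of baseline is really a choice about which version of the process you want to compare everything else against.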
The easiest way to create control charts
Of course, when you have different things to compare, you might want to look for points that are unusual relative to the process that they should fit. For example, you would want different control limits before and after you improve a process. Minitab’s Assistant Menu makes this easy with Before/After control charts. With the baseball statistics, a before/after control chart lets us look for points that are unusual within an era. Let’s set the dividing line at 1993 and just use the post-designated hitter years:
In the first era, there are no out-of-control points. In the second era, the years 2010-2013 are unusual, marking the return of the pitching domination that so many people have noticed. The Assistant Menu also performs a statistical test to verify that the mean batting average (hits per at bat) is significantly greater in the second era than in the first.
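For readers who want to see the before/after logic outside of Minitab, here is a rough Python sketch. Each stage gets its own control limits, and the two stage means are then compared. The test the Assistant runs is not described in this post, so the sketch substitutes Welch's t statistic as a stand-in. The names `stage_limits` and `welch_t` are hypothetical, and the values are invented placeholders, not real batting averages.

```python
from math import sqrt
from statistics import mean, variance

def stage_limits(values):
    """Individuals-chart limits computed from one stage only."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    half_width = 2.66 * mean(moving_ranges)  # 2.66 = 3 / d2 for n = 2
    return center - half_width, center, center + half_width

def welch_t(before, after):
    """Welch's t statistic for mean(after) - mean(before), unequal variances."""
    standard_error = sqrt(variance(before) / len(before)
                          + variance(after) / len(after))
    return (mean(after) - mean(before)) / standard_error

# Invented placeholder data (NOT real MLB figures).
before = [0.255, 0.257, 0.254, 0.256, 0.255, 0.257]
after = [0.266, 0.268, 0.265, 0.267, 0.266, 0.268, 0.267, 0.255]

before_lcl, before_center, before_ucl = stage_limits(before)
after_lcl, after_center, after_ucl = stage_limits(after)

# Judge each point against the limits of its own stage (1-based positions).
unusual_before = [i + 1 for i, v in enumerate(before)
                  if v < before_lcl or v > before_ucl]
unusual_after = [i + 1 for i, v in enumerate(after)
                 if v < after_lcl or v > after_ucl]

# A large positive value suggests the second stage's mean is higher.
t_stat = welch_t(before, after)
```

In this toy data, the last point of the second stage is unusual relative to its own stage even though it would look ordinary against the first stage's limits, which is the reason for giving each stage separate limits.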
Control charts are a powerful tool for understanding your processes. Minitab makes control charting easy, whether you want to compare different eras in baseball or different phases in your process. And the Assistant Menu makes comparisons even easier by providing all of the information you need in a single report, ready for you to export to a presentation.
Photograph of Donald "Zack" Greinke by Keith Allison, used under Creative Commons Attribution-Share Alike 2.0 Generic license.