The Ghost Pattern: A Haunting Cautionary Tale about Moving Averages
Halloween's right around the corner, so here's a scary thought for the statistically minded: That pattern in your time series plot? Maybe it's just a ghost. It might not really be there at all.
That's right. The trend that seems so evident might be a phantom. Or, if you don't believe in that sort of thing, chalk it up to the brain's desire to impose order on what we see, even when that order doesn't exist.
I'm going to demonstrate this with Minitab Statistical Software (get the free 30-day trial version and play along, if you don't already use it). And if things get scary, just keep telling yourself "It's only a simulation. It's only a simulation."
But remember the ghost pattern when we're done. It's a great reminder of how important it is to make sure that you've interpreted your data properly, and looked at all the factors that might influence your analysis—including the quirks inherent in the statistical methods you used.
Plotting Random Data from a 20-Sided Die
We're going to need some random data, which we can get Minitab to generate for us. In many role-playing games, players use a 20-sided die to determine the outcome of battles with horrible monsters, so in keeping with the Halloween theme we'll simulate 500 consecutive rolls of a 20-sided die. Choose Calc > Random Data > Integer... and have Minitab generate 500 rows of random integers between 1 and 20.
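If you don't have Minitab handy, the same experiment can be sketched in Python. This is a minimal stand-in for Calc > Random Data > Integer, assuming a fair die; the function name `roll_d20` and the seed value are illustrative choices, not anything Minitab uses.

```python
import random


def roll_d20(n_rolls, seed=None):
    """Simulate n_rolls independent rolls of a fair 20-sided die."""
    rng = random.Random(seed)  # private generator, so a fixed seed is reproducible
    # randint includes both endpoints, so every face from 1 to 20 can appear
    return [rng.randint(1, 20) for _ in range(n_rolls)]


rolls = roll_d20(500, seed=2013)
print(rolls[:10])  # first ten simulated rolls
```

Fixing the seed lets you rerun the whole "haunting" and get the same rolls each time; drop it to see a fresh set of ghosts.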
Now go to Graph > Time Series Plot... and select the column of random integers. Minitab creates a graph that will look something like this:
It looks like there could be a pattern, one that resembles a sine wave...but it's hard to see, since there's a lot of variation between consecutive points. In this situation, many analysts use a technique called a moving average to filter the data. The idea is to smooth out the natural variation in the data by averaging several consecutive data points at a time, so that any underlying pattern can reveal itself. It's the statistical equivalent of applying a noise filter to eliminate hiss on an audio recording.
A moving average can be calculated from as few as 2 data points, but the right length depends on the size and nature of your data set. We're going to calculate the moving average of every 5 consecutive numbers. Choose Stat > Time Series > Moving Average... Enter the column of integers as the Variable, and enter 5 as the MA length. Then click "Storage" and have Minitab store the calculated averages in a new data column.
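For readers following along outside Minitab, here is a minimal Python sketch of a simple (unweighted, trailing) moving average of length 5. One assumption to flag: this sketch drops the incomplete windows at the start, so 500 rolls give 496 averages, whereas Minitab's stored column may instead show missing values in those first rows. The function name `moving_average` is illustrative.

```python
import random


def moving_average(values, length):
    """Return the mean of each run of `length` consecutive values.

    The first average covers values[0:length], so the result has
    len(values) - length + 1 entries (incomplete windows are skipped).
    """
    if length < 1 or length > len(values):
        raise ValueError("length must be between 1 and len(values)")
    window_sum = sum(values[:length])
    averages = [window_sum / length]
    for i in range(length, len(values)):
        # Slide the window one step: add the newest point, drop the oldest.
        window_sum += values[i] - values[i - length]
        averages.append(window_sum / length)
    return averages


rng = random.Random(2013)
rolls = [rng.randint(1, 20) for _ in range(500)]  # 500 simulated d20 rolls
ma5 = moving_average(rolls, 5)
print(len(ma5), ma5[:5])
```

The running sum means each new average costs one addition and one subtraction instead of re-summing all five rolls.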
Now create a new time series plot using the moving averages:
You can see how some of the "noise" from point-to-point variation has been reduced, and it does look like there could, just possibly, be a pattern there.
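As a rough check on how much smoothing to expect: a fair d20 has variance (20² − 1)/12, and averaging 5 independent rolls divides that variance by 5. This assumes the rolls are independent, which holds for a fair die.

```latex
\sigma^2 = \frac{20^2 - 1}{12} = 33.25, \qquad \sigma \approx 5.77

\operatorname{Var}(\bar{X}_5) = \frac{\sigma^2}{5} = 6.65, \qquad \operatorname{SD}(\bar{X}_5) \approx 2.58
```

So the moving-average plot should swing only about 1/√5 (roughly 45%) as widely as the raw rolls.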
Can Moving Averages Predict the Future?
Of course, a primary reason for doing a time series analysis is to forecast the next item (or several) in the series. Let's see whether we can predict the next moving average of the die rolls from the current moving average.
Select Stat > Time Series > Lag. In the dialog box, choose the "moving averages" column as the series to lag. We'll use this dialog to create a new column of data that places each moving average down 1 row in the column and inserts missing value symbols, *, at the top of the column.
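The Lag step amounts to shifting a column down by one row. This hypothetical `lag` helper sketches that idea in Python, using `None` where Minitab would show `*`; the sample values are made up for illustration.

```python
def lag(series, k=1):
    """Shift `series` down k rows, padding the top with None.

    None stands in for Minitab's missing-value symbol '*'. The result has
    the same length as the input, so row i holds series[i - k].
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    k = min(k, len(series))
    return [None] * k + list(series[: len(series) - k])


moving_averages = [10.2, 11.4, 9.8, 12.0]  # made-up values for illustration
lagged = lag(moving_averages)
for previous, current in zip(lagged, moving_averages):
    print(previous, current)  # each row pairs a moving average with the one before it
```

Pairing each row this way is exactly what the scatterplot needs: one axis holds a moving average, the other holds its predecessor.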
Now we can create a simple scatterplot of the moving averages against the lagged column, which will show whether there's a correlation between one moving average and the next.
Clearly, there's a positive correlation between the current moving average and the next, which means we can use the current moving average to predict the next one.
But wait a minute...this is random data! By definition, you can't predict random rolls, so how can there be a correlation? This is getting kind of creepy...it's like there's some kind of ghost in this data.
Zoinks! What would Scooby-Doo make of all this?
Debunking the "Ghost" with the Slutsky-Yule Effect
Don't panic: there's a perfectly rational explanation for what we're seeing here. It's called the Slutsky-Yule Effect, which says that a moving average (or similar sum) of purely random values can produce a smooth, wave-like series that looks patterned, even though there's no relationship among the original data points.
So there's no ghost in our random data; instead, we're seeing a sort of statistical illusion. Using the moving average can make it seem like a pattern or relationship exists, but that apparent pattern could be a side effect of the tool, and not an indication of a real pattern.
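A short derivation shows why the illusion is guaranteed. Assume the rolls X_t are independent with common variance σ², and write the length-m moving average as M_t. Neighboring averages share m − 1 of their m rolls, so they must be correlated even though the rolls themselves are not.

```latex
M_t = \frac{1}{m}\sum_{j=0}^{m-1} X_{t-j}, \qquad
\operatorname{Var}(M_t) = \frac{\sigma^2}{m}

\operatorname{Cov}(M_t, M_{t+k}) =
\begin{cases}
\dfrac{(m-k)\,\sigma^2}{m^2}, & 0 \le k < m \\[4pt]
0, & k \ge m
\end{cases}

\rho_k = \frac{\operatorname{Cov}(M_t, M_{t+k})}{\operatorname{Var}(M_t)} = \frac{m-k}{m}
\quad\Rightarrow\quad \rho_1 = \frac{4}{5} = 0.8 \ \text{for } m = 5
```

That 0.8 is the value the scatterplot is estimating, and it comes entirely from the overlapping windows, not from anything in the die. Once the windows stop overlapping (k ≥ 5), the correlation vanishes.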
Does this mean you shouldn't use moving averages to look at your data? No! It's a very valuable and useful technique. However, using it carelessly could get you into trouble. And if you're basing a major decision solely on moving averages, you might want to try some alternate approaches, too. Mikel Harry, one of the originators of Six Sigma, has a great blog post that presents a workplace example of how far apart reality and moving averages can be.
So just remember the Slutsky-Yule Effect when you're analyzing data in the dead of night, and your moving average chart shows something frightening. Shed some more light on the subject with follow-up analysis and you might find there's nothing to fear at all.