Explaining the Central Limit Theorem with Bunnies & Dragons
When I think about the Central Limit Theorem (CLT), bunnies and dragons are just about the last things that come to mind. However, that’s not the case for Shuyi Chiou, whose playful CreatureCast.org animation explains the CLT using both fluffy and fire-breathing creatures.
Per the article that accompanied this video in The New York Times:
“Many real-world observations can be approximated by, and tested against, the same expected pattern: the normal distribution. In this familiar symmetric bell-shaped pattern, most observations are close to average, and there are fewer observations further from the average. The size of flowers, the physiological response to a drug, the breaking force in a batch of steel cables — these and other observations often fit a normal distribution.
There are, however, many important things we would like to measure and test that do not follow a normal distribution. Household income doesn’t – high values are much further from the average than low values are.
But even when raw data does not fit a normal distribution, there is often a normal distribution lurking within it. This makes it possible to still use the normal distribution to test ideas about non-normal data.”
The CLT Defined
Some data sets follow a normal distribution. Others do not. However, for both normal AND non-normal data, if we repeatedly take independent random samples of size n from a population (with a finite variance), then as n gets large, the distribution of the sample means approaches a normal distribution.
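In symbols, here is one compact way to write this down. It is the classical (Lindeberg-Lévy) form, which assumes the population has mean μ and a finite standard deviation σ:

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{\;d\;} N(0,\,1)
\quad \text{as } n \to \infty .
```

Informally, for large n the sample mean is approximately normal with mean μ and standard deviation σ/√n: its center stays at the population mean while its spread shrinks as n grows.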
How large is large enough? Well, it depends. The closer the population distribution is to a normal distribution, the smaller the sample size n needs to be before the sample means look approximately normal. Populations that are heavily skewed or have several modes may require larger sample sizes.
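One rough way to put a number on "it depends" is skewness, which is 0 for a normal distribution. For independent draws with a finite third moment, the skewness of the sample mean equals the population skewness divided by √n. The minimal Python sketch below applies that rule to the Weibull distribution (shape 1.5) used later in this post; the function names are illustrative, not from any library. Skewness captures only lopsidedness, so it says nothing about the multimodal populations mentioned above.

```python
import math


def weibull_skewness(shape: float) -> float:
    """Population skewness of a Weibull distribution (the scale parameter cancels out)."""
    g1 = math.gamma(1 + 1 / shape)  # E[X] with scale 1
    g2 = math.gamma(1 + 2 / shape)  # E[X^2] with scale 1
    g3 = math.gamma(1 + 3 / shape)  # E[X^3] with scale 1
    mean = g1
    var = g2 - g1 ** 2
    # Third central moment divided by sigma^3
    return (g3 - 3 * mean * var - mean ** 3) / var ** 1.5


def skewness_of_mean(pop_skew: float, n: int) -> float:
    """Skewness of the mean of n independent draws: shrinks like 1/sqrt(n)."""
    return pop_skew / math.sqrt(n)


pop = weibull_skewness(1.5)
print(f"population skewness: {pop:.3f}")
for n in (1, 5, 50):
    print(f"n = {n:>2}: skewness of sample mean = {skewness_of_mean(pop, n):.3f}")
```

For shape 1.5 the population skewness is about 1.07, so means of 5 values are still noticeably right-skewed (about 0.48), while means of 50 values are much closer to symmetric (about 0.15).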
Want to See the CLT in Action?
In case you don’t have any fire-breathing dragons at your disposal, you can use Minitab Statistical Software to simulate the scenarios the animation describes and see the CLT in action.
Let’s use household income, which, as The New York Times article states, does not follow a normal distribution. Suppose that instead of a normal distribution, household income follows a Weibull distribution with a shape of 1.5 and a scale of 50. Shape? Scale? What’s that, you say? If you’re not familiar with the Weibull distribution and its shape and scale parameters, all you need to know is that this distribution is not bell-shaped. Instead, it is skewed to the right, something like this:
So let’s see the CLT in action. Using Minitab’s Calc > Random Data > Weibull menu, we can randomly sample data from this Weibull distribution. We can then use Calc > Row Statistics to calculate the average of each sample.
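If you don’t have Minitab handy, the same two steps can be roughly approximated in Python with NumPy. This is a sketch under assumptions, not how Minitab works internally: each row holds one sample (mirroring Calc > Random Data > Weibull filling worksheet columns), and the row mean stands in for Calc > Row Statistics. The function name `sample_means` and its defaults are hypothetical.

```python
import numpy as np


def sample_means(n, n_samples=300, shape=1.5, scale=50.0, seed=None):
    """Draw n_samples rows of n Weibull values each and return each row's mean."""
    rng = np.random.default_rng(seed)
    # NumPy's weibull() draws with scale 1, so multiply to get the requested scale.
    data = scale * rng.weibull(shape, size=(n_samples, n))
    # Average across each row: one mean per sample.
    return data.mean(axis=1)


means = sample_means(n=5, seed=1)
print(means.shape)         # (300,): one mean per sample
print(means[:5].round(2))  # the first few sample means
```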
(For more information about using Minitab’s Calc menu to demonstrate the Central Limit Theorem, one of our articles on minitab.com offers detailed instructions on how to simulate the central limit theorem using dice and birthdays.)
Suppose we collect a sample of size 5 from that Weibull distribution above and compute the average of those 5 numbers. Then we repeat this process, say, 300 times. Looking at the green histogram of those 300 averages, we can see the sample means do not appear to follow a normal distribution. Rather, the means are skewed to the right.
Suppose we then collect a sample of size 50 from that same Weibull distribution and compute the average of those 50 numbers. Then we repeat that process 300 times. We see in the purple histogram below that the distribution of those 300 averages does in fact resemble a normal distribution.
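To compare the two sample sizes without the histograms, the minimal Python sketch below (assuming NumPy; the seed and the `skewness` helper are illustrative choices) repeats both experiments and prints summary statistics plus a crude text histogram of the 300 means for each n. Exact numbers change with the seed.

```python
import numpy as np


def skewness(x):
    """Sample skewness (simple moment estimator); 0 for a perfectly symmetric sample."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float((d ** 3).mean() / (d ** 2).mean() ** 1.5)


rng = np.random.default_rng(2024)
shape, scale, n_samples = 1.5, 50.0, 300

for n in (5, 50):
    # 300 samples of size n from Weibull(shape=1.5, scale=50), one sample per row
    means = (scale * rng.weibull(shape, size=(n_samples, n))).mean(axis=1)
    print(f"n = {n:>2}: mean of means = {means.mean():6.2f}, "
          f"SD of means = {means.std(ddof=1):5.2f}, skewness = {skewness(means):5.2f}")
    # Crude text histogram: one '#' per two sample means in each bin
    counts, edges = np.histogram(means, bins=10)
    for count, left in zip(counts, edges[:-1]):
        print(f"{left:7.1f} | {'#' * (int(count) // 2)}")
```

Both runs should center near the population mean, 50 × Γ(5/3) ≈ 45.1. The SD of the means should land near 13.7 for n = 5 and near 4.3 for n = 50 (the population SD of about 30.6 divided by √n), and the skewness of the means should be clearly positive for n = 5 and close to 0 for n = 50.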
As Shuyi Chiou notes, “There is something very special about the normal distribution.” Not only do we see this distribution when describing many kinds of measurements, in nature (e.g., the weights of bunnies), on the factory floor, and elsewhere, but it can also describe the distribution of the means of measurements “even when the variables themselves don’t have a normal distribution, as we saw with dragon wings. Because of this, we can use the normal distribution to test ideas about the world even when the underlying variables do not follow a normal distribution.”