We use statistics because it's usually not practical to collect all of the data from an entire population. Instead, we take a random sample from the population and use statistics calculated from that sample to draw conclusions about the whole population.
Many common statistical procedures require data to be approximately normal -- in other words, to roughly follow the bell curve. But what happens when you have a population that, well, just isn't normal?
That's where the central limit theorem comes in. It can be a difficult concept to grasp, but at root the central limit theorem says that if you take randomly selected, independent samples, and each sample contains enough observations, the means of those samples will approximately follow a normal distribution, even if the population you're sampling from does not! (Strictly, the population needs a finite variance, and the more skewed it is, the more observations each sample needs.)
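In symbols, one standard textbook version of the theorem (the classical form for independent, identically distributed observations, given here for reference rather than taken from the original post) reads as follows, where X_1, ..., X_n are independent draws from a population with mean μ and finite variance σ²:

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{\;d\;} N(0,1) \quad \text{as } n \to \infty,
\\[4pt]
\text{so, for large } n, \quad \bar{X}_n \;\approx\; N\!\left(\mu,\ \frac{\sigma^2}{n}\right).
```

Note that n here is the number of observations in each sample, not the number of samples you collect.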
Here's a quick illustration. The following histogram, created in Minitab, shows that because each face of a fair six-sided die is equally likely to land face up, the distribution of 500 die rolls is basically flat:
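If you don't have Minitab handy, here is a minimal Python sketch of the same idea. It is an illustration only, not the procedure used to make the Minitab histogram; the helper name roll_die and the seed value are arbitrary choices. It simulates 500 rolls with Python's standard random module and prints a crude text histogram, which should show roughly equal counts (about 500/6, or 83) for each face.

```python
import random
from collections import Counter


def roll_die(n_rolls, seed=None):
    """Return a list of n_rolls simulated rolls of a fair six-sided die."""
    rng = random.Random(seed)  # seeded generator so the run is repeatable
    return [rng.randint(1, 6) for _ in range(n_rolls)]


rolls = roll_die(500, seed=1)
face_counts = Counter(rolls)

# Crude text histogram: one '#' per 5 rolls that landed on each face,
# followed by the exact count.
for face in range(1, 7):
    count = face_counts[face]
    print(f"{face}: {'#' * (count // 5)} {count}")
```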
But if we roll the die 1,000 times and then take the mean of each consecutive group of 5 rolls (200 means in all), the resulting histogram looks very different:
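The same demonstration can be sketched in Python, again as an illustration rather than a Minitab procedure. The helper means_of_groups is hypothetical: it simulates 1,000 rolls, splits them into 200 consecutive groups of 5, and averages each group. The text histogram it prints should pile up in the middle, near 3.5, and thin out toward 1 and 6.

```python
import random
from collections import Counter


def means_of_groups(n_rolls, group_size, seed=None):
    """Roll a fair die n_rolls times and return the mean of each consecutive,
    non-overlapping group of group_size rolls. Leftover rolls that don't fill
    a complete group are dropped."""
    rng = random.Random(seed)
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    n_groups = n_rolls // group_size
    return [sum(rolls[k * group_size:(k + 1) * group_size]) / group_size
            for k in range(n_groups)]


means = means_of_groups(1000, 5, seed=1)
print(len(means))  # 200: one mean per group of 5 rolls

# Text histogram of the sample means. A mean of 5 rolls is always a
# multiple of 0.2 between 1.0 and 6.0, so each distinct value gets a row.
mean_counts = Counter(means)
for value in sorted(mean_counts):
    print(f"{value:.1f}: {'#' * mean_counts[value]}")
```

Changing group_size shows the theorem at work: with group_size=1 each "mean" is just a single roll and the histogram is flat again, while larger groups give a narrower, more bell-shaped histogram.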
The population we are drawing from, the outcomes of individual die rolls, is still flat. But the distribution of these sample means closely follows a bell curve.
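A quick calculation shows where that bell curve should sit. For a single roll X of a fair die, the population mean μ and variance σ² work out as below, and the standard result that the mean of n independent rolls has variance σ²/n then gives the expected center and spread for means of 5 rolls:

```latex
\mu = \frac{1+2+3+4+5+6}{6} = 3.5,
\qquad
\sigma^2 = \frac{1^2+2^2+\dots+6^2}{6} - \mu^2 = \frac{91}{6} - \frac{49}{4} = \frac{35}{12} \approx 2.92,
\\[4pt]
\operatorname{Var}(\bar{X}_5) = \frac{\sigma^2}{5} = \frac{7}{12} \approx 0.583,
\qquad
\operatorname{SD}(\bar{X}_5) = \sqrt{7/12} \approx 0.76.
```

So the means of 5 rolls should center near 3.5 with a standard deviation of about 0.76, compared with about 1.71 for a single roll.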
And that means we can apply statistical techniques that assume normality to sample means, even when we're sampling from populations that are strongly nonnormal, provided each sample is large enough. It's no exaggeration to say that the central limit theorem provides the foundation for much of statistical and data analysis.
Want to explore the central limit theorem further -- and see how it works in action? A colleague and I wrote an article for minitab.com that explains the central limit theorem and shows how to demonstrate it using common examples, including the roll of a die and the birthdays of Major League Baseball players. We also include directions for using Minitab Statistical Software to perform these demonstrations.