How the Central Limit Theorem Works

We use statistics because it's usually not practical to collect all of the data from an entire population. Instead, we sample the population, and then use statistics for that random sample to draw conclusions about the whole population.

Many common statistical procedures require data to be approximately normal -- in other words, to roughly follow the bell curve.  But what happens when you have a population that, well, just isn't normal?  

That's where the central limit theorem comes in. It can be a difficult concept to grasp, but at root the central limit theorem says that if you take random samples of independent observations, and each sample is large enough, the means of those samples will approximately follow a normal distribution -- even if the population you're sampling from does not! (The population does need a finite variance, and the less normal it is, the larger each sample needs to be.)
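Stated a little more formally, in the classical version of the theorem (other versions relax these assumptions): if $X_1, \dots, X_n$ are independent draws from a population with mean $\mu$ and finite variance $\sigma^2$, the standardized sample mean approaches a standard normal distribution as the sample size $n$ grows.

```latex
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{\;d\;} N(0,1) \quad \text{as } n \to \infty,
\qquad \text{so, for large } n, \quad \bar{X}_n \overset{\text{approx.}}{\sim} N\!\left(\mu, \frac{\sigma^2}{n}\right).
```

Note that it is the size $n$ of each sample that has to be large; collecting more samples only gives you a clearer picture of the distribution of their means.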

Here's a quick illustration. The following histogram created in Minitab shows that, because each side of a fair six-sided die is equally likely to come up, the distribution of 500 die rolls is basically flat:

Histogram of 500 Die Rolls

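If you don't have Minitab at hand, the flat distribution is easy to reproduce. The sketch below is a minimal Python version, not the Minitab procedure used for the histogram; it assumes a fair die simulated with the standard `random` module, and the helper name `tally_rolls` and the seed value are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def tally_rolls(n_rolls: int, seed: int = 1) -> dict[int, int]:
    """Roll a simulated fair six-sided die n_rolls times and count each face.

    Hypothetical helper for illustration; the seed only makes runs repeatable.
    """
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    # Report every face, including any that never came up
    return {face: counts.get(face, 0) for face in range(1, 7)}


counts = tally_rolls(500)
for face, count in counts.items():
    # One '#' per 5 rolls: a rough text stand-in for the histogram
    print(f"{face}: {count:3d} {'#' * (count // 5)}")
```

Each face should show up roughly 500 / 6, or about 83, times, so the bars come out roughly equal in length.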
But if we roll the die 1,000 times and then average each consecutive group of 5 rolls (giving 200 sample means), the resulting histogram looks very different:

Histogram of the Averages of 5 Die Rolls

The population we are drawing from -- a collection of die rolls -- is still flat. But the distribution of these sample means approximately follows a bell curve.
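The same kind of simulation reproduces the second histogram. This is again a Python sketch rather than Minitab's procedure; it assumes non-overlapping groups of 5 consecutive rolls, and it compares the spread of the means with the CLT's prediction of σ/√n, using the fair-die values μ = 3.5 and σ² = 35/12. The function name `group_means` is hypothetical.

```python
import math
import random
import statistics
from collections import Counter

POP_MEAN = 3.5               # mean of a single fair-die roll
POP_SD = math.sqrt(35 / 12)  # standard deviation of a single roll, about 1.708


def group_means(n_rolls: int = 1000, group_size: int = 5, seed: int = 1) -> list[float]:
    """Roll a simulated fair die n_rolls times and return the mean of each
    consecutive, non-overlapping group of group_size rolls (leftovers dropped)."""
    rng = random.Random(seed)
    rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
    return [
        sum(rolls[i:i + group_size]) / group_size
        for i in range(0, n_rolls - group_size + 1, group_size)
    ]


means = group_means()
# Means of 5 rolls can only take the values 1.0, 1.2, ..., 6.0, so count each one
for value, count in sorted(Counter(means).items()):
    print(f"{value:.1f} {'#' * count}")

print("number of means:", len(means))
print("mean of the means:", round(statistics.mean(means), 3))
print("SD of the means:", round(statistics.stdev(means), 3),
      "vs. CLT prediction", round(POP_SD / math.sqrt(5), 3))
```

With `group_size=1` the same function simply returns the individual rolls, giving back the flat distribution; raising `group_size` narrows the histogram (its spread shrinks like σ/√n) and makes it more bell-shaped.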

And that means that we can apply statistical techniques that rely on normally distributed sample means even when we're sampling populations that are strongly nonnormal, provided our samples are large enough. It's no exaggeration to say that the central limit theorem provides the foundation for much of statistical and data analysis.
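To see what a normality-based calculation buys you here, the sketch below answers one question two ways: how likely is the mean of 5 die rolls to land within 1.96 standard errors of 3.5? One answer uses the normal approximation N(3.5, σ²/5) that the CLT suggests; the other is exact, found by enumerating all 6^5 = 7,776 equally likely outcomes. The helper names are hypothetical; `statistics.NormalDist` is part of Python's standard library (3.8 and later).

```python
import itertools
import math
from statistics import NormalDist

GROUP_SIZE = 5
POP_MEAN = 3.5                       # mean of one fair-die roll
POP_SD = math.sqrt(35 / 12)          # SD of one fair-die roll
SE = POP_SD / math.sqrt(GROUP_SIZE)  # CLT standard error of a 5-roll mean


def exact_prob_mean_between(lo: float, hi: float) -> float:
    """Exact P(lo <= mean of GROUP_SIZE rolls <= hi), by enumerating every
    one of the 6**GROUP_SIZE equally likely outcomes."""
    outcomes = itertools.product(range(1, 7), repeat=GROUP_SIZE)
    hits = sum(1 for rolls in outcomes if lo <= sum(rolls) / GROUP_SIZE <= hi)
    return hits / 6 ** GROUP_SIZE


def normal_prob_mean_between(lo: float, hi: float) -> float:
    """The same probability under the normal approximation N(POP_MEAN, SE**2)."""
    approx = NormalDist(mu=POP_MEAN, sigma=SE)
    return approx.cdf(hi) - approx.cdf(lo)


lo, hi = POP_MEAN - 1.96 * SE, POP_MEAN + 1.96 * SE
print("exact:", round(exact_prob_mean_between(lo, hi), 4))
print("normal approximation:", round(normal_prob_mean_between(lo, hi), 4))
```

For a fair die the exact probability is 7272/7776, roughly 0.935, against 0.950 from the normal approximation. That is already close with only 5 rolls per mean, and part of the gap comes from the mean of 5 rolls only taking values in steps of 0.2.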

Want to explore the central limit theorem further -- and see how it works in action?  A colleague and I wrote an article for minitab.com that explains the central limit theorem and shows how to demonstrate it using common examples, including the roll of a die and the birthdays of Major League Baseball players.  We also include directions for using Minitab Statistical Software to perform these demonstrations. 

Comments

Name: Travis • Friday, April 15, 2011

Thanks Eston! I wasn't clear on the usefulness of the central limit theorem, but this really helped.


Name: george john • Saturday, December 21, 2013

Good clear explanation.
Just a query - will the variances of multiple samples (as opposed to the mean as per the central limit theorem) also follow a normal distribution?


Name: Eston Martz • Thursday, January 2, 2014

Thanks for the kind words, George -- and what an interesting question! The answer is no -- the CLT applies only to the mean, and you cannot apply it to the variances of multiple samples. For samples drawn from a normal population, the sample variance s² follows a scaled chi-square distribution: (n-1)s²/σ² has a chi-square distribution with n-1 degrees of freedom.
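This answer can be checked by simulation. The Python sketch below assumes a normal population (the chi-square result depends on that assumption) and samples of size n = 5; the function name and its settings are hypothetical. If (n-1)s²/σ² follows a chi-square distribution with n-1 = 4 degrees of freedom, its simulated values should have a mean near 4, a variance near 8, and a right skew, unlike the symmetric bell curve of the sample means.

```python
import random
import statistics


def scaled_sample_variances(n: int = 5, reps: int = 10_000, sigma: float = 1.0,
                            seed: int = 1) -> list[float]:
    """Draw reps samples of size n from a normal population N(0, sigma**2) and
    return (n - 1) * s**2 / sigma**2 for each sample's variance s**2."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        # statistics.variance uses the n - 1 denominator (sample variance)
        values.append((n - 1) * statistics.variance(sample) / sigma ** 2)
    return values


vals = scaled_sample_variances()
# A chi-square distribution with k = n - 1 = 4 degrees of freedom has mean k = 4
# and variance 2k = 8, and it is skewed right, so its median sits below its mean
print("mean:", round(statistics.mean(vals), 2))
print("variance:", round(statistics.variance(vals), 2))
print("median:", round(statistics.median(vals), 2))
```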

