Alphas, P-Values, and Confidence Intervals, Oh My!

Trying to remember what the alpha-level, p-value, and confidence interval all mean for a hypothesis test—and how they relate to one another—can seem about as daunting as Dorothy’s trek down the yellow brick road.

Rather than sitting through a semester of Intro Stats, let's get right to the point and explain in clear language what all these statistical terms mean and how they relate to one another.

What Does Alpha Mean in a Hypothesis Test?

Before you run any statistical test, you must first determine your alpha level, which is also called the “significance level.” By definition, the alpha level is the probability of rejecting the null hypothesis when the null hypothesis is true.

Translation: It’s the probability of making a wrong decision.

Thanks to famed statistician R. A. Fisher, most folks typically use an alpha level of 0.05. However, if you’re analyzing airplane engine failures, you may want to lower the probability of making a wrong decision and use a smaller alpha. On the other hand, if you're making paper airplanes, you might be willing to increase alpha and accept the higher risk of making the wrong decision.

Like all probabilities, alpha ranges from 0 to 1.
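
If you'd like to see alpha in action, one option is to simulate many tests in a world where the null hypothesis really is true and count how often it gets rejected anyway. The sketch below is only an illustration, not part of any Minitab workflow. It assumes normally distributed prices with a known standard deviation (so a simple z-test applies), and the mean of 400, standard deviation of 50, sample size of 30, and random seed are all made-up values.

```python
import random
from statistics import NormalDist, fmean


def type_one_error_rate(alpha, trials=4000, n=30, mu0=400.0, sigma=50.0, seed=1):
    """Fraction of simulated tests that reject a TRUE null hypothesis."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose mean really is mu0.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p_value < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    for alpha in (0.01, 0.05, 0.20):
        print(alpha, type_one_error_rate(alpha))
```

Under these assumptions, the rejection rate should land close to whatever alpha you pass in, which is exactly what "the probability of rejecting the null hypothesis when the null hypothesis is true" means.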

What Is the P-Value of a Hypothesis Test?

Once you’ve chosen alpha, you’re ready to conduct your hypothesis test. Suppose you want to run a 1-sample t-test to determine whether or not the average price of Cairn terriers—like Dorothy’s dog Toto—is equal to, say, \$400. You collect your sample data, put it in Minitab Statistical Software, and then arrive at your p-value.

Statistically speaking, the p-value is the probability of obtaining a result as extreme as, or more extreme than, the result actually obtained when the null hypothesis is true. If that makes your head spin like Dorothy’s house in a Kansas tornado, just pretend Glinda has waved her magic wand and zapped it from your memory. Then ponder this for a moment.

The p-value is basically the probability of obtaining sample data like yours (or data even further from the hypothesized value) IF the null hypothesis (e.g., the average cost of Cairn terriers = \$400) were true. So if you obtain a p-value of 0.85, then you have little reason to doubt the null hypothesis. However, if your p-value is, say, 0.02, there’s only a very small chance you would have obtained data that extreme if the null hypothesis were in fact true.

And since the p-value is a probability just like alpha, p-values also range from 0 to 1.
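
For the curious, here is a rough sketch of how a two-sided p-value for a 1-sample t-test could be computed by hand. It is not how Minitab does it internally; it uses only the Python standard library and gets the tail area of the t distribution by numerical integration (Simpson's rule). The function names and the ten terrier prices are invented for illustration.

```python
import math
from statistics import mean, stdev


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value: P(|T| >= |t_stat|) when the null is true."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    # Simpson's rule for the area under the density between 0 and |t_stat|.
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3
    # The density is symmetric, so both tails together are 1 - 2 * central_area.
    return max(0.0, 1.0 - 2.0 * central_area)


def one_sample_t_test(sample, mu0):
    """Return (t statistic, two-sided p-value) for H0: population mean == mu0."""
    n = len(sample)
    t_stat = (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))
    return t_stat, two_sided_p(t_stat, n - 1)


# Made-up Cairn terrier prices in dollars, for illustration only.
prices = [350, 410, 385, 365, 420, 375, 390, 360, 405, 370]
t_stat, p_value = one_sample_t_test(prices, mu0=400)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```

In everyday work you would let your statistical software report the p-value; the point here is only that it is a tail area under the null distribution, and so it always falls between 0 and 1.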

What Is the Confidence Interval for a Hypothesis Test?

When you run a hypothesis test, Minitab also provides a confidence interval. P-values and confidence intervals are like Dorothy and Toto—where you find one, you will likely find the other.

The confidence interval is the range of likely values for a population parameter, such as the population mean. For example, if you compute a 95% confidence interval for the average price of a Cairn terrier, then you can be 95% confident that the interval contains the true average cost of all Cairn terriers.
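
The "95% confident" part describes how often the interval-building procedure succeeds over many samples. A small simulation can make that concrete. This sketch assumes a normal population with a known standard deviation, so it uses a z-based interval rather than the t-based one Minitab would report, and the true mean of 380, standard deviation of 50, and sample size of 30 are invented values.

```python
import random
from statistics import NormalDist, fmean


def coverage(confidence=0.95, trials=2000, n=30, true_mean=380.0, sigma=50.0, seed=7):
    """Share of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    # Critical value for a two-sided interval, e.g. about 1.96 for 95%.
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample_mean = fmean([rng.gauss(true_mean, sigma) for _ in range(n)])
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(coverage(0.95))
```

Under these assumptions, roughly 95% of the simulated intervals should capture the true mean. Any single interval either contains it or doesn't.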

Interpreting Hypothesis Test Statistics

Now let's put it all together. These three facts should help you interpret the results of your hypothesis test.

Fact 1: Confidence level + alpha = 1

If alpha equals 0.05, then your confidence level is 0.95. If you increase alpha, you both increase the probability of incorrectly rejecting the null hypothesis and also decrease your confidence level.

Fact 2: If the p-value is low, the null must go.

If the p-value is less than alpha (the risk you’re willing to take of making a wrong decision), then you reject the null hypothesis. For example, if the p-value was 0.02 (as in the Minitab output below) and we're using an alpha of 0.05, we’d reject the null hypothesis and conclude that the average price of a Cairn terrier is NOT \$400.

If the p-value is low, the null must go. Alternatively, if the p-value is greater than alpha, then we fail to reject the null hypothesis. Or, to put it another way, if the p-value is high, the null will fly.

Fact 3: The confidence interval and p-value will always lead you to the same conclusion.

If the p-value is less than alpha (i.e., it is significant), then the confidence interval will NOT contain the hypothesized mean. Looking at the Minitab output above, the 95% confidence interval of 365.58 - 396.75 does not include \$400. Thus, we know that the p-value will be less than 0.05.

If the p-value is greater than alpha (i.e., it is not significant), then the confidence interval will include the hypothesized mean.
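
To check Fact 3 for yourself, you can test several hypothesized means against the same sample and compare the two decisions side by side. The sketch below uses a z-test with a known standard deviation to stay within the Python standard library; the same agreement holds for the t-test when the interval and the test use the same alpha. The function name and the summary numbers (sample mean 381, sigma 42, n = 30) are hypothetical.

```python
from statistics import NormalDist


def z_test_and_interval(sample_mean, sigma, n, mu0, alpha=0.05):
    """Return (p_value, (low, high)) for a two-sided z-test and matching CI."""
    std_normal = NormalDist()
    se = sigma / n ** 0.5
    z = (sample_mean - mu0) / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    margin = std_normal.inv_cdf(1 - alpha / 2) * se
    return p_value, (sample_mean - margin, sample_mean + margin)


if __name__ == "__main__":
    # Hypothetical summary numbers: sample mean 381, sigma 42, n = 30.
    for mu0 in (360, 370, 380, 390, 400, 410):
        p_value, (low, high) = z_test_and_interval(381.0, 42.0, 30, mu0)
        rejects = p_value < 0.05             # decision from the p-value
        outside = not (low <= mu0 <= high)   # decision from the interval
        print(mu0, round(p_value, 4), rejects, outside)
```

For every hypothesized mean, the last two columns should match: the test rejects exactly when the hypothesized value falls outside the interval.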

I hope this post has helped to lift the curtain if you've had questions regarding alpha, the p-value, confidence intervals, and how they all relate to one another. If you want more details about these statistical terms and hypothesis testing, I’d recommend giving Quality Trainer a try. Quality Trainer is Minitab’s e-learning course that teaches you both statistical concepts and how to analyze your data using Minitab, and at a cost of only \$30 US for one month, it’s well worth the investment.

Name: Dave Blundell • Monday, October 1, 2012

Hi Michelle,

Your statement on the confidence interval "..., then you can be 95% confident that the interval contains the true average cost of all Cairn terriers."

Is it true that the 95% CI contains the true average even though it's based on a single confidence interval? If this test was repeated 100 times with calculation of 95% CI for each random sample (n = 30), then 95 out of the 100 confidence intervals would include the true average with the other 5% excluding the true average?

Best regards,

Dave

Name: Michelle • Tuesday, October 2, 2012

Dave, great question.

Although the confidence interval provides a range of likely values for the population mean, there's still a 5% chance that the CI doesn't include the true average.

Your last question provides a great demonstration of the principle - if you take 100 samples and calculate the CI for each sample, then on average about 95 of those 100 CIs will contain the true population mean, while about 5 will not. I pondered providing this explanation in my post as well, but thought it was easier to demonstrate in person than trying to explain it in words. Perhaps I should have included it though.

Michelle

Name: Saeed Akhtar • Tuesday, November 20, 2012

Hi Dave,

Actually, in statistics, there is no such term as "true average". What I understand from your note is that you are trying to imply that the mean of means (of several samples) tends to converge to (or more closely estimate) the true unknown population mean.

I hope it helps.

Thanks

Name: kicab • Sunday, December 2, 2012

There is a critical nuance in Dave's question.

The process or formula for a 95% CI is a random variable in which 95% of the CIs will include the parameter value it is estimating. A specific 95% CI calculated from a single sample will either include that value or not--the probability is 100% or 0%, respectively. We just don't know which. That's why it is called a confidence interval and not a probability interval. (Unless we are speaking of Bayesian statistics--another topic.)

Confidence intervals do not converge to the population parameter except by increasing the sample size. This is true regardless of the confidence level for the CI. Both a 5% and a 95% CI will converge to the population mean as n increases. When n = total population, there is no sampling error so the 5% and 95% CIs are equal.

Name: kicab • Sunday, December 2, 2012

I think the statement "Translation: It’s the probability of making a wrong decision." is incomplete.

Alpha is the probability of making a wrong decision IF the null hypothesis is true. There is no wrong decision when the null hypothesis is false and we reject it.

Name: Ravi • Thursday, December 6, 2012

Hi Michelle,

I shall be grateful if you kindly clarify upon a question that I have.
Quote from the article above:
"if you’re analyzing airplane engine failures, you may want to lower the probability of making a wrong decision and use a smaller alpha. On the other hand, if you're making paper airplanes, you might be willing to increase alpha and accept the higher risk of making the wrong decision."

If I am testing an engine of a real aircraft to assess whether the power it delivers is the same as the rated power of the engine, my Ho: would be "The power of the engine under test is the same as the rated power of the engine". For a small sample of engines that I test, I would do a t-test and use the square root of the sample size in comparing the average of the sample with the rated power. Now with 95% confidence (5% significance) I would accept a significantly lower power as acceptable, as I would fail to reject the null hypothesis for a p-value higher than 0.05. By contrast, I would accept a narrower band of variation with 80% confidence (20% significance). I would fail to reject the null hypothesis only if the p-value goes above 0.2. This way I would be able to make a better (safer) decision about the average engine power of a real aircraft at 80% confidence (20% significance) than at 95% confidence (5% significance).

This is contrary to what you have stated above about using high confidence to make a better decision. I would rather have a low confidence level and make a better decision with the logic explained above. Thanks in advance.

Name: Michelle Paret • Friday, December 7, 2012

Hi Ravi,

If you increase alpha, then yes, you will increase the power of the test. However, there's a trade-off - although increasing alpha increases the probability of detecting a difference when one exists (i.e., power), it also increases the chance of rejecting the null when it is actually true (i.e., alpha). Per the latter, you could therefore conclude that a process is NOT on target when in fact it is.

Typically, the preference is to be more conservative with alpha than beta (1-power). However, the values you choose are of course ultimately up to you.

I hope this helps to clarify the point I was trying to make, and thank you for sharing your thoughts on the topic.

Michelle

Name: Sarah • Thursday, December 26, 2013

Why should the p-value be less than OR equal to alpha?

Name: Michelle Paret • Thursday, January 2, 2014

Sarah, technically speaking, a result is statistically significant if the p-value is less than OR equal to alpha. I hope that answers your question.

Name: Jennifer • Saturday, March 1, 2014

Can any statistics wizards help explain this to me in English?!

If our 95% confidence interval includes zero, our p-value is…

greater than .05, less than .05, between .01 and .05, or none of the above.

For some reason I am having quite the time wrapping my brain around this. Any help is greatly appreciated!

Name: Michelle Paret • Tuesday, March 4, 2014

Jennifer, statistical concepts aren't always easy, but I hope this will help. Suppose you're doing a 2-sample t-test to test the equality of 2 means:

"If our 95% confidence interval includes zero, our p-value is GREATER than 0.05".

On the other hand, if zero is NOT within the confidence interval, then there IS a significant (non-zero) difference between the means. Therefore, the p-value will be LESS THAN 0.05.

Name: Angela • Monday, May 5, 2014

Hello Michelle. Thanks for the helpful tips. I am confused about the p-value. When alpha is kept at .05 and our p-value turns out to be, say, .002, we conclude "rejection of the null hypothesis." Does this mean that we are always committing a Type I error each time we reject the null on the basis of the p-value?