Which Statistical Error Is Worse: Type 1 or Type 2?

People can make mistakes when they test a hypothesis with statistical analysis. Specifically, they can make either Type I or Type II errors.

As you analyze your own data and test hypotheses, understanding the difference between Type I and Type II errors is extremely important, because there's a risk of making each type of error in every analysis, and the amount of risk is in your control.

So if you're testing a hypothesis about a safety or quality issue that could affect people's lives, or a project that might save your business millions of dollars, which type of error has more serious or costly consequences? Is there one type of error that's more important to control than another?

Before we attempt to answer that question, let's review what these errors are.

The Null Hypothesis and Type 1 and 2 Errors

When statisticians refer to Type I and Type II errors, we're talking about the two ways we can make a mistake regarding the null hypothesis (H0). The null hypothesis is the default position, akin to the idea of "innocent until proven guilty." We begin any hypothesis test with the assumption that the null hypothesis is correct.

We commit a Type 1 error if we reject the null hypothesis when it is true. This is a false positive, like a fire alarm that rings when there's no fire.

A Type 2 error happens if we fail to reject the null when it is not true. This is a false negative—like an alarm that fails to sound when there is a fire.

It's easier to understand in the table below, a version of which appears in most statistics textbooks:

| Reality            | Null (H0) not rejected | Null (H0) rejected  |
| ------------------ | ---------------------- | ------------------- |
| Null (H0) is true  | Correct conclusion     | Type 1 error        |
| Null (H0) is false | Type 2 error           | Correct conclusion  |

These errors relate to the statistical concepts of risk, significance, and power.

Reducing the Risk of Statistical Errors

Statisticians call the risk, or probability, of making a Type I error "alpha," also known as the "significance level." In other words, it's your willingness to risk rejecting the null when it's true. Alpha is commonly set at 0.05, which is a 5 percent chance of rejecting the null when it is true. The lower the alpha, the lower your risk of rejecting the null incorrectly. In life-or-death situations, for example, an alpha of 0.01 reduces the chance of a Type I error to just 1 percent.
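To see what alpha means in practice, here is a minimal simulation sketch. It assumes a two-sided one-sample z-test with a known standard deviation, and the function names, sample size, and number of trials are all hypothetical choices made for illustration. Every simulated data set comes from a population where the null is true, so any rejection is a Type I error by construction.

```python
import random
from statistics import NormalDist


def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for a one-sample z-test with known sigma."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def type_1_error_rate(alpha, n=30, trials=10_000, seed=1):
    """Fraction of simulated tests that reject a null that is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # The data really do come from the null distribution (mean 0, sd 1).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample, mu0=0.0, sigma=1.0) < alpha:
            rejections += 1
    return rejections / trials


for alpha in (0.05, 0.01):
    print(f"alpha={alpha}: observed Type I error rate = {type_1_error_rate(alpha):.4f}")
```

Under these assumptions, the observed rejection rate should land close to whichever alpha you chose, which is the sense in which the amount of Type I risk is in your control.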

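A Type 2 error relates to the concept of "power." The probability of making this error is referred to as "beta," and power is 1 minus beta: the probability of detecting a difference when one really exists. We can reduce our risk of making a Type II error by making sure our test has enough power, which depends on the sample size, the size of the difference we want to detect, the variability in the data, and the alpha we choose. Of these, sample size is often the one we can most readily adjust.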
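The relationship between sample size, power, and beta can be sketched with a short calculation. This example assumes a two-sided one-sample z-test with known standard deviation, and the function name `z_test_power` and the effect size of 0.5 are hypothetical choices for illustration, not values from any particular study.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test with known sigma.

    effect is the true difference between the population mean and the
    null value; power is the probability of rejecting the null given
    that difference.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # True mean shift expressed in standard-error units.
    shift = effect * n ** 0.5 / sigma
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


for n in (10, 30, 100):
    power = z_test_power(effect=0.5, sigma=1.0, n=n)
    print(f"n={n:>3}  power={power:.3f}  beta={1 - power:.3f}")
```

With the other inputs held fixed, power rises and beta falls as the sample size grows. Tightening alpha (say, from 0.05 to 0.01) lowers power at the same sample size, which is why the two risks have to be balanced rather than minimized independently.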

The Default Argument for "Which Error Is Worse"

Let's return to the question of which error, Type 1 or Type 2, is worse. The go-to example to help people think about this is a defendant accused of a crime that demands an extremely harsh sentence.

The null hypothesis is that the defendant is innocent. Of course you wouldn't want to let a guilty person off the hook, but most people would say that sentencing an innocent person to such punishment is a worse consequence.

Hence, many textbooks and instructors will say that a Type 1 error (false positive) is worse than a Type 2 error (false negative). The rationale boils down to the idea that if you stick to the status quo or default assumption, at least you're not making things worse.

And in many cases, that's true. But like so much in statistics, in application it's not really so black or white. The analogy of the defendant is great for teaching the concept, but when we try to make it a rule of thumb for which type of error is worse in practice, it falls apart.

So Which Type of Error Is Worse, Already?

I'm sorry to disappoint you, but as with so many things in life and statistics, the honest answer to this question has to be, "It depends."

In one instance, the Type I error may have consequences that are less acceptable than those from a Type II error. In another, the Type II error could be more costly than a Type I error. And sometimes, as Dan Smith pointed out in Significance a few years back with respect to Six Sigma and quality improvement, "neither" is the only answer to which error is worse:

Most Six Sigma students are going to use the skills they learn in the context of business. In business, whether we cost a company $3 million by suggesting an alternative process when there is nothing wrong with the current process or we fail to realize $3 million in gains when we should switch to a new process but fail to do so, the end result is the same. The company failed to capture $3 million in additional revenue.

Look at the Potential Consequences

Since there's not a clear rule of thumb about whether Type 1 or Type 2 errors are worse, our best option when using data to test a hypothesis is to look very carefully at the fallout that might follow both kinds of errors. Several experts suggest using a table like the one below to detail the consequences for a Type 1 and a Type 2 error in your particular analysis.

| Null | Type 1 Error: H0 true, but rejected | Type 2 Error: H0 false, but not rejected |
| --- | --- | --- |
| Medicine A does not relieve Condition B. | Medicine A does not relieve Condition B, but is not eliminated as a treatment option. | Medicine A relieves Condition B, but is eliminated as a treatment option. |
| Consequences | Patients with Condition B who receive Medicine A get no relief. They may experience worsening condition and/or side effects, up to and including death. Litigation possible. | A viable treatment remains unavailable to patients with Condition B. Development costs are lost. Profit potential is eliminated. |
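When the consequences in a table like this can be given rough dollar figures, one hedged way to compare them is to weight each cost by how likely that error is. The sketch below is only an illustration: the function name is hypothetical, the beta of 0.20 and the 50 percent prior probability that the null is true are assumptions that do not come from this article, and the $3 million cost per error simply borrows the figure from the Six Sigma quote earlier.

```python
def expected_error_costs(alpha, beta, cost_type_1, cost_type_2, p_null_true):
    """Expected cost of each error type for a single hypothesis test.

    p_null_true is an ASSUMED prior probability that the null is true;
    it has to be supplied from judgment or past experience.
    """
    # A Type 1 error can only happen when the null is true.
    expected_type_1 = p_null_true * alpha * cost_type_1
    # A Type 2 error can only happen when the null is false.
    expected_type_2 = (1 - p_null_true) * beta * cost_type_2
    return expected_type_1, expected_type_2


# Hypothetical inputs: equal $3 million costs, alpha = 0.05, beta = 0.20,
# and an assumed 50/50 chance that the null is true.
type_1_cost, type_2_cost = expected_error_costs(
    alpha=0.05,
    beta=0.20,
    cost_type_1=3_000_000,
    cost_type_2=3_000_000,
    p_null_true=0.5,
)
print(f"Expected Type 1 cost: ${type_1_cost:,.0f}")
print(f"Expected Type 2 cost: ${type_2_cost:,.0f}")
```

Even with identical dollar consequences, the expected costs differ under these assumed inputs because alpha and beta differ. Consequences such as patient harm do not reduce neatly to a dollar figure, so a calculation like this supplements the table rather than replacing it.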

Whatever your analysis involves, understanding the difference between Type 1 and Type 2 errors, and considering and mitigating their respective risks as appropriate, is always wise. For each type of error, make sure you've answered this question: "What's the worst that could happen?"

To explore this topic further, check out this article on using power and sample size calculations to balance your risk of a Type 2 error and testing costs, or this blog post about considering the appropriate alpha for your particular test.