T for 2: Should I Use a Paired t or a 2-Sample t?

Boxers or briefs.

Republican or Democrat.

Yin or yang.

Why is it that life often seems to boil down to two choices?

Heck, it even happens when you open the Basic Stats menu in Minitab. You’ll see a choice between a 2-sample t-test and a paired t-test:

[Image: the 2-sample t and paired t options in Minitab's Basic Statistics menu]

Which test should you choose? And what’s at stake?

Ask a statistician, and you might get this response: "Elementary, my dear Watson. Choose the 2-sample t-test to test the difference in two means, H0: µ1 – µ2 = 0. Choose the paired t-test to test the mean of pairwise differences, H0: µd = 0."
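For readers who do want the Greek, here is a sketch of the standard test statistics behind those two hypotheses. The 2-sample form shown is the unequal-variances (Welch) version, which appears to be what produced the 2-sample output later in this post; a pooled-variance version also exists. With paired data, the mean of the differences equals the difference of the means, so the two numerators agree and only the standard errors in the denominators differ.

```latex
% Paired t-test: analyze the n within-pair differences d_i
t = \frac{\bar{d}}{s_d / \sqrt{n}}, \qquad \mathrm{df} = n - 1

% 2-sample t-test without assuming equal variances (Welch)
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2 / n_1 + s_2^2 / n_2}},
\qquad
\mathrm{df} = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}
                   {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```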

You gaze at your two sets of data values, mystified. Do you have to master the Greek alphabet to choose the right test?

όχι!

(That's Greek for “no”)

Base Your Decision on How the Data Is Collected

Dependent samples: If you collect two measurements on each item, each person, or each experimental unit, then each pair of observations is closely related, or matched.

For example, suppose you test the computer skills of participants before and after they complete a computer training course. That produces a pair of observations (Before and After test scores) for each participant.

[Image: paired Before and After test scores for each participant]

In that case, you should use the paired t-test to test the mean difference between these dependent observations. (See the orange guy at left shaped like a lopsided peanut? That’s me. Training failed to improve my computer graphics skills. Hence the cheesy 1990s clip art.)

Paired observations can also arise when you measure two different items subject to the same unique condition.  

For example, suppose you measure tread wear on two different types of bike tires by putting both tires on the same bicycle. Then each bike is ridden by a different rider. To compare 20 pairs of tires, you use 20 different bicycles/riders.

Because each bicycle is ridden different distances in different conditions, measuring the tread wear for the two tires on each bike produces a set of paired (dependent) measurements. To account for the unique conditions that each bike was subject to, you’d use a paired t-test to evaluate the mean difference in tread wear.

Independent samples: If you randomly sample each set of items separately, under different conditions, the samples are independent. The measurements in one sample have no bearing on the measurements in the other sample.

Suppose you randomly sample two different groups of people and test their computer skills. You take one random sample from people who have not taken a computer training course and record their test scores. You take a second random sample from another group of people who have completed the computer training course and record their test scores.


Because the two samples are independent, you must use the 2-sample t test to compare the difference in the means.

If you use the paired t test for these data, Minitab pairs the Before and After scores row by row: the score of 47 before training is paired with the score of 53 after training, the score of 92 before training is paired with the score of 71 after training, and so on, even though those scores come from different people.

You could end up pairing Mark Zuckerberg’s Before test score with Lloyd Christmas’ After test score.

Invalid pairings like that can lead to seriously wrong conclusions.
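To see how much the row pairing matters, here is a minimal Python sketch using made-up scores for five people (the data and the helper names `paired_t` and `welch_t` are hypothetical, for illustration only, and are not Minitab functions). It recomputes both statistics after scrambling the row order of the After column, which mimics pairing each Before score with a stranger's After score.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic: a one-sample t on the row-by-row differences."""
    diffs = [b - a for b, a in zip(before, after)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(x, y):
    """2-sample t statistic without assuming equal variances (Welch)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se


# Hypothetical scores for five people; row i in both lists is the same person.
before = [47, 92, 65, 78, 55]
after = [53, 96, 70, 80, 62]

# The same After scores in a scrambled row order (an invalid pairing).
after_shuffled = [80, 53, 96, 62, 70]

print("paired T, correct pairing:  ", paired_t(before, after))
print("paired T, scrambled pairing:", paired_t(before, after_shuffled))
print("2-sample T, either order:   ", welch_t(before, after), welch_t(before, after_shuffled))
```

With the correct pairing, the paired T is large in magnitude; after scrambling, it shrinks to well under 1 even though the mean difference is unchanged. The 2-sample T is identical in both runs because it never looks at which rows go together.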

Paired vs 2-Sample Designs

If you’re planning your study and haven’t collected data yet, be aware of the possible ramifications of choosing a 2-sample design vs. a paired design. The choice of design could drastically affect the amount of data you'll need to collect.

For example, suppose you design your study to measure the test scores of the same 15 participants before and after they complete a computer training course. The paired t-test gives the following results:

Paired T-Test and CI: Before, After
Paired T for Before - After

            N   Mean  StDev  SE Mean
Before     15  97.07  26.88  6.94
After      15 101.60  27.16  7.01
Difference 15 -4.533  3.720  0.960

95% CI for mean difference: (-6.593, -2.473)
T-Test of mean difference = 0 (vs not = 0): T-Value = -4.72 P-Value = 0.000

Because the p-value (0.000) is less than alpha (0.05), you conclude that the mean difference between the Before and After test scores is statistically significant.
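As a check on where those numbers come from: the paired t-test is simply a one-sample t-test on the differences. This minimal Python sketch recomputes the T-value and the confidence interval from the Difference row of the output. The critical value 2.1448 (two-sided 95%, 14 DF) is an assumed t-table value typed in by hand, because Python's standard library has no t-distribution.

```python
import math

# Summary statistics copied from the Difference row of the paired output above.
n = 15
mean_diff = -4.533
sd_diff = 3.720

# Standard error of the mean difference: s_d / sqrt(n).
se = sd_diff / math.sqrt(n)
t_value = mean_diff / se

# Two-sided 95% critical value of t with n - 1 = 14 DF, taken from a t table
# (assumed constant, not computed here).
t_crit = 2.1448
ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)

print(f"SE = {se:.3f}, T = {t_value:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The recomputed SE, T-value, and interval agree with the Minitab output above to the displayed precision.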

Now suppose instead you had designed a study to collect two independent samples: 1) the test scores from 15 people who had not completed computer training (Before) and 2) the test scores from 15 different people who had completed the computer training (After).

For the sake of argument, suppose you end up with exactly the same data values for the Before and After scores as you did with the paired design. Here's what you obtain when you analyze the data using the 2-sample t test:

Two-Sample T-Test and CI: Before, After
Two-sample T for Before vs After

        N   Mean  StDev  SE Mean
Before 15   97.1  26.9   6.9
After  15  101.6  27.2   7.0

Difference = mu (Before) - mu (After)
Estimate for difference: -4.53
95% CI for difference: (-24.78, 15.71)
T-Test of difference = 0 (vs not =): T-Value = -0.46 P-Value = 0.650 DF = 27

The sample sizes, the standard deviations, and the estimated difference between the means are exactly the same for both tests. But note the whopping difference in p-values: 0.000 for the paired t-test and 0.650 for the 2-sample t-test.
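The same back-of-the-envelope check works for the 2-sample result. This sketch, again plain Python with values copied from the output, computes the unequal-variances (Welch) T-value and degrees of freedom. Truncating the Welch degrees of freedom to a whole number is an assumption that happens to reproduce the DF = 27 shown.

```python
import math

# Summary statistics from the paired output above (the 2-sample output shows
# the same values rounded to one decimal place).
n1, mean1, sd1 = 15, 97.07, 26.88   # Before
n2, mean2, sd2 = 15, 101.60, 27.16  # After

# Squared standard error of each sample mean.
v1 = sd1**2 / n1
v2 = sd2**2 / n2

# Welch t: the same numerator as the paired test, but a standard error built
# from the two between-person standard deviations.
se = math.sqrt(v1 + v2)
t_value = (mean1 - mean2) / se

# Welch-Satterthwaite degrees of freedom (truncated when printed; assumed to
# match how the DF in the output was reported).
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

print(f"SE = {se:.2f}, T = {t_value:.2f}, DF = {math.floor(df)}")
```

The standard error here is roughly ten times the paired one (about 9.87 versus 0.96), which is why the same 4.5-point difference yields T = -0.46 instead of T = -4.72.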

Even though the 2-sample design required twice as many subjects (30) as the paired design (15), you can’t conclude there’s a statistically significant difference between the means of the Before and After test scores.

What’s going on? Why the huge disparity in results?

A Paired Design Reduces Experimental Error

By accounting for the variability caused by different items, subjects, or conditions, and thereby reducing experimental error, the paired design tends to increase the signal-to-noise ratio that determines statistical significance. This can result in a more efficient design that requires fewer resources to detect a significant difference, if one exists.
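To put a number on that noise reduction for the training example, this sketch uses the standard identity Var(Before − After) = Var(Before) + Var(After) − 2·r·SD(Before)·SD(After) to back out the correlation r between each person's two scores. It works from the rounded standard deviations in the paired output, so treat the result as approximate.

```python
import math

# Standard deviations from the paired t-test output above.
sd_before = 26.88
sd_after = 27.16
sd_diff = 3.720

# Var(B - A) = Var(B) + Var(A) - 2*r*SD(B)*SD(A); solve for r, the correlation
# between the two measurements taken on the same person.
r = (sd_before**2 + sd_after**2 - sd_diff**2) / (2 * sd_before * sd_after)

# With r = 0 (as for two unrelated groups), the differences would vary far more.
sd_diff_if_uncorrelated = math.sqrt(sd_before**2 + sd_after**2)

print(f"implied r = {r:.3f}")
print(f"SD of differences: {sd_diff} (paired) vs {sd_diff_if_uncorrelated:.1f} (r = 0)")
```

An implied correlation of about 0.99 says that almost all of the spread in the scores comes from differences between people. The paired test subtracts that out, leaving a standard deviation of 3.72 for the differences instead of roughly 38 if the two columns were unrelated.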

Because the 2-sample design doesn’t control for the variability of the experimental unit, a much larger sample is needed to achieve statistical significance for a given difference and variability in the data, as shown below:

Two-Sample T-Test and CI: Before, After

Two-sample T for Before vs After

         N   Mean   StDev  SE Mean
Before  270   97.1  26.0   1.6
After   270  101.6  26.3   1.6

Difference = mu (Before) - mu (After)
Estimate for difference: -4.53
95% CI for difference: (-8.95, -0.11)
T-Test of difference = 0 (vs not =): T-Value = -2.01 P-Value = 0.045 DF = 537

Remember, these are independent samples. So this translates to 270 + 270 = 540 subjects for the study, compared to only 15 subjects in the paired design!
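Here is the same kind of check for the 270-per-group output, plus a rough planning figure. The planning part is an illustrative normal approximation with an assumed common standard deviation of 27 points (taken from the 15-person samples), not a formal power analysis; Minitab's power and sample size tools are the better choice for real planning.

```python
import math

# Summary statistics from the 270-per-group 2-sample output above.
n = 270
sd_before, sd_after = 26.0, 26.3
diff = -4.53

# Welch standard error, T-value, and degrees of freedom.
v1 = sd_before**2 / n
v2 = sd_after**2 / n
se = math.sqrt(v1 + v2)
t_value = diff / se
df = (v1 + v2) ** 2 / (v1**2 / (n - 1) + v2**2 / (n - 1))
print(f"T = {t_value:.2f}, DF = {math.floor(df)}")

# Rough planning figure (illustrative normal approximation, not a power
# analysis): the per-group size at which a 4.53-point difference with an
# assumed SD of 27 would just reach |T| = 1.96.
z = 1.96
sd_assumed = 27.0
n_per_group = math.ceil(2 * (z * sd_assumed / abs(diff)) ** 2)
print(f"approximate n per group: {n_per_group}")
```

The recomputed T-value and DF match the output above, and the rough approximation lands at about 273 per group: in line with the 270 per group used above, and far beyond the 15 people the paired design needed.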

That gain in efficiency comes from controlling for person-to-person variability. That's a good thing to do, because person-to-person variability is not what this study sets out to measure. It's a nuisance factor, something that creates “extraneous noise” that gets in the way of “hearing” the main effect that you're most interested in.

So next time you’re planning T for 2, give it a hard think.

If it’s possible to satisfy your objectives using a paired t design rather than a 2-sample t design, you may be wise to do so.

Note: Click here to download the Minitab project file with the sample data used in these examples.



Name: walid • Thursday, June 26, 2014

This is a very interesting article. Thank you. I would like to ask about the following case: I have two sets of measurement values, each obtained from a different device, and each device uses a different approach (algorithm). I need to validate that there is no statistical difference between the results obtained from the first device and the second device, and hence that both approaches are valid.


Name: Patrick • Tuesday, July 15, 2014

If I understand your question correctly, you're asking whether you should use a paired t test or a 2-sample t test to compare the measurements taken from two different devices. If you randomly sample a separate set of items for each device and measure each set with only its own device, you'd use a 2-sample t test (or 2-sample equivalence test) to compare the means. However, if you measure the same item or person twice (once with each measuring device), you'd use a paired t test (or paired equivalence test) to compare the mean measurements. If you have paired data, you might also want to perform a correlation analysis in addition to a paired t test (Stat > Basic Statistics > Correlation).

Alternatively, to demonstrate the equivalence of two measuring devices more rigorously than a t-test can, you could use orthogonal regression, although it does require you to know the error variance ratio. For more information, choose Stat > Regression > Orthogonal Regression, then click Help. Then click Example.

Thanks for your comment and question!

Name: Arpit Loya • Monday, August 11, 2014

Dear Patrick

This was a very interesting and thoughtful article that cleared up my doubts about t-tests.

Could you please clarify which test would be most appropriate for the following hypothesis:
There is a stronger association between the usage of an outside audit and market share than there is between the usage of other methods of conducting a marketing audit (self-audit, audit from across, company task force audit, company auditing office, computer-driven audit) and market share.

Name: patrick • Wednesday, August 13, 2014

Hello Arpit,

Thank you for your kind comment. Unfortunately, I can't offer statistical guidance and recommendations for readers' specific applications (it would be too difficult and fraught with peril to do so in a comments forum like this, without knowing the details of data collection and so on).

But let me try to answer your question in a general way.

If you want to compare the mean values of data samples for two or more groups (such as mean value for market share using outside audit, mean value of market share using self-audit, mean value of market share using company task force audit, and so on) you would use ANOVA, rather than a t-test. In Minitab, choose Stat > ANOVA > One Way. Click Help, then click Example to see how you would set up and perform the ANOVA. There is also an ANOVA example in Help > StatGuide.

Hope this helps!

Best, Patrick

Name: Arpit Loya • Monday, September 1, 2014

Dear Patrick,
Thank you for the reply and guidance.

Arpit Loya
