T for 2. Should I Use a Paired t or a 2-sample t?

Minitab Blog Editor | 7/8/2013

Topics: Hypothesis Testing

Boxers or briefs.

Republican or Democrat.

Yin or yang.

Why is it that life often seems to boil down to two choices?

Heck, it even happens when you open the Basic Stats menu in Minitab. You’ll see a choice between a 2-sample t-test and a paired t-test:

[Image: t menu]

Which test should you choose? And what’s at stake?

Ask a statistician, and you might get this response: "Elementary, my dear Watson.

Choose the 2-sample t-test to test the difference in two means: H0: µ1 – µ2 = 0

Choose the paired t-test to test the mean of pairwise differences: H0: µd = 0."

You gaze at your two sets of data values, mystified. Do you have to master the Greek alphabet to choose the right test?

όχι !

(That's Greek for “no”)

Base Your Decision on How the Data Is Collected

Dependent samples: If you collect two measurements on each item, each person, or each experimental unit, then each pair of observations is closely related, or matched.

For example, suppose you test the computer skills of participants before and after they complete a computer training course. That produces a set of paired observations (Before and After test scores) for each participant.

[Image: paired data]

In that case, you should use the paired t-test to test the mean difference between these dependent observations. (See the orange guy at left shaped like a lopsided peanut? That’s me. Training failed to improve my computer graphics skills. Hence the cheesy 1990s ClipArt.)

Paired observations can also arise when you measure two different items subject to the same unique condition.  

For example, suppose you measure tread wear on two different types of bike tires by putting both tires on the same bicycle. Then each bike is ridden by a different rider. To compare 20 pairs of tires, you use 20 different bicycles/riders.

Because each bicycle is ridden different distances in different conditions, measuring the tread wear for the two tires on each bike produces a set of paired (dependent) measurements. To account for the unique conditions that each bike was subject to, you’d use a paired t-test to evaluate the differences in mean tread wear.

Independent samples: If you randomly sample each set of items separately, under different conditions, the samples are independent. The measurements in one sample have no bearing on the measurements in the other sample.

Suppose you randomly sample two different groups of people and test their computer skills. You take one random sample from people who have not taken a computer training course and record their test scores. You take a second random sample from another group of people who have completed the computer training course and record their test scores.

[Image: unpaired]

Because the two samples are independent, you must use the 2-sample t-test to compare the difference in the means.

If you use the paired t-test for these data, Minitab assumes that the before and after scores are paired row by row: The 47 score before training is associated with a 53 score after training. A 92 score before training is associated with a 71 score after training, etc.

You could end up pairing Mark Zuckerberg’s Before test score with Lloyd Christmas’ After test score.

Invalid pairings like that can lead to seriously wrong conclusions.
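To see why the row-by-row pairing matters, here's a minimal sketch in Python (not Minitab). The scores are made up for illustration, and `paired_t` and `welch_t` are hypothetical helper names, not anything from Minitab. The paired statistic is computed from the differences within each row, so scrambling the order of one column changes it. The 2-sample statistic only looks at each column's mean and variance, so it doesn't care about row order.

```python
import math
import statistics

def paired_t(before, after):
    """t statistic for the mean of the row-by-row differences."""
    diffs = [b - a for b, a in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

def welch_t(x, y):
    """2-sample t statistic without pooling the variances."""
    se = math.sqrt(statistics.variance(x) / len(x)
                   + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / se

# Made-up scores: each row is meant to be the same person.
before = [47, 92, 60, 75, 81]
after = [53, 95, 66, 79, 88]

# Same After values, but the rows no longer line up with the right people.
after_shuffled = [88, 53, 95, 66, 79]

t_paired = paired_t(before, after)
t_paired_shuffled = paired_t(before, after_shuffled)
t_welch = welch_t(before, after)
t_welch_shuffled = welch_t(before, after_shuffled)

print("paired, correct rows:  ", round(t_paired, 2))
print("paired, scrambled rows:", round(t_paired_shuffled, 2))
print("2-sample, correct rows:  ", round(t_welch, 2))
print("2-sample, scrambled rows:", round(t_welch_shuffled, 2))
```

The mean difference is identical in both paired runs. What changes is the standard deviation of the differences, and that's what drives the paired t-value.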

Paired vs 2-Sample Designs

If you’re planning your study and haven’t collected data yet, be aware of the possible ramifications of using a 2-sample vs. a paired design. The difference in the designs could drastically affect the amount of data you'll need to collect.

For example, suppose you design your study to measure the test scores of the same 15 participants before and after they complete a computer training course. The paired t-test gives the following results:

Paired T-Test and CI: Before, After
Paired T for Before - After

            N   Mean  StDev  SE Mean
Before     15  97.07  26.88  6.94
After      15 101.60  27.16  7.01
Difference 15 -4.533  3.720  0.960

95% CI for mean difference: (-6.593, -2.473)
T-Test of mean difference = 0 (vs not = 0): T-Value = -4.72 P-Value = 0.000

Because the p-value (displayed as 0.000, which means it is less than 0.0005) is less than alpha (0.05), you conclude that the mean difference between the Before and After test scores is statistically significant.
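If you'd like to see where the T-Value and the confidence interval come from, here's a small sketch in Python that rebuilds them from the summary numbers in the output above. It's a check on the arithmetic, not Minitab's own code. The critical value 2.145 is an assumption taken from a standard t table (two-sided 95%, 14 degrees of freedom) and is hard-coded.

```python
import math

# Summary statistics copied from the "Difference" row of the paired output.
n = 15
mean_diff = -4.533
sd_diff = 3.720

# Standard error of the mean difference, then the t statistic.
se_diff = sd_diff / math.sqrt(n)
t_value = mean_diff / se_diff

# Assumed table value: two-sided 95% critical t for n - 1 = 14 df.
T_CRIT_14 = 2.145
ci_low = mean_diff - T_CRIT_14 * se_diff
ci_high = mean_diff + T_CRIT_14 * se_diff

print("SE Mean:", round(se_diff, 3))
print("T-Value:", round(t_value, 2))
print("95% CI: ", (round(ci_low, 3), round(ci_high, 3)))
```

Because the inputs are already rounded, the last digit may differ slightly from what Minitab prints.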

Now suppose instead you had designed a study to collect two independent samples: 1) the test scores from 15 people who had not completed computer training (Before) and 2) the test scores from 15 different people who had completed the computer training (After).

For the sake of argument, let's suppose you wind up with the exact same data values for the Before and After scores that you did with the paired design. Here's what you obtain when you analyze the data using the 2-sample t-test.

Two-Sample T-Test and CI: Before, After
Two-sample T for Before vs After

        N   Mean  StDev  SE Mean
Before 15   97.1  26.9   6.9
After  15  101.6  27.2   7.0

Difference = mu (Before) - mu (After)
Estimate for difference: -4.53
95% CI for difference: (-24.78, 15.71)
T-Test of difference = 0 (vs not =): T-Value = -0.46 P-Value = 0.650 DF = 27

The sample size, the standard deviation, and the estimated difference between the means are exactly the same for both tests. But note the whopping difference in p-values—0.000 for the paired t-test and 0.650 for the 2-sample t-test.
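The 2-sample numbers can be rebuilt the same way. This Python sketch assumes the test did not pool the two variances (the Welch form), which is consistent with the reported DF of 27 rather than 28. The means and standard deviations are taken from the paired output, where they are shown with more decimals.

```python
import math

# Per-sample summary statistics (from the paired output, same data).
n1, mean1, sd1 = 15, 97.07, 26.88   # Before
n2, mean2, sd2 = 15, 101.60, 27.16  # After

# Each sample contributes its own variance of the mean.
v1 = sd1 ** 2 / n1
v2 = sd2 ** 2 / n2

se = math.sqrt(v1 + v2)
t_value = (mean1 - mean2) / se

# Welch-Satterthwaite approximation for the degrees of freedom.
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

print("SE of difference:", round(se, 2))
print("T-Value:", round(t_value, 2))
print("Approximate DF:", round(df, 3))
```

The same difference of about -4.53 is now divided by a standard error of nearly 10 instead of 0.96, which is why the t-value shrinks so much.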

Even though the 2-sample design required twice as many subjects (30) as the paired design (15), you can’t conclude there’s a statistically significant difference between the means of the Before and After test scores.

What’s going on? Why the huge disparity in results?

A Paired Design Reduces Experimental Error

By accounting for the variability caused by different items, subjects, or conditions, and thereby reducing experimental error, the paired design tends to increase the signal-to-noise ratio that determines statistical significance. This can result in a more efficient design that requires fewer resources to detect a significant difference, if one exists.

Because the 2-sample design doesn’t control for the variability of the experimental unit, a much larger sample is needed to achieve statistical significance for a given difference and variability in the data, as shown below:
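One way to put a number on that noise reduction: the variance of a difference is Var(Before) + Var(After) - 2 × r × SD(Before) × SD(After), where r is the correlation between the paired scores. The sketch below (Python, using only the rounded standard deviations from the paired output, so treat the result as approximate) solves that identity for r.

```python
import math

# Standard deviations copied from the paired t-test output.
sd_before = 26.88
sd_after = 27.16
sd_diff = 3.720

# Solve Var(diff) = Var(B) + Var(A) - 2 * r * SD(B) * SD(A) for r.
r = (sd_before ** 2 + sd_after ** 2 - sd_diff ** 2) / (2 * sd_before * sd_after)

# What the SD of the differences would be if the columns were uncorrelated.
sd_diff_if_independent = math.sqrt(sd_before ** 2 + sd_after ** 2)

print("Implied correlation between Before and After:", round(r, 3))
print("SD of differences as observed:", sd_diff)
print("SD of differences if r were 0:", round(sd_diff_if_independent, 1))
```

A person who scores high before training tends to score high afterward too. Pairing subtracts that shared person-to-person spread out of every difference, which is why the differences vary so much less than the raw scores.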

Two-Sample T-Test and CI: Before, After

Two-sample T for Before vs After

         N   Mean   StDev  SE Mean
Before  270   97.1  26.0   1.6
After   270  101.6  26.3   1.6

Difference = mu (Before) - mu (After)
Estimate for difference: -4.53
95% CI for difference: (-8.95, -0.11)
T-Test of difference = 0 (vs not =): T-Value = -2.01 P-Value = 0.045 DF = 537

Remember, these are independent samples. So this translates to 270 + 270 = 540 subjects for the study—compared to only 15 subjects in the paired design!
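For a rough feel of why the number lands in the hundreds, this Python sketch first rebuilds the T-Value from the summary numbers in the output above, then estimates the smallest per-group sample size at which the same difference would reach |t| of about 1.96. That threshold is the large-sample normal approximation, and the calculation assumes the difference and standard deviations stay fixed as n grows, so it's a back-of-the-envelope estimate rather than a formal power analysis.

```python
import math

# Summary statistics copied from the 270-per-group output.
diff = -4.53
sd1, sd2 = 26.0, 26.3
n = 270

# Rebuild the reported t statistic.
se = math.sqrt(sd1 ** 2 / n + sd2 ** 2 / n)
t_value = diff / se

# Assumed threshold: two-sided 5% critical value of the standard normal.
Z_CRIT = 1.96

# Smallest n per group with |diff| / sqrt((sd1^2 + sd2^2) / n) >= Z_CRIT.
n_min = math.ceil(Z_CRIT ** 2 * (sd1 ** 2 + sd2 ** 2) / diff ** 2)

print("T-Value at n = 270 per group:", round(t_value, 2))
print("Rough minimum n per group:", n_min)
print("Rough minimum total subjects:", 2 * n_min)
```

The estimate comes out in the same neighborhood as the 270 per group used above: several hundred subjects, against 15 for the paired design.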

That gain in efficiency comes from controlling for person-to-person variability -- a good thing to do because that variability is not what this study is trying to measure. It's a nuisance factor, something that creates “extraneous noise” that gets in the way of “hearing” the main effect that you're most interested in.

So next time you’re planning T for 2, give it a hard think.

If it’s possible to satisfy your objectives using a paired t design rather than a 2-sample t design, you may be wise to do so.

Note: Click here to download the Minitab project file with the sample data used in these examples.