Gnome Power (and Sample Size)

Minitab Blog Editor 08 December, 2011

I was playing around with the power and sample size graphs in Minitab recently, and I noticed something interesting. Power, for the uninitiated, is usually described as the likelihood that you will find a significant effect or difference when one truly exists. There is a lot of good content on power in the Minitab Help, StatGuide, and Glossary. In any case, rather than simply describe what I found, I thought I’d invent a completely contrived, obviously fabricated, and wildly unrealistic example to illustrate. You're welcome.

Meet Bob. Bob works for the company that leads the nation in the production and sale of high-quality garden gnomes. You know the one. The other day, Bob was in his office, admiring the prototypes for the upcoming 2012 North American Gnome Show. He particularly liked the new Ignatius Von Gnomenberger with Fallen Lederhosen (catalog number 57-MOON-OOPS).

Suddenly, Bob was jarred from his reverie by the ringing of his phone. On the other end of the line was a salesperson claiming that he had a new formulation of porcelain that would “revolutionize the gnome industry!” The salesperson insisted that Bob owed it to himself to give it a try because it was much stronger and more chip resistant than the “inferior toilet-grade porcelain” that Bob was currently using. The salesperson offered to ship enough mix to make 12 gnomes, free of charge. Bob said he’d get back to him.

Bob doesn’t trust salespeople, but Bob—like so many others—trusts Minitab.

Bob knew that gnomes made with the new porcelain would need to be tested to see how many GNUs (Gnome Nicking Units) it took to chip their surface. He also knew how many GNUs it took to chip his current gnomes, and he knew that the standard deviation for the current gnomes was 3.4 GNUs.

Bob wanted to know how big a difference he could detect by conducting a 1-Sample t test on a sample of 12 gnomes.

He opened up Minitab Statistical Software (which he loves almost as much as the 57-MOON-OOPS), chose Stat > Power and Sample Size > 1-Sample t, and entered the following:

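[Figure: the Power and Sample Size for 1-Sample t dialog box (sample size 12, power value 0.9, standard deviation 3.4) and the Power and Sample Size Options dialog box (alternative hypothesis "Greater than", significance level 0.05)]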

The result was a typical power curve, just like others he had created before:

[Figure: Power Curve for 1-Sample t Test]
At the left edge of the graph, the difference was 0. The curve showed that power increased as the difference increased, reflecting the fact that the bigger the difference, the more likely you are to get a significant result from your test:

  Sample
    Size  Power  Difference
      12    0.9     3.06907

With a sample of 12 gnomes, Bob would have a power of 0.9 to detect a difference of about 3.07 GNUs.
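For readers who want to check this number outside Minitab, here is a minimal Python sketch of the same kind of calculation. It assumes a one-sided test with the "Greater than" alternative and alpha = 0.05 (the settings mentioned later in the post) and uses the standard noncentral t formulation of power; the numerical integration scheme and all function names are made up for illustration and are not Minitab's implementation.

```python
import math
from functools import lru_cache


def chi2_pdf(v, k):
    """Density of a chi-square distribution with k degrees of freedom."""
    if v <= 0.0:
        return 0.0
    return math.exp((k / 2 - 1) * math.log(v) - v / 2
                    - (k / 2) * math.log(2) - math.lgamma(k / 2))


def norm_cdf(x):
    """Standard normal CDF, written with the complementary error function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def t_upper_tail(c, ncp, df, steps=2000):
    """P(T > c) for a noncentral t variable (ncp = 0 gives the ordinary t).

    T = (Z + ncp) / sqrt(V / df) with Z ~ N(0, 1) and V ~ chi-square(df), so
    P(T > c) = E_V[Phi(ncp - c * sqrt(V / df))]; the expectation over V is
    integrated numerically with Simpson's rule.
    """
    upper = df + 40 * math.sqrt(2 * df)  # chi-square mass beyond this is negligible
    h = upper / steps                    # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        v = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * chi2_pdf(v, df) * norm_cdf(ncp - c * math.sqrt(v / df))
    return total * h / 3


def solve_by_bisection(f, lo, hi, iters=60):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


@lru_cache(maxsize=None)
def t_critical(df, alpha):
    """One-sided critical value c with P(T > c) = alpha when there is no difference."""
    return solve_by_bisection(lambda c: t_upper_tail(c, 0.0, df) - alpha, 0.0, 10.0)


def power_greater(diff, n, sd, alpha=0.05):
    """Power of a 1-sample t test with a "Greater than" alternative."""
    df = n - 1
    ncp = diff * math.sqrt(n) / sd  # noncentrality parameter
    return t_upper_tail(t_critical(df, alpha), ncp, df)


# Bob's settings: 12 gnomes, standard deviation 3.4 GNUs, target power 0.9.
N, SD, TARGET = 12, 3.4, 0.9
for d in (0.0, 1.0, 2.0, 3.0):
    print(f"difference {d:4.1f}  power {power_greater(d, N, SD):.4f}")

diff_at_target = solve_by_bisection(lambda d: power_greater(d, N, SD) - TARGET, 0.0, 20.0)
print(f"difference detectable with power {TARGET}: {diff_at_target:.5f} GNUs")
```

If these assumptions match the dialog settings, the solved difference should land very close to the 3.07 GNUs in the output above, and the printed powers rise with the difference, just as the curve does. With SciPy installed, `scipy.stats.nct` would be a shorter route to the same tail probability.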

Being intimately acquainted with his gnomes, Bob knew that a difference of 3.07 wasn’t particularly impressive. He wondered how many gnomes he would have to test to detect a difference of, say, 4.5 GNUs. When he attempted to find out, something strange happened.

Bob accidentally typed a minus sign instead of a 4. Before he could stop himself, he had already clicked OK:

[Figure: Oops. Power for negative .5?]
The results astounded and confused poor Bob:
[Figure: power curve extending to the left of zero, titled "The Twilight Zone?!"]

              Sample
  Difference    Size      Power
        -0.5      12  0.0168213

How could the graph show power values for differences less than zero?! His Black Belts had explained long ago that when you choose “Greater than” as your alternative hypothesis, you have no power to detect a negative difference.

And yet Bob knew the results could not be wrong, because Bob—like so many others—trusts Minitab.

It was as if his errant key stroke had suddenly opened a portal into a strange new world, a world where power is the chance that you will be wrong, a world where porcelain gnomes might come to life, a world where Ignatius Von Gnomenberger might finally be able to correct his perpetual wardrobe malfunction...

Then Bob noticed that the power value at a difference of zero was equal to his alpha of 0.05. That actually makes sense, because an alpha of 0.05 means that even if there isn’t a difference between porcelain formulations, 5 times out of 100 you will still get an unusual sample that makes it look like there is one. That’s the classic Type I error you hear about.
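In symbols, assuming the usual noncentral t formulation of power for a one-sided "Greater than" test, with d the true difference, σ = 3.4 the standard deviation, n = 12 the sample size, T_{n-1}(δ) a noncentral t variable with n - 1 degrees of freedom, and t_{1-α, n-1} the critical value:

```latex
\delta = \frac{d\sqrt{n}}{\sigma}, \qquad
\text{Power}(d) = P\left(T_{n-1}(\delta) > t_{1-\alpha,\,n-1}\right)

\text{Power}(0) = P\left(T_{n-1}(0) > t_{1-\alpha,\,n-1}\right) = \alpha = 0.05
```

With d = 0 the noncentrality is 0, T is an ordinary t variable, and the rejection probability equals alpha by the very definition of the critical value. With d < 0 the noncentrality is negative, which pushes the distribution of T to the left and drives the rejection probability toward 0.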

With that in mind, Bob realized what the “Twilight Zone” part of the curve was telling him. The weaker the “revolutionary” new porcelain, the less chance there is that he will mistakenly conclude that it is actually stronger than his tried-and-true porcelain. The power curve simply shows that as the new porcelain becomes weaker and weaker, the chance of making that mistake gets closer and closer to 0.
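The same conclusion can be checked by brute force. The Python sketch below simulates many 12-gnome experiments for several true differences, runs a one-sided t test on each, and counts how often the test concludes that the new porcelain is stronger (a mistake whenever the true difference is 0 or less). It assumes normally distributed chip strengths with a standard deviation of 3.4 GNUs, alpha = 0.05, and a critical value of about 1.7959 from a t table; the function name, trial count, and seed are arbitrary choices for illustration.

```python
import math
import random

# One-sided critical value for alpha = 0.05 with 11 degrees of freedom,
# taken from a standard t table (assumed here, not computed).
T_CRIT = 1.795885


def rejection_rate(diff, n=12, sd=3.4, trials=20_000, seed=2011):
    """Fraction of simulated experiments in which a "Greater than" 1-sample
    t test declares the new porcelain stronger.

    Each experiment draws n normal chip strengths whose true mean exceeds the
    hypothesized mean by `diff`; the hypothesized mean is taken as 0, which
    does not affect the t statistic's behavior.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(diff, sd) for _ in range(n)]
        mean = sum(sample) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        t = mean / (s / math.sqrt(n))
        if t > T_CRIT:
            rejections += 1
    return rejections / trials


rates = {d: rejection_rate(d) for d in (0.0, -0.5, -1.0, -2.0)}
for d, rate in rates.items():
    print(f"true difference {d:5.1f}: concluded 'stronger' in {rate:.4f} of experiments")
```

The rate at a difference of 0 should hover near 0.05, and it should shrink toward 0 as the difference becomes more negative, which is the left-hand tail of the power curve. Because every difference reuses the same seed, the simulated samples differ only by a shift, so the rates cannot increase as the difference falls.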

Bob had unexpectedly learned an important lesson. His life, his work, and his gnomes would never be the same.