How Much Data Do You Really Need? Check Power and Sample Size

Collecting information for data analysis is like tasting fine wine—you want the right amount. Take too small a sip and you won't be able to assess it properly: you won't have enough information! But if you take a giant swig, your palate will be overwhelmed. That amount is just way more than you really need to make a solid recommendation.

So, how big a sip should you take? I'm no wine expert, so don't ask me. But when you need to figure out how much data to collect to answer a question with some degree of reliability, you need to look at statistical power and sample size. Power and sample size tools in statistical software like Minitab can tell you how much data you need to be confident in your results.

"Statistical power" refers to the probability that your hypothesis test will find a significant effect when one really exists. The amount of statistical power you need depends on what you're trying to find out and how much you're willing to invest to do so. If lives depend on the accuracy of your results (for instance, if you're testing airbags in a new vehicle), you'll probably want more statistical power than you would need to, say, assess the difference in softness between two types of toilet paper.
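To make that definition concrete, here is a minimal Python sketch for a simplified case: a two-sided, two-sample z-test that assumes the common standard deviation is known. It is not Minitab's method, and the function names and example numbers are hypothetical. It computes power from the normal distribution, then checks it by simulating many experiments that contain a real difference and counting how often the test rejects.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def analytic_power(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided two-sample z-test (known sigma, n per group)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = sigma * math.sqrt(2 / n)   # standard error of the difference in means
    shift = delta / se              # how far the true difference moves the z statistic
    # Probability the z statistic lands in either rejection region
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

def simulated_power(delta: float, sigma: float, n: int, alpha: float = 0.05,
                    reps: int = 5000, seed: int = 1) -> float:
    """Fraction of simulated experiments (true difference = delta) that reject H0."""
    rng = random.Random(seed)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    se = sigma * math.sqrt(2 / n)
    rejections = 0
    for _ in range(reps):
        group_a = [rng.gauss(0.0, sigma) for _ in range(n)]
        group_b = [rng.gauss(delta, sigma) for _ in range(n)]
        z = (fmean(group_b) - fmean(group_a)) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / reps

if __name__ == "__main__":
    # Hypothetical scenario: true difference of 1 unit, standard deviation of 2
    for n in (10, 20, 40):
        print(n, round(analytic_power(1.0, 2.0, n), 3), simulated_power(1.0, 2.0, n))
```

Power rises as each group gets larger, and with no real difference (delta = 0) the "power" collapses to alpha, the false-alarm rate. Real studies usually estimate the standard deviation and use a t-test, so treat these numbers as approximations.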

We recently wrote an overview of power and sample size, along with a summary of the tools available for it in Minitab Statistical Software. You can use these tools to answer questions like:

  • How many samples do you need to determine whether the average thickness of foam from one supplier differs from that of another supplier?
  • How many individuals should you survey to be 95% confident that the proportion of people who prefer one soft drink over another is within 3% of its true value?
  • Can you trust a hypothesis test that indicates there is insufficient evidence to suggest the average safety records for two groups of drivers are different?
  • How many replicates do you need to run if you want your experiment to have at least a 75% chance of detecting the variables that significantly affect your outcome?
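For the soft-drink survey question, a standard textbook formula gives the number of respondents needed to estimate a proportion within a margin of error E at a given confidence level: n = z² · p(1 − p) / E², rounded up, where z is the two-sided critical value. Using p = 0.5 is the conservative choice when you have no prior estimate. The helper below is a hypothetical sketch that assumes simple random sampling and the normal approximation; it is not a reproduction of Minitab's calculation.

```python
import math
from statistics import NormalDist

def survey_sample_size(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Respondents needed so an estimated proportion is within +/- margin
    of the true value (normal approximation, simple random sample)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n = z ** 2 * p * (1 - p) / margin ** 2
    return math.ceil(n)  # always round up to a whole respondent

if __name__ == "__main__":
    print(survey_sample_size(0.03))  # within 3% at 95% confidence: prints 1068
```

Loosening the margin to 5% drops the requirement to 385 respondents, which shows how quickly precision drives sample size.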

If you're not using power and sample size tools, how do you know you can trust the results of your analyses?



Name: Carmen Pellegrino • Thursday, April 5, 2012

I notice this calculation is not relative to time or population size. How do you factor in how often the calculated sample should be taken? Would you base it on a given time or batch size?

Name: Eston Martz • Thursday, April 5, 2012

Hi Carmen, thanks for reading and commenting. I'm not sure which calculation you're referring to, but I infer that you mean the examples in the tutorial linked above. You're right that the power and sample size calculation doesn't factor in population size or time (as a control chart might, for instance). But you don't need to know the size of the population to make inferences about it from a sample, and that's where power and sample size comes in. In the forklift training example, for instance, we don't need to know how many people will take the courses in total: Minitab can still tell us that, to detect a difference of 5 points with 80% power when the standard deviation is 5, we need to sample 17 people from each group of testers. Time and batch size don't come into play for this type of question, although they can be important in other types of analyses. I hope this helps!

Best regards,
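The forklift figures in the reply above can be roughly checked by hand. Minitab's two-sample t calculation works with the t distribution; the sketch below instead uses the normal-approximation formula n = 2(z₁₋α/₂ + z_power)² · (σ/δ)² per group, plus an optional small-sample adjustment of z₁₋α/₂²/4 that is commonly added to approximate the t-based answer. The function and parameter names are hypothetical. Note that population size appears nowhere in the formula, which is the point of the reply.

```python
import math
from statistics import NormalDist

def two_sample_n(delta: float, sigma: float, power: float = 0.8,
                 alpha: float = 0.05, t_adjust: bool = True) -> int:
    """Approximate sample size per group to detect a mean difference `delta`
    with a two-sided two-sample test, assuming common standard deviation `sigma`."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = std_normal.inv_cdf(power)           # quantile matching the target power
    n = 2 * (z_alpha + z_beta) ** 2 * (sigma / delta) ** 2  # pure normal approximation
    if t_adjust:
        n += z_alpha ** 2 / 4  # small-sample correction toward the t-test answer
    return math.ceil(n)

if __name__ == "__main__":
    print(two_sample_n(5, 5))                  # prints 17, matching the reply
    print(two_sample_n(5, 5, t_adjust=False))  # prints 16: the z-only formula runs low
```

Because n scales with (σ/δ)², halving the difference you want to detect roughly quadruples the required sample size. For decisions that matter, rely on an exact t-based calculation such as Minitab's rather than this approximation.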
