How Much Data Do You Really Need? Check Power and Sample Size

Minitab Blog Editor 20 March, 2012

Collecting information for data analysis is like tasting fine wine: you want the right amount. Take too small a sip and you won't be able to assess it properly, because you won't have enough information! But if you take a giant swig, your palate will be overwhelmed. That amount is far more than you really need to make a solid recommendation.

So, how big a sip should you take? I'm no wine expert, so don't ask me. But when you need to figure out how much data you need to collect in order to answer a question with some degree of reliability, you need to look at statistical power and sample size.  Power and sample size tools in statistical software like Minitab can tell you how much data you need to be confident in your results.

"Statistical power" refers to the probability that your hypothesis test will find a significant effect when one really exists (equivalently, 1 minus the probability of a Type II error). The amount of statistical power you need depends on what you're trying to find out and how much you're willing to invest to find it. If lives depend on the accuracy of your results (for instance, if you're testing airbags in a new vehicle), you'll probably want more statistical power than you would need to, say, assess the difference in softness between two types of toilet paper.
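To make the link between power and sample size concrete, here is a minimal sketch in Python. It assumes a two-sided, two-sample z-test with equal group sizes and a known, shared standard deviation, which is a normal approximation; a dedicated tool such as Minitab may use the t distribution and give slightly different numbers for small samples. The function names `two_sample_z_power` and `smallest_n_for_power` are hypothetical, written only for this illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def two_sample_z_power(diff: float, sigma: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample z-test (hypothetical helper).

    Assumes both groups have known standard deviation `sigma` and equal size
    `n_per_group`; `diff` is the true difference between the group means.
    """
    se = sigma * sqrt(2 / n_per_group)           # standard error of the difference in means
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)   # two-sided critical value
    shift = abs(diff) / se                       # true difference, in standard errors
    # Probability that the test statistic falls in either rejection region
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def smallest_n_for_power(diff: float, sigma: float, target: float, alpha: float = 0.05) -> int:
    """Smallest per-group sample size whose approximate power reaches `target`."""
    n = 2
    while two_sample_z_power(diff, sigma, n, alpha) < target:
        n += 1
    return n


print(round(two_sample_z_power(diff=1, sigma=2, n_per_group=64), 3))  # → 0.807
print(smallest_n_for_power(diff=1, sigma=2, target=0.80))             # → 63
```

Raising the target power (say, from 0.80 for the toilet paper study to 0.95 for the airbags) increases the sample size the loop returns, which is the trade-off described above.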

We recently wrote an overview of power and sample size, along with a summary of the tools available for it in Minitab Statistical Software. You can use these tools to answer questions like:

  • How many samples do you need to determine if the average thickness of foam from one supplier differs from that from another?
  • How many individuals should you survey to be 95% confident that the proportion of people who prefer one soft drink over another is within 3 percentage points of its true value?
  • Can you trust a hypothesis test that indicates there is insufficient evidence to suggest the average safety records for two groups of drivers are different?
  • How many replicates do you need to run if you want your experiment to have at least a 75% chance of detecting the variables that significantly affect your outcome?
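As an illustration of the survey question above, here is a minimal sketch using the standard normal-approximation formula for estimating a proportion, n = z² · p(1 − p) / E², where E is the margin of error. It assumes "within 3%" means plus or minus 3 percentage points, a simple random sample, and the conservative guess p = 0.5 (which maximizes p(1 − p)). The function name `survey_sample_size` is hypothetical and is not part of any Minitab API.

```python
from math import ceil
from statistics import NormalDist


def survey_sample_size(margin: float, confidence: float = 0.95, p_guess: float = 0.5) -> int:
    """Respondents needed so a proportion's confidence interval has half-width <= margin.

    Uses the normal-approximation formula n = z^2 * p(1 - p) / margin^2.
    p_guess = 0.5 is the conservative choice because it maximizes p(1 - p).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
    return ceil(z**2 * p_guess * (1 - p_guess) / margin**2)


print(survey_sample_size(0.03))  # → 1068 respondents for ±3 points at 95% confidence
```

If you have a reasonable prior estimate of the proportion (say, 0.2), passing it as `p_guess` lowers the required sample size; using 0.5 guards against underestimating it.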

If you're not using power and sample size tools, how do you know you can trust the results of your analyses?