Power and Sample Size – Your Insurance Policy for Statistical Analysis

Do you have an insurance policy that will pay out if your car gets damaged? Do you pay the premium because you know your car will be damaged? No, you pay it so that if you do damage your car you will get a payment to cover the damage.

When we do statistical analyses, like hypothesis testing and design of experiments, we are using a sample of data to answer questions about the whole population the sample came from. The reliability of these answers is affected by the size of the sample we analyze. To minimize the risk of doing an unreliable statistical analysis, we can use Power and Sample Size calculations before collecting any data to determine how much data is needed to have a good chance of detecting an effect, if it exists. That chance is the power of the test, and the minimum recommended value for it is 80%.

For this reason, I like to think of the Power and Sample Size calculations as an insurance policy for my statistical analysis. Let me explain why by comparing them to buying and making claims on car insurance. If you would like to know more about the use and interpretation of Power Analysis, an introduction to using Minitab for hypothesis testing is covered in the Minitab Essentials training course, and Power Analysis is a focus throughout the day in the Design of Experiments course, Factorial Designs.

1. Determining the Type of Analysis You Need

When choosing a new car policy, you need to determine which type you want before asking for a quote. This will determine the calculation method for the insurance. It is the same for Power Analysis. You need to know which type of analysis you want to do. For example, are you doing a 1-sample t-test, an ANOVA or a Factorial Design? Knowing the type of analysis you need is the first step toward selecting the best calculation method.

2. What is Acceptable Risk? – Setting the Significance Level

When getting an insurance quote, you are asked how much excess, or deductible, you are prepared to pay. The larger your deductible, the lower your premium will be. The downside of a lower premium is a higher out-of-pocket cost if you do have an accident. This could have a serious impact if you cannot afford to pay the additional amount.

In the Power and Sample Size calculations, you are asked to set a significance level (the default in Minitab is 0.05). The larger you make this number, the smaller the sample size you will need. The negative impact of this is that you have a higher chance of making the wrong decision. In this case, the probability of a Type I error (rejecting the null hypothesis when it is actually true, that is, finding that there is an effect when there isn’t) is increased.
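To see the trade-off in numbers, here is a minimal sketch that uses only the Python standard library. It assumes a two-sided one-sample z-test (a normal approximation), so an exact t-based calculation, such as the one a statistics package performs, will typically ask for a few more observations. The function name `sample_size` and the input values are illustrative assumptions, not taken from Minitab.

```python
from math import ceil
from statistics import NormalDist


def sample_size(alpha, power, difference, std_dev):
    """Approximate n for a two-sided one-sample z-test (normal approximation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(((z_alpha + z_power) * std_dev / difference) ** 2)


# Illustrative values only: detect a difference of 1 unit
# when the standard deviation is 2 units, with 80% power.
for alpha in (0.01, 0.05, 0.10):
    n = sample_size(alpha, power=0.80, difference=1.0, std_dev=2.0)
    print(f"alpha = {alpha:.2f} -> n = {n}")
```

Under these assumptions, relaxing the significance level from 0.01 to 0.10 cuts the required sample size almost in half, and the price is a ten-fold increase in the chance of a false alarm.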

For the Power and Sample Size calculations to provide valuable insight, you need to understand the costs of a wrong decision (which may not only be in monetary terms) before selecting the significance level.

3. Estimating the Risks – Using Difference and Standard Deviation

When you apply for car insurance, the insurers ask you various questions about your demographics, your car, your driving experience and your driving habits. Insurers use this information to estimate the risks of you having an accident or your car being stolen, which is then used to determine your premium.

In a Power Analysis calculation, you will also need to provide information up front to properly estimate risk. First, you need to provide information on how big an effect is of practical importance. For example, if you are selling a chocolate bar, how big a difference from the advertised weight would result in either customer complaints or unacceptable increases in production cost? (In the Minitab Power Analysis dialog boxes this is called Difference.) The second input is the standard deviation of your process, which is used to estimate variability.

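Together, these pieces of information help calculate risk: the smaller the difference is relative to the standard deviation, the harder it is to detect and the more data you will need.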
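A short sketch can show how strongly these two inputs drive the answer. It again assumes a two-sided one-sample z-test (a normal approximation rather than an exact t-based calculation), a significance level of 0.05 and a target power of 80%. The chocolate bar figures, in grams, are made up for illustration, and `n_for_effect` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def n_for_effect(difference, std_dev, alpha=0.05, power=0.80):
    """Approximate n for a two-sided one-sample z-test (normal approximation)."""
    z = NormalDist()
    # This multiplier depends only on the chosen risk levels.
    multiplier = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    # The rest depends only on how noisy the process is relative to the difference.
    return ceil(multiplier * (std_dev / difference) ** 2)


# Hypothetical chocolate bar figures, in grams.
baseline = n_for_effect(difference=2.0, std_dev=3.0)
smaller_difference = n_for_effect(difference=1.0, std_dev=3.0)
noisier_process = n_for_effect(difference=2.0, std_dev=6.0)

print("baseline:", baseline)
print("half the difference:", smaller_difference)
print("double the standard deviation:", noisier_process)
```

Because the ratio of standard deviation to difference is squared, halving the difference you want to detect, or doubling the process variation, roughly quadruples the sample size in this approximation.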

4. Outcome is Unknown During Planning

When you take out an insurance policy for your car, you know there is a risk of filing a claim and you want to be sure you have enough coverage if the worst happens. However, you are not certain that you will need to file a claim. Something similar happens during data analysis. When you plan an analysis, you have a question you think it could answer, and you want to ensure that you collect enough data to answer it. But until you have completed the analysis, you will not be certain what the conclusions will be.

5. Premium Size – Sample Size

Once you have given all the information to the insurance company, they will come back with a premium to provide you coverage. If the premium is too much for your budget, you can adjust some factors to lower it.

Power Analysis returns a sample size, and just like insurance premiums, if you cannot collect that much data, you can change some of the inputs. The Power Analysis also allows you to enter the sample size you have already used (or can afford to use) and estimate the Power. If the Power is less than 80%, the sample size you have chosen is not big enough to have a good chance of identifying a problem or opportunity in your process.
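Running the calculation in this direction, from an affordable sample size to the Power it buys, can be sketched as follows. As a simplifying assumption it uses a two-sided one-sample z-test (a normal approximation), so an exact t-based calculation will report somewhat lower Power for small samples. The name `power_for_n` and the input values are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def power_for_n(n, difference, std_dev, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    shift = difference * sqrt(n) / std_dev  # true difference in standard-error units
    # Chance of landing in either rejection region when the difference is real.
    return z.cdf(shift - z_alpha) + z.cdf(-shift - z_alpha)


# Illustrative values only: a difference of 1 unit and a standard deviation of 2 units.
for n in (20, 32):
    p = power_for_n(n, difference=1.0, std_dev=2.0)
    verdict = "meets" if p >= 0.80 else "falls short of"
    print(f"n = {n}: power = {p:.2f} ({verdict} the 80% guideline)")
```

With these inputs, the smaller sample falls short of the 80% guideline while the larger one just clears it, which is the signal to either collect more data or revisit the inputs.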

If you would like to know more about the use and interpretation of Power Analysis, why not attend Minitab Training? An introduction to its use in hypothesis testing is covered in the Minitab Essentials course, and if you are running Design of Experiments this is covered in our introductory course on Design of Experiments, Factorial Designs.