Forget Statistical Assumptions - Just Check the Requirements!

One of the most poorly understood concepts in the use of statistics is the idea of assumptions. You've probably encountered many of these assumptions, such as "data normality is an assumption of the 1-sample t-test." But if you read that statement and believe normality is a requirement of the 1-sample t-test, then you have missed a subtle and important characteristic of assumptions and need to read on...

An "assumption" is not necessarily a "requirement"!

To understand where this idea of assumptions comes from, let's forget about statistics for a minute and imagine we sell bikes online. We can't ship our bikes whole, so we ship each bike separated into the frame, handlebar, seat, and wheels, and we must provide assembly instructions. We of course want simple and effective instructions.

Now, we don't necessarily know which tools the recipient owns, and we also don't know if, perhaps, they already own wheels or their own seat and want to use those instead of what we shipped. We also don't know if there will be one person assembling the bike alone, or two, or even more. And maybe it's a kid's bike and the kid wants to assemble it, or maybe an adult does. The recipient may have a bike stand, or they may just be assembling on the floor. Also, some people may have bought training wheels if the bike is for their child. If so, the training wheels may need to be installed before the real wheels.

You can see how complicated our instructions will become if we try to accommodate all of the possible scenarios. Worse, the instructions may end up being LESS useful rather than more if they are too complicated to understand or too general to provide the best methods for installation.

Reasonable Assumptions

So to make sure our instructions are easy to use and effective, we decide to make some assumptions about the person using them. We assume they own a Phillips head screwdriver and an adjustable wrench, and that they do not own a bike stand. We assume they are only assembling the parts we send them and don't have their own. We assume there will be one adult assembling the bike alone. All of these are reasonable assumptions that should capture the most common scenarios, and by making these assumptions about the user we can make very simple and useful instructions. For those meeting these assumptions, assembly is easy!

So to this point we've seen a process by which making some assumptions has greatly reduced the complexity (and increased the effectiveness) of a product.

This is entirely consistent with statistical assumptions. When a tool such as One-Way ANOVA was developed, it started with a basic problem: "How can one use data to tell if three or more groups have different means?" and by making some reasonable assumptions (data within each group are normal, groups have equal variances, etc.) a fairly simple test was formed. Had those assumptions not been made, the test would be more complicated and likely less effective.

But let's go back to what I said above: an "assumption" is not necessarily a "requirement"!

When Is An Assumption Also a Requirement?

Considering our bike assembly instructions again, let's further examine a couple of the assumptions we made. We assumed the bike would be assembled by one person alone and wrote the instructions accordingly (for example, by not including directions such as "Have someone else hold the wheel upright while you..."). What if the assumption was not true for one of our customers and there were two people ready to help one another assemble the bike? In all likelihood, our instructions are still just as simple and effective and assembly may even be faster. In this case, the assumption is not a requirement at all--but making the assumption allowed us to make the best set of instructions possible! Even if the assembler doesn't meet this assumption, the instructions are still robust.

We also assumed the customer owned a Phillips head screwdriver. But what if a customer only owns a flathead screwdriver? In that case, they likely cannot proceed with our instructions. This assumption is also a requirement.

We could similarly examine all of our assumptions after the fact to consider:

Is the assumption a requirement?
If it is not a requirement, are the instructions robust to every other possible scenario?
If they are not robust to every scenario, under which scenarios are they robust?

Answering Questions about Statistical Assumptions

When we look at statistics, we must understand the same aspects of each assumption. For example, normality is an assumption of the 1-sample t-test. But let's answer the three questions about this assumption:

Is the assumption a requirement?
No, it is not a requirement (this has been demonstrated through multiple studies and simulations).
If it is not a requirement, is the test robust to every other possible scenario?
No. The test is robust to non-normal data, but not in every possible scenario.
If it is not robust to every scenario, under which scenarios is it robust?
The test is robust when the sample size is at least 20 for small-to-moderate deviations from normality, and at least 40 for more extremely skewed distributions.
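
If you'd like to see what "robust" means here for yourself, a quick simulation can help. The Python sketch below is only an illustration, not a reproduction of the studies mentioned above. It draws many samples from a population whose true mean is known, runs a two-sided 1-sample t-test at the 5% level against that true mean, and counts how often the test wrongly rejects. The function name false_positive_rate, the choice of an exponential distribution as the "skewed" case, the sample sizes, and the number of repetitions are all assumptions made for this example. If the test is robust in a given scenario, the rejection rate should stay close to 5% even though the data are not normal.

```python
import math
import random

# Two-sided 5% critical values of Student's t for df = n - 1,
# taken from a standard t table. Only these sample sizes are supported
# in this sketch.
T_CRIT_95 = {5: 2.776, 20: 2.093, 40: 2.023}


def false_positive_rate(draw, true_mean, n, reps=4000, seed=1):
    """Share of simulated samples in which the (true) null is rejected."""
    rng = random.Random(seed)
    t_crit = T_CRIT_95[n]
    rejections = 0
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        mean = sum(sample) / n
        # Sample variance with the usual n - 1 denominator.
        variance = sum((x - mean) ** 2 for x in sample) / (n - 1)
        t_stat = (mean - true_mean) / math.sqrt(variance / n)
        if abs(t_stat) > t_crit:
            rejections += 1
    return rejections / reps


def draw_normal(rng):
    """Symmetric population with mean 0 (assumption met)."""
    return rng.gauss(0.0, 1.0)


def draw_exponential(rng):
    """Right-skewed population with mean 1 (assumption violated)."""
    return rng.expovariate(1.0)


if __name__ == "__main__":
    for n in sorted(T_CRIT_95):
        normal_rate = false_positive_rate(draw_normal, 0.0, n)
        skewed_rate = false_positive_rate(draw_exponential, 1.0, n)
        print(f"n={n:2d}  normal: {normal_rate:.3f}  exponential: {skewed_rate:.3f}")
```

Running the script prints the two rejection rates side by side for each sample size, so you can compare the skewed case against the normal case as n grows. These are simulated estimates, so expect them to wobble a little if you change the seed or the number of repetitions.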

So the next time you're presenting results and someone asks if you checked all of your assumptions, feel free to say, "No, just the requirements!"