Blog posts and articles about hypothesis testing, especially in the course of Lean Six Sigma quality improvement projects.

People can make mistakes when they test a hypothesis with
statistical analysis. Specifically, they can make either Type I or
Type II errors.
As you analyze your own data and test hypotheses, understanding
the difference between Type I and Type II errors is extremely
important, because there's a risk of making each type of error in
every analysis, and the amount of risk is in your control.
So if...
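To see how both risks behave, here is a small simulation. It is a minimal sketch, not the method from the post: it assumes normally distributed data with a known standard deviation so a simple two-sided z-test can stand in for the real analysis, and the function name `rejection_rate`, the effect size of 0.5, and the sample size of 25 are hypothetical choices.

```python
import random
from statistics import NormalDist, fmean


def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, alpha=0.05,
                   trials=10_000, seed=1):
    """Fraction of simulated two-sided z-tests that reject H0: mean == mu0.

    Assumes normally distributed data with a known standard deviation
    (a z-test), so only the standard library is needed.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    se = sigma / n ** 0.5                         # standard error of the mean
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


# H0 is true here, so every rejection is a Type I error (rate should be near alpha).
type_i = rejection_rate(true_mean=0.0)
# H0 is false here (true mean is 0.5), so every failure to reject is a Type II error.
type_ii = 1 - rejection_rate(true_mean=0.5)
# A stricter alpha lowers the Type I risk but raises the Type II risk.
type_ii_strict = 1 - rejection_rate(true_mean=0.5, alpha=0.01)
print(f"Type I rate:  {type_i:.3f}")
print(f"Type II rate: {type_ii:.3f} (alpha = 0.05), {type_ii_strict:.3f} (alpha = 0.01)")
```

Changing `alpha` or `n` in the calls is one way to see that the amount of each risk is something the analyst chooses.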

Welcome to the Hypothesis Test Casino! The featured game of the
house is roulette. But this is no ordinary game of
roulette. This is p-value roulette!
Here’s how it works: We have two roulette wheels, the Null wheel
and the Alternative wheel. Each wheel has 20 slots (instead of the
usual 37 or 38). You get to bet on one slot.
What happens if the ball lands in the slot you bet on? Well,
that depends...

Statistics can be challenging, especially if you're not
analyzing data and interpreting the results every day. Statistical
software makes things easier by handling the arduous
mathematical work involved in statistics. But ultimately, we're
responsible for correctly interpreting and communicating what the
results of our analyses show.
The p-value is probably the most frequently cited
statistic. We...

To make objective decisions about the processes that are critical to your
organization, you often need to examine categorical data. You may
know how to use a t-test or ANOVA when you’re comparing measurement
data (like weight, length, revenue, and so on), but do you know how to compare
attribute or counts data? It’s easy to do with statistical software
like Minitab.
One person may look at this bar...
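As a rough illustration of comparing counts data, the sketch below runs a two-proportion z-test by hand. The teaser doesn't say which test the post uses (a chi-square test is another common choice), so treat this as one standard approach rather than the post's method; the defect counts and the function name `two_proportion_z_test` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(defects_a, n_a, defects_b, n_b):
    """Two-sided z-test for the difference between two proportions.

    Uses the pooled proportion for the standard error, the usual
    large-sample approach when H0 says the two proportions are equal.
    """
    p_a = defects_a / n_a
    p_b = defects_b / n_b
    pooled = (defects_a + defects_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 30 defective out of 400 on line A, 18 out of 410 on line B.
z, p = two_proportion_z_test(30, 400, 18, 410)
print(f"z = {z:.3f}, p = {p:.4f}")
```

The large-sample normal approximation behind this test gets shaky when the expected counts are small, which is one reason software offers exact alternatives.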

In Parts 1 and 2 of this blog series, I wrote about how statistical
inference uses data from a sample of individuals to reach
conclusions about the whole population. That’s a very powerful
tool, but you must check your assumptions when you make statistical
inferences. Violating any of these assumptions can result in false
positives or false negatives, thus invalidating your
results.
The common...

At the inaugural Minitab Insights Conference in September,
presenters Benjamin Turcan and Jennifer Berner discussed
how to present data effectively. Among the considerations they
discussed was choosing the right graph.
Different graphs are good for different things. Of course,
opinions about which graph is best can, and do, differ. Dotplot
devotees might decide that they are demonstrably...

In Part 1 of this
blog series, I wrote about how statistical inference uses data
from a sample of individuals to reach conclusions about the whole
population. That’s a very powerful tool, but you must check your
assumptions when you make statistical inferences. Violating any of
these assumptions can result in false positives or false negatives,
thus invalidating your results.
The common data...

If you’re not a statistician, looking through statistical output
can sometimes make you feel a bit like Alice in
Wonderland. Suddenly, you step into a fantastical world
where strange and mysterious phantasms appear out of nowhere.
For example, consider the T and P in your t-test results.
“Curiouser and curiouser!” you might exclaim, like Alice, as you
gaze at your output.
What are these values,...

Data mining can be helpful in the exploratory phase of an
analysis. If you're in the early stages and you're just figuring
out which predictors are potentially correlated with your response
variable, data mining can help you identify candidates. However,
there are problems associated with using data mining to select
variables.
In my previous post, we used data mining to settle on
the following...
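One well-known hazard of screening many candidate predictors is that some will look significant purely by chance. The teaser doesn't say which problems the post covers, so this is only an illustration of that general point: every candidate and the response are pure noise, and the correlation test uses the Fisher z approximation. The function names `pearson_r` and `chance_hits` and all sizes are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def chance_hits(n_obs=30, n_candidates=200, alpha=0.05, seed=7):
    """Screen pure-noise candidate predictors against a pure-noise response.

    Returns how many candidates look 'significant' at level alpha, using the
    Fisher z approximation: atanh(r) * sqrt(n - 3) is roughly N(0, 1) under H0.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    response = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]  # unrelated to response
        r = pearson_r(candidate, response)
        if abs(math.atanh(r) * math.sqrt(n_obs - 3)) > z_crit:
            hits += 1
    return hits


# None of these candidates is truly related to the response, yet on average
# about alpha * n_candidates of them will pass the screen by chance.
print(chance_hits(), "of 200 unrelated candidates look significant")
```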

I watched an old motorcycle flick from the 1960s the other night, and I
was struck by the bikers' slang. They had a language all their own.
Just like statisticians, whose manner of speaking often confounds
those who aren't hep to the lingo of data analysis.
It got me thinking...what if there were an all-statistician
biker gang? Call them the Nulls Angels. Imagine them in their
colors, tearing...

True or false: When comparing a parameter for two sets of
measurements, you should always use a hypothesis test to determine
whether the difference is statistically significant.
The answer? (drumroll...) True!
...and False!
To understand this paradoxical answer, you need to keep in mind
the difference between samples, populations, and descriptive and
inferential statistics.
Descriptive Statistics and...

There may be huge potential benefits waiting in the data in your
servers. These data may be used for many different purposes. Better
data allows better decisions, of course. Banks, insurance firms,
and telecom companies already own a large amount of data about
their customers. These resources are useful for building a more
personal relationship with each customer.
Some organizations already use...

In 2011 we had solar panels fitted on our property. In the last
few months we have noticed a few problems with the inverter (the
equipment that converts the electricity generated by the panels
from DC to AC, and manages the transfer of unused electricity to the
power company). It was shutting down at various times throughout
the day, typically when it was very sunny, resulting in no
electricity being...

So the data you nurtured, that you worked so hard to format and
make useful, failed the normality test.
Time to face the truth: despite your best efforts, that data set
is never going to measure up to the assumption you may
have been trained to fervently look for.
Your data's lack of normality seems to make it poorly suited for
analysis. Now what?
Take it easy. Don't get uptight. Just let your data...

Have you ever accidentally done statistics? Not all of us can
(or would want to) be “stat nerds,” but the word “statistics”
shouldn’t be scary. In fact, we all analyze things that happen to
us every day. Sometimes we don’t realize that we are compiling data
and analyzing it, but that’s exactly what we are doing. Yes, there
are advanced statistical concepts that can be difficult to
understand—but...

While some posts in our Minitab blog focus on
understanding t-tests and t-distributions, this post will focus
more simply on how to hand-calculate the t-value for a one-sample
t-test (and how to replicate the p-value that Minitab gives
us).
The formulas used in this post are available within Minitab
Statistical Software by choosing the following menu path:
Help > Methods and Formulas > Basic...
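For reference, the one-sample t-value is the difference between the sample mean and the hypothesized mean, divided by the standard error of the mean: t = (x̄ − μ0) / (s / √n), with n − 1 degrees of freedom. The sketch below does that arithmetic by hand; the measurements and the function name `one_sample_t` are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(data, mu0):
    """Return the t-value and degrees of freedom for a one-sample t-test."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)   # sample standard deviation (n - 1 denominator)
    se = s / sqrt(n)  # standard error of the mean
    t = (x_bar - mu0) / se
    return t, n - 1


data = [10.2, 9.8, 10.5, 10.1, 10.4, 9.9, 10.3, 10.6]  # hypothetical measurements
t, df = one_sample_t(data, mu0=10.0)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Turning t into a p-value needs the t-distribution's CDF, which the standard library lacks; if SciPy is installed, `2 * scipy.stats.t.sf(abs(t), df)` gives the two-sided p-value.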

Analysis of variance (ANOVA) can determine whether the means of
three or more groups are different. ANOVA uses F-tests to
statistically test the equality of means. In this post, I’ll show
you how ANOVA and F-tests work using a one-way ANOVA example.
But wait a minute...have you ever stopped to wonder why you’d
use an analysis of variance to determine whether
means are different? I'll also show how...
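The short answer to that question is that the F-statistic compares two variances: the variation between group means and the variation of observations within their groups. When group means are far apart relative to the within-group scatter, F grows. Here is a minimal hand calculation, with hypothetical data and a hypothetical function name `one_way_anova_f`.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on the given groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group variation: how far observations sit from their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical data for three groups.
f, df1, df2 = one_way_anova_f([4, 5, 6], [6, 7, 8], [8, 9, 10])
print(f"F({df1}, {df2}) = {f:.2f}")  # F(2, 6) = 12.00
```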

Among the most underutilized statistical tools in Minitab, and I
think in general, are multivariate tools. Minitab offers a number
of different multivariate tools, including principal component
analysis, factor analysis,
clustering, and more. In this post, my goal is to give
you a better understanding of the multivariate tool called
discriminant analysis, and how it can be used.
Discriminant...

Once upon a time, when people wanted to compare the standard
deviations of two samples, they had two handy tests available, the
F-test and Levene's test.
Statistical lore has it that the F-test is so named because
it so frequently fails you.¹
Although the F-test is suitable for data that are normally
distributed, its sensitivity to departures from
normality limits when and where it can be used.
Leve...
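To make the contrast concrete, the sketch below computes both statistics by hand. The F-test statistic is simply the ratio of the two sample variances. A Levene-type statistic instead runs a one-way ANOVA on each observation's absolute deviation from its group's center; centering on the median (the Brown-Forsythe variant) is an assumption of this sketch, not necessarily the version the post discusses. Data and function names are hypothetical.

```python
from statistics import mean, median, variance


def variance_ratio_f(a, b):
    """F statistic for comparing two variances: ratio of sample variances."""
    return variance(a) / variance(b)


def levene_statistic(a, b, center=median):
    """Levene-type statistic for two groups.

    Runs a one-way ANOVA F on absolute deviations from each group's center.
    center=median is the Brown-Forsythe variant; center=mean is Levene's original.
    """
    groups = [[abs(x - center(g)) for x in g] for g in (a, b)]
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = len(groups) - 1, n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


a = [1, 2, 3, 4, 5]  # hypothetical sample with more spread
b = [2, 3, 3, 3, 4]  # hypothetical sample with less spread
print(f"F = {variance_ratio_f(a, b):.2f}, Levene (median) = {levene_statistic(a, b):.2f}")
# F = 5.00, Levene (median) = 3.20
```

Because the Levene-type statistic works on absolute deviations rather than squared ones, a few extreme values from a non-normal distribution pull on it less. If SciPy is available, `scipy.stats.levene(a, b)` (median-centered by default) can serve as a cross-check.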

In statistics, t-tests are a type of hypothesis test that allows
you to compare means. They are called t-tests because each t-test
boils your sample data down to one number, the t-value. If you
understand how t-tests calculate t-values, you’re well on your way
to understanding how these tests work.
In this series of posts, I'm focusing on concepts rather than
equations to show how t-tests work...
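One common way to read that single number is as a signal-to-noise ratio: the difference between the sample means (signal) divided by the standard error of that difference (noise). The sketch below uses Welch's two-sample form, which doesn't assume equal variances; that choice, the data, and the function name `welch_t` are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t-value without assuming equal variances (Welch's form).

    Signal: difference between the sample means.
    Noise:  standard error of that difference.
    """
    signal = mean(a) - mean(b)
    noise = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return signal / noise


t = welch_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])  # hypothetical samples
print(f"t = {t:.2f}")  # t = -2.00
```

A bigger gap between means or less scatter within each sample pushes the t-value further from zero.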