Blog posts and articles about how to use and interpret the P Value statistic in quality improvement efforts.

If you want to convince someone that at least a basic
understanding of statistics is an essential life skill, bring up
the case of Lucia de Berk. Hers is a story that's too awful to be
true—except that it is completely true.
A flawed analysis irrevocably altered de Berk's life and kept her
behind bars for a full decade, and the fact that this analysis
targeted and harmed just one person makes it... Continue Reading

In the world of linear models, a hierarchical model contains all
lower-order terms that comprise the higher-order terms that also
appear in the model. For example, a model that includes the
interaction term A*B*C is hierarchical if it includes these terms:
A, B, C, A*B, A*C, and B*C.
Fitting the correct regression model can be as
much of an art as it is a science. Consequently, there's not always
a... Continue Reading
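
The hierarchy rule in this excerpt can be checked mechanically. What follows is a minimal Python sketch, not Minitab's implementation: the function names `lower_order_terms` and `is_hierarchical` are hypothetical, and it assumes terms are written with `*` between factor names, as in the A*B*C example.

```python
from itertools import combinations

def lower_order_terms(term):
    """Return every lower-order term implied by an interaction such as 'A*B*C'."""
    factors = term.split("*")
    implied = []
    # Orders 1 .. k-1: main effects first, then two-way interactions, and so on.
    for order in range(1, len(factors)):
        for combo in combinations(factors, order):
            implied.append("*".join(combo))
    return implied

def is_hierarchical(model_terms):
    """True if every term's lower-order terms also appear in the model."""
    # Compare terms as sets of factors so that 'A*B' and 'B*A' match.
    present = {frozenset(t.split("*")) for t in model_terms}
    return all(
        frozenset(lower.split("*")) in present
        for term in model_terms
        for lower in lower_order_terms(term)
    )

print(lower_order_terms("A*B*C"))  # ['A', 'B', 'C', 'A*B', 'A*C', 'B*C']
```

A model containing only A, B, and A*B*C would fail this check because C and all three two-way interactions are missing.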

How deeply has statistical content from Minitab blog posts (or
other sources) seeped into your brain tissue? Rather than submit a
biopsy specimen from your temporal lobe for analysis, take this
short quiz to find out. Each question may have more than one
correct answer. Good luck!
Which
of the following are famous figure skating pairs, and which are
methods for testing whether your data follow a... Continue Reading

If you perform linear regression analysis, you might need to
compare different regression lines to see if their constants and
slope coefficients are different. Imagine there is an established
relationship between X and Y. Now, suppose you want to determine
whether that relationship has changed. Perhaps there is a new
context, process, or some other qualitative change, and you want to
determine... Continue Reading

I’ve written a fair bit about P values: how to correctly interpret
P values, a graphical representation of how they work, guidelines for
using P values, and why the P value ban in one journal is a mistake. Along
the way, I’ve received many questions about P values, but the
questions from one reader stand out.
This reader asked, why is it so easy to interpret P
values incorrectly? Why is the common... Continue Reading

There are many reasons why a distribution might not be
normal/Gaussian. A non-normal pattern might be caused by several
distributions being mixed together, or by a drift in time, or by
one or several outliers, or by an asymmetrical behavior, some
out-of-control points, etc.
I recently collected the scores of three different teams (the
Blue team, the Yellow team and the Pink team) after a laser... Continue Reading

P-values are frequently misinterpreted, which causes many
problems. I won't rehash those
problems here since my colleague Jim Frost has
detailed the issues involved at some length, but the fact remains
that the p-value will continue to be one of the most frequently
used tools for deciding if a result is statistically
significant.
You know the old saw about "Lies, damned lies, and... Continue Reading

Back when I was an undergrad in
statistics, I unfortunately spent an entire semester of my life
taking a class, diligently crunching numbers with my TI-82, before
realizing 1) that I was actually in an Analysis of Variance (ANOVA)
class, 2) why I would want to use such a tool in the first place,
and 3) that ANOVA doesn’t necessarily tell you a thing about
variances.
Fortunately, I've had a lot more... Continue Reading

I have two young children, and I
work full-time, so my adult TV time is about as rare as finding a
Kardashian-free tabloid. So I can’t commit to just any TV
show. It better be a good one. I was therefore extremely
excited when Netflix analyzed viewer
data to find out at what point
watchers get hooked on the first season of various
shows.
Specifically,
they identified the episode at which 70% of... Continue Reading

As Halloween
approaches, you are probably taking the necessary steps to protect
yourself from the various ghosts, goblins, and witches that are prowling
around. Monsters of all sorts are out to get you, unless they’re
sufficiently bribed with candy offerings!
I’m here to warn you about a ghoul that all statisticians and
data scientists need to be aware of: phantom degrees of freedom.
These phantoms... Continue Reading

In Part 5 of our series, we began the analysis of
the experiment data by reviewing analysis of covariance and
blocking variables, two key concepts in the design and
interpretation of your results.
The
250-yard marker at the Tussey Mountain Driving Range, one of the
locations where we conducted our golf experiment. Some of the
golfers drove their balls well beyond this 250-yard marker during a
few of... Continue Reading

By Matthew Barsalou, guest
blogger
Teaching process performance and capability studies is easier
when actual process data is available for the student or trainee to
practice with. As I have previously
discussed at the Minitab Blog, a catapult can be used to
generate data for a capability study. My last blog on using a
catapult for this purpose was several years ago, so I would like
to revisit... Continue Reading

In
Part 3 of our series, we decided to test our 4
experimental factors, Club Face Tilt, Ball Characteristics, Club
Shaft Flexibility, and Tee Height in a full factorial design
because of the many advantages of that data collection plan.
In Part 4 we concluded that each golfer
should replicate their half fraction of the full factorial 5 times
in order to have a high enough power to detect... Continue Reading

With
Speaker John Boehner resigning, Kevin McCarthy quitting before the
vote for him to be Speaker, and a possible government shutdown in
the works, the Freedom Caucus has certainly been in the news
frequently! Depending on your political bent, the Freedom Caucus
has caused quite a disruption for either good or bad.
Who are these politicians? The Freedom Caucus is a group of
approximately 40... Continue Reading

Step
3 in our DOE problem solving methodology is to determine how many
times to replicate the base experiment plan. The discussion in Part 3
ended with the conclusion that our
4 factors could best be studied using all 16 combinations of the
high and low settings for each factor, a full factorial. Each
golfer will perform half of the sixteen possible combinations and
each golfer’s data could stand as... Continue Reading
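
To make the run counts concrete, here is a small Python sketch that enumerates the 16 high/low combinations for the four factors named in the series and splits them into two halves of eight. The excerpt does not say how the halves are chosen. Splitting on the sign of the four-factor product is an assumption made for illustration, as are the -1/+1 coding and the function names.

```python
from itertools import product

# Factor names come from the series; coding low as -1 and high as +1 is an assumption.
FACTORS = ["Club Face Tilt", "Ball Characteristics",
           "Club Shaft Flexibility", "Tee Height"]

def full_factorial(n_factors):
    """All combinations of low (-1) and high (+1) settings for n two-level factors."""
    return list(product((-1, 1), repeat=n_factors))

def half_fractions(runs):
    """Split runs by the sign of the product of all factor levels (assumed rule)."""
    plus, minus = [], []
    for run in runs:
        sign = 1
        for level in run:
            sign *= level
        (plus if sign == 1 else minus).append(run)
    return plus, minus

runs = full_factorial(len(FACTORS))
half_a, half_b = half_fractions(runs)
print(len(runs), len(half_a), len(half_b))  # 16 8 8
```

Each half contains every factor at its high and low setting four times, which is why a half fraction can still estimate the main effects.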

An exciting new study sheds light on the relationship between P
values and the replication of experimental results. This study
highlights issues that I've emphasized repeatedly—it is crucial to
interpret P values correctly, and significant
results must be replicated to be trustworthy.
The study also supports my disagreement with the decision
by the Journal of Basic and Applied Social Psychology to
b... Continue Reading

Repeated measures designs don’t fit our impression of a typical
experiment in several key ways. When we think of an experiment, we
often think of a design that has a clear distinction between the
treatment and control groups. Each subject is in one, and only one,
of these non-overlapping groups. Subjects who are in a treatment
group are exposed to only one type of treatment. This is the... Continue Reading

If
you use ordinary linear regression with a response of count data,
it may work out fine (Part
1), or you may run into some problems (Part
2).
Given that a count response could be problematic, why not use a
regression procedure developed to handle a response of counts?
A Poisson regression analysis is designed to analyze a
regression model with a count response.
First, let's try using Poisson... Continue Reading
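
For readers who want to see what a Poisson regression fit involves, here is a minimal pure-Python sketch with one predictor and the standard log link, fitted by Newton's method. It is not the procedure or the data used in the post: the function name `fit_poisson` is hypothetical and the counts are invented for illustration.

```python
import math

def fit_poisson(x, y, max_iter=50, tol=1e-10):
    """Fit E[y] = exp(b0 + b1 * x) by maximum likelihood using Newton's method."""
    b0 = math.log(sum(y) / len(y))  # start at the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Fisher information (2 x 2), inverted by hand below.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Invented counts, for illustration only.
x_demo = [0, 1, 2, 3, 4, 5]
y_demo = [1, 3, 2, 5, 6, 9]
b0_demo, b1_demo = fit_poisson(x_demo, y_demo)
print(f"intercept={b0_demo:.3f}, slope={b1_demo:.3f}")
```

Because of the log link, the slope is read multiplicatively: each one-unit increase in x multiplies the expected count by exp(slope). In practice a statistics package would also report standard errors and goodness-of-fit measures, which this sketch omits.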

My previous post showed an example of using
ordinary linear regression to model a count response. For that particular count data, shown by the blue
circles on the dot plot below, the model assumptions for linear
regression were adequately satisfied.
But frequently, count data may contain many values equal to or
close to 0. Also, the distribution of the counts may be
right-skewed. In the quality field,... Continue Reading

Ever use dental floss to cut soft cheese? Or Alka Seltzer to
clean your toilet bowl? You can find a host of nonconventional uses for ordinary objects
online. Some are more peculiar than others.
Ever use ordinary linear regression to evaluate a response
(outcome) variable of counts?
Technically, ordinary linear regression was designed to evaluate
a continuous response variable. A continuous... Continue Reading