Blog posts and articles about the role of the normal distribution in statistics, data analysis, and quality improvement.

So the data you nurtured, that you worked so hard to format and
make useful, failed the normality test.
Time to face the truth: despite your best efforts, that data set
is never going to measure up to the assumption you may
have been trained to fervently look for.
Your data's lack of normality seems to make it poorly suited for
analysis. Now what?
Take it easy. Don't get uptight. Just let your data... Continue Reading

See if this sounds fair to you. I flip a coin.
Heads: You win $1.
Tails: You pay me $1.
You may not like games of chance, but you have to admit it seems
like a fair game. At least, assuming the coin is a normal, balanced
coin, and assuming I’m not a sleight-of-hand magician who can
control the coin.
How about this next game?
You pay me $2 to play. I flip a coin over and over until
it comes up heads. Your... Continue Reading
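To make "fair" concrete for the first game above, here is a minimal sketch that computes the expected payoff of the $1 coin flip and checks it with a quick simulation. It assumes a balanced coin, as the text does. The function names and the seed are illustrative choices, not anything from the original post, and the second game is left alone because its rules are cut off here.

```python
from fractions import Fraction
import random

# Game 1 from the text: heads you win $1, tails you pay $1.
# Assumption: a balanced coin, so each outcome has probability 1/2.
GAME_ONE = [(Fraction(1, 2), 1), (Fraction(1, 2), -1)]


def expected_value(outcomes):
    """Sum of probability * payoff over all outcomes."""
    return sum(p * payoff for p, payoff in outcomes)


def simulate_average_payoff(n_flips, seed=0):
    """Average payoff per flip over n_flips simulated plays."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    total = sum(1 if rng.random() < 0.5 else -1 for _ in range(n_flips))
    return total / n_flips


if __name__ == "__main__":
    print("Expected payoff per play:", expected_value(GAME_ONE))
    print("Simulated average payoff:", simulate_average_payoff(100_000))
```

An expected payoff of zero is what "fair" means here: over many plays, neither side should come out ahead on average.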

Back when I used to work in Minitab Tech Support, customers often asked
me, “What’s the difference between Cpk and Ppk?” It’s a good
question, especially since many practitioners default to using Cpk
while overlooking Ppk altogether. It’s like the '80s pop
duo Wham!, where Cpk is George Michael and Ppk is that other
guy.
Poofy hairdos styled with mousse, shoulder pads, and leg warmers
aside, let’s... Continue Reading

Here is a scenario involving process capability that we’ve seen
from time to time in Minitab's technical support department. I’m
sharing the details in this post so that you’ll know where to look
if you encounter a similar situation.
You need to run a capability analysis. You generate the output
using Minitab Statistical Software. When you look at the results, the Cpk is
huge and the histogram in... Continue Reading

In my last post, we took the red pill and dove
deep into the unarguably fascinating and uncompromisingly
compelling world of the matrix plot. I've stuffed this post with
information about a topic of marginal interest...the marginal
plot.
Margins are important. Back in my English composition days, I
recall that margins were particularly prized for the inverse linear
relationship they maintained with... Continue Reading

Earlier this month, PLOS.org published an article titled "Ten Simple
Rules for Effective Statistical Practice." The 10 rules are good
reading for anyone who draws conclusions and makes decisions based on
data, whether you're trying to extend the boundaries of scientific
knowledge or make good decisions for your business.
Carnegie Mellon University's Robert E. Kass and several co-authors
devised... Continue Reading

For one reason or another, the response variable in a regression
analysis might not satisfy one or more of the assumptions of
ordinary least squares regression. The
residuals might follow a skewed distribution or the
residuals might curve as the predictions increase. A common
solution when problems arise with the assumptions of ordinary least
squares regression is to transform the response... Continue Reading

For hundreds of years, people have been improving their
situation by pulling themselves up by their bootstraps. Well, now
you can improve your statistical knowledge by pulling yourself up
by your bootstraps. Minitab
Express has 7 different bootstrapping analyses that can help
you better understand the sampling distribution of your
data.
A sampling distribution describes the likelihood of... Continue Reading
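As a rough illustration of the idea, the sketch below bootstraps the sampling distribution of a mean using only Python's standard library. It is not one of the Minitab Express analyses mentioned above. The data values, function names, resample count, and seed are all made up for the example.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=2000, seed=1):
    """Resample with replacement and record the mean of each resample."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, level=0.95):
    """Simple percentile interval taken from the sorted bootstrap statistics."""
    ordered = sorted(values)
    tail = (1 - level) / 2
    low = ordered[int(tail * len(ordered))]
    high = ordered[int((1 - tail) * len(ordered)) - 1]
    return low, high


# Hypothetical measurements, for illustration only.
SAMPLE = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 4.4, 5.8, 4.9, 5.5]

if __name__ == "__main__":
    means = bootstrap_means(SAMPLE)
    print("Sample mean:", statistics.mean(SAMPLE))
    print("Approximate 95% interval for the mean:", percentile_interval(means))
```

The spread of the resampled means gives a picture of how much the sample mean could vary from one sample to the next, without assuming the data follow a particular distribution.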

Once upon a time, when people wanted to compare the standard
deviations of two samples, they had two handy tests available, the
F-test and Levene's test.
Statistical lore has it that the F-test is so named because
it so frequently fails you.
Although the F-test is suitable for data that are normally
distributed, its sensitivity to departures from
normality limits when and where it can be used.
Leve... Continue Reading

In the first part of this series, we looked at a case study where
staff at a hospital used ATP swab tests to test 8 surfaces for
bacteria in 10 different hospital rooms across 5 departments. ATP
measurements below 400 units pass the swab test, while measurements
greater than or equal to 400 units fail the swab test and require
further investigation.
I offered two tips on exploring and visualizing... Continue Reading
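The pass/fail rule described above is easy to get wrong at the boundary, so here is a small sketch of it in code. The 400-unit cutoff comes from the text. The surface names and readings are hypothetical.

```python
ATP_FAIL_THRESHOLD = 400  # from the text: readings of 400 units or more fail


def swab_result(atp_units):
    """Classify one ATP reading: below the threshold passes, otherwise fails."""
    return "pass" if atp_units < ATP_FAIL_THRESHOLD else "fail"


# Hypothetical readings for a few surfaces in one room.
READINGS = {"bed rail": 120, "call button": 400, "door handle": 655}

if __name__ == "__main__":
    for surface, units in READINGS.items():
        print(f"{surface}: {units} units -> {swab_result(units)}")
```

Note that a reading of exactly 400 fails, since only measurements strictly below 400 pass.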

Working with healthcare-related data often feels different than
working with manufacturing data. After all, the common thread among
healthcare quality improvement professionals is the motivation to
preserve and improve the lives of patients. Whether collecting data
on the number of patient falls, patient length-of-stay, bed
unavailability, wait times, hospital-acquired infections, or
readmissions,... Continue Reading

T-tests are handy hypothesis tests in statistics when you want to
compare means. You can compare a sample mean to a hypothesized or
target value using a one-sample t-test. You can compare the means
of two groups with a two-sample t-test. If you have two groups with
paired observations (e.g., before and after measurements), use the
paired t-test.
How do t-tests work? How do t-values fit in? In this... Continue Reading
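To show how the three tests above relate to one another, here is a minimal sketch of their t-values using only Python's standard library. The two-sample version uses the unequal-variance (Welch) form, which is an assumption on my part rather than something the text specifies. The data are invented, and the sketch stops at the t-value rather than computing a p-value.

```python
import math
import statistics


def one_sample_t(sample, target):
    """t-value for comparing a sample mean to a hypothesized or target value."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - target) / standard_error


def two_sample_t(group_a, group_b):
    """t-value for comparing two group means (Welch form, unequal variances)."""
    var_a = statistics.variance(group_a) / len(group_a)
    var_b = statistics.variance(group_b) / len(group_b)
    difference = statistics.mean(group_a) - statistics.mean(group_b)
    return difference / math.sqrt(var_a + var_b)


def paired_t(before, after):
    """Paired t-value: a one-sample t-test on the within-pair differences."""
    differences = [y - x for x, y in zip(before, after)]
    return one_sample_t(differences, 0.0)


# Hypothetical before-and-after measurements on the same six items.
BEFORE = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4]
AFTER = [12.5, 12.0, 12.9, 12.8, 12.1, 12.9]

if __name__ == "__main__":
    print("One-sample t (target 12.0):", one_sample_t(BEFORE, 12.0))
    print("Two-sample t:", two_sample_t(AFTER, BEFORE))
    print("Paired t:", paired_t(BEFORE, AFTER))
```

In each case the t-value is a difference divided by its standard error. The paired test differs only in that it works on the within-pair differences, which is why it reduces to a one-sample test against zero.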

About a year ago, a reader asked if I could try to explain
degrees of freedom in statistics. Since then,
I’ve been circling around that request very cautiously, like it’s
some kind of wild beast that I’m not sure I can safely wrestle to
the ground.
Degrees of freedom aren’t easy to explain. They come up in many
different contexts in statistics—some advanced and complicated. In
mathematics, they're... Continue Reading

Five-point Likert scales are commonly associated with surveys and are
used in a wide variety of settings. You’ve run into the Likert scale if
you’ve ever been asked whether you strongly agree, agree, neither
agree nor disagree, disagree, or strongly disagree about something.
The worksheet to the right shows what five-point Likert data look
like when you have two groups.
Because Likert item data are... Continue Reading

In my last post, I discussed how a DOE was
chosen to optimize a chemical-mechanical polishing process in
the microelectronics industry. This important process improved the
plant's final manufacturing yields. We selected an experimental
design that let us study the effects of six process parameters in
16 runs.
Analyzing the Design
Now we'll examine the analysis of the DOE results after the
actual... Continue Reading

Like so many of us, I try to stay healthy by watching my weight.
I thought it might be interesting to apply some statistical
thinking to the idea of maintaining a healthy weight, and the
central limit theorem could provide some particularly useful
insights. I’ll start by making some simple (maybe even simplistic)
assumptions about calorie intake and expenditure, and see where
those lead. And then... Continue Reading

There's nothing like a boxplot, aka box-and-whisker diagram, to
get a quick snapshot of the distribution of your data. With a
single glance, you can readily intuit its general shape, central
tendency, and variability.
To easily compare the distribution of data between groups, display
boxplots for the groups side by side. Visually compare the central
value and spread of the distribution for each... Continue Reading
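For readers who want the numbers behind the picture, the sketch below computes the values a boxplot is built from for two groups. It relies on `statistics.quantiles` with its default method, and quartile conventions vary between tools, so the results may not match Minitab's exactly. The group names and data are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Minimum, quartiles, maximum, and interquartile range for one group."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default quartile method
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }


# Hypothetical measurements for two groups.
GROUPS = {
    "A": [4.2, 5.1, 5.6, 6.0, 6.3, 6.9, 7.4, 8.8],
    "B": [3.1, 3.9, 4.4, 4.6, 5.0, 5.2, 5.9, 6.1],
}

if __name__ == "__main__":
    for name, values in GROUPS.items():
        print(name, five_number_summary(values))
```

Comparing the medians tells you about the central values, and comparing the interquartile ranges tells you about the spread, which is the same comparison you make by eye with side-by-side boxplots.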

How deeply has statistical content from Minitab blog posts (or
other sources) seeped into your brain tissue? Rather than submit a
biopsy specimen from your temporal lobe for analysis, take this
short quiz to find out. Each question may have more than one
correct answer. Good luck!
Which of the following are famous figure skating pairs, and which are
methods for testing whether your data follow a... Continue Reading

When you work in data analysis, you quickly discover an
irrefutable fact: a lot of people just can't stand
statistics. Some people fear the math, some fear what the data
might reveal, some people find it deadly dull, and others think
it's bunk. Many don't even really know why they hate
statistics—they just do. Always have, probably always
will.
Problem is, that means we who analyze data need to
com... Continue Reading

There are many reasons why a distribution might not be
normal/Gaussian. A non-normal pattern might be caused by several
distributions being mixed together, by a drift over time, by
one or more outliers, by asymmetrical behavior, by
out-of-control points, and so on.
I recently collected the scores of three different teams (the
Blue team, the Yellow team and the Pink team) after a laser... Continue Reading