Stories and real-world examples that show you how to apply statistics and statistical software to solve problems.

Histograms are one of the
most common graphs used to display numeric data. Anyone who
takes a statistics course is likely to learn about the histogram,
and for good reason: histograms are easy to understand and can
instantly tell you a lot about your data.
Here are three of the most important things you can learn by
looking at a histogram.
Shape—Mirror, Mirror, On the Wall…
If the left side of a...
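The shape a histogram shows comes from counting how many observations fall into equal-width bins. Here's a minimal sketch of that counting step in Python. It assumes half-open bins [start, start + width) and data that are all at or above `start`. The helper names `histogram_counts` and `text_histogram` and the sample values are hypothetical, not taken from the post or from Minitab.

```python
import math


def histogram_counts(data, start, width):
    """Count observations in equal-width, half-open bins.

    Bin k covers [start + k*width, start + (k+1)*width).
    Assumes every value is >= start (hypothetical helper, not a Minitab API).
    """
    n_bins = math.floor((max(data) - start) / width) + 1
    counts = [0] * n_bins
    for x in data:
        counts[math.floor((x - start) / width)] += 1
    return counts


def text_histogram(counts, start, width):
    """Render each bin as a row of asterisks, one per observation."""
    rows = []
    for k, c in enumerate(counts):
        left = start + k * width
        rows.append(f"[{left}, {left + width}) {'*' * c}")
    return "\n".join(rows)


data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 9]   # made-up values for illustration
counts = histogram_counts(data, start=2, width=2)
print(counts)                            # [3, 5, 1, 1]
print(text_histogram(counts, start=2, width=2))
```

Bin width and starting point are choices. Changing them can change the apparent shape, so it's worth trying more than one.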

Did you ever wonder why statistical analyses and concepts often have
such weird, cryptic names?
One conspiracy theory points to the workings of a secret
committee called the ICSSNN. The International Committee for
Sadistic Statistical Nomenclature and Numerophobia was formed
solely to befuddle and subjugate the masses. Its mission: To select
the most awkward, obscure, and confusing name possible...

'Twas the season for toys recently, and Christmas Day found me
playing around with a classic, the Etch-a-Sketch. As I noodled with
the knobs, I had a sudden flash of recognition: my drawing reminded
me of the Empirical CDF Plot in Minitab Statistical Software. Did you just ask,
"What's a CDF plot? And what's so empirical about it?" Both very
good questions. Let's start with the first, and we'll...
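An empirical CDF gives, for each value x, the proportion of observations less than or equal to x. Plotted, that's the staircase that looks so much like an Etch-a-Sketch drawing. Here's a minimal Python sketch of the calculation. The function names `ecdf` and `ecdf_steps` are hypothetical, not Minitab's.

```python
from bisect import bisect_right


def ecdf(data):
    """Return a function F with F(x) = proportion of observations <= x."""
    xs = sorted(data)
    n = len(xs)

    def F(x):
        # bisect_right counts how many sorted values are <= x
        return bisect_right(xs, x) / n

    return F


def ecdf_steps(data):
    """Return (value, cumulative proportion) at each distinct value: the staircase corners."""
    xs = sorted(data)
    n = len(xs)
    steps = []
    for i, x in enumerate(xs, start=1):
        if steps and steps[-1][0] == x:
            steps[-1] = (x, i / n)   # tied value: the same step just gets taller
        else:
            steps.append((x, i / n))
    return steps


F = ecdf([3, 1, 2, 2])
print(F(2))                       # 0.75
print(ecdf_steps([3, 1, 2, 2]))   # [(1, 0.25), (2, 0.75), (3, 1.0)]
```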

Data mining can be helpful in the exploratory phase of an
analysis. If you're in the early stages and you're just figuring
out which predictors are potentially correlated with your response
variable, data mining can help you identify candidates. However,
there are problems associated with using data mining to select
variables.
In my previous post, we used data mining to settle on
the following...

True or false: When comparing a parameter for two sets of
measurements, you should always use a hypothesis test to determine
whether the difference is statistically significant.
The answer? (drumroll...) True!
...and False!
To understand this paradoxical answer, you need to keep in mind
the difference between samples, populations, and descriptive and
inferential statistics.
Descriptive Statistics and...

Today, September 16, is World Ozone Day. You don't hear much about the
ozone layer any more.
In fact, if you’re under 30, you might think this is just
another trivial, obscure observance, along the lines of International Dot Day (yesterday) or National Apple Dumpling Day (tomorrow).
But there’s a good reason that, almost 30 years ago, the United
Nations designated this date as a day to raise...

You’ve performed multiple linear regression and have settled on a model
containing several predictor variables that are statistically
significant. At this point, it’s common to ask, “Which variable is
most important?”
This question is more complicated than it first appears. For one
thing, how you define “most important” often depends on your
subject area and goals. For another, how you collect...

In regression, "sums of squares" are used to represent
variation. In this post, we’ll use some sample data to walk through
these calculations.
The sample data used in this post is available within Minitab by
choosing Help > Sample Data,
or File > Open Worksheet >
Look in Minitab Sample Data folder (depending on
your version of Minitab). The dataset is called
ResearcherSalary.MTW, and contains data...
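For a regression with a single predictor, the three sums of squares are SST = Σ(y − ȳ)² (total variation), SSR = Σ(ŷ − ȳ)² (variation explained by the model), and SSE = Σ(y − ŷ)² (residual variation), with SST = SSR + SSE. Here's a minimal Python sketch of those calculations. It assumes one predictor for simplicity and uses a small made-up dataset, not ResearcherSalary.MTW, whose values aren't reproduced here.

```python
def sums_of_squares(x, y):
    """Fit y = b0 + b1*x by least squares and return (SST, SSR, SSE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Least-squares slope and intercept
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    ssr = sum((fi - y_bar) ** 2 for fi in fitted)           # explained by the model
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # left over (residual)
    return sst, ssr, sse


x = [1, 2, 3, 4, 5]   # hypothetical predictor values
y = [2, 4, 5, 4, 5]   # hypothetical responses
sst, ssr, sse = sums_of_squares(x, y)
print(round(sst, 4), round(ssr, 4), round(sse, 4))   # 6.0 3.6 2.4
print(round(ssr / sst, 4))                            # R-squared: 0.6
```

The ratio SSR/SST is R-squared, the share of the total variation the model accounts for.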

I blogged a few months back about three different Minitab tools
you can use to examine your data over time. Did you know
that you can also use a simple run chart to display how your
process data changes over time? Of course those “changes” could be
evidence of special-cause variation, which a run chart can help you
see.
What’s special-cause variation, and how’s it different from
common-cause...

While some posts in our Minitab blog focus on
understanding t-tests and t-distributions, this post will focus
more simply on how to hand-calculate the t-value for a one-sample
t-test (and how to replicate the p-value that Minitab gives
us).
The formulas used in this post are available within Minitab
Statistical Software by choosing the following menu path:
Help > Methods and Formulas
> Basic...
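The one-sample t-value is t = (x̄ − μ₀) / (s / √n), where x̄ is the sample mean, s the sample standard deviation, n the sample size, and μ₀ the hypothesized mean. It has n − 1 degrees of freedom. Here's a minimal Python sketch of that hand calculation with made-up data. The function name `one_sample_t` is hypothetical. The p-value step needs the t-distribution's CDF, which Python's standard library doesn't provide, so it's left out of the runnable part.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for a one-sample t-test of H0: mean = mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)   # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)          # standard error of the mean
    return (x_bar - mu0) / se, n - 1


sample = [5.1, 4.9, 5.3, 5.5, 5.2]   # hypothetical measurements
t, df = one_sample_t(sample, mu0=5.0)
print(round(t, 4), df)               # 2.0 4
```

If SciPy is installed, `2 * scipy.stats.t.sf(abs(t), df)` gives the two-sided p-value for this t and df.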

An outlier is an observation in a data set that lies a substantial
distance from other observations. These unusual observations can
have a disproportionate effect on statistical results, such as the
mean, and can lead to misleading conclusions.
Outliers can provide useful information about your data or process,
so it's important to investigate them. Of course, you have to find
them first.
Finding...
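One widely used way to flag potential outliers, though not necessarily the one the post walks through, is the 1.5 × IQR rule. Under this rule, values below Q1 − 1.5·IQR or above Q3 + 1.5·IQR get a closer look. Here's a minimal Python sketch that uses `statistics.quantiles` for the quartiles. Other software may compute quartiles slightly differently, so borderline points can differ. The function name `iqr_outliers` is hypothetical.

```python
import statistics


def iqr_outliers(data, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; return (flagged, fences)."""
    q1, _, q3 = statistics.quantiles(data, n=4)   # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high], (low, high)


data = [5, 1, 100, 3, 2, 6, 4]   # made-up data with one suspicious value
flagged, fences = iqr_outliers(data)
print(flagged, fences)           # [100] (-4.0, 12.0)
```

Flagging is only the first step. The point is to investigate the flagged values, not to delete them automatically.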

Businesses are getting more and more data from existing and
potential customers: whenever we click on a web site, for example,
it can be recorded in the vendor's database. And whenever we use
electronic ID cards to access public transportation or other
services, our movements across the city may be analyzed.
In the very near future, connected objects such as cars and
electrical appliances will...

For hundreds of years, people have been improving their
situation by pulling themselves up by their bootstraps. Well, now
you can improve your statistical knowledge by pulling yourself up
by your bootstraps. Minitab
Express has 7 different bootstrapping analyses that can help
you better understand the sampling distribution of your
data.
A sampling distribution describes the likelihood of...
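Bootstrapping approximates a sampling distribution by resampling your observed data with replacement many times and recomputing the statistic each time. Here's a minimal Python sketch for the mean, with a simple percentile interval. It's a generic illustration, not a reproduction of any of the Minitab Express bootstrapping analyses. The function names, the resample count, and the seed are arbitrary, hypothetical choices.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=2000, seed=1):
    """Means of n_resamples samples drawn with replacement, each the size of data."""
    rng = random.Random(seed)   # seeded so results are reproducible
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, level=0.95):
    """Simple percentile interval: trim (1 - level)/2 of the sorted values from each end."""
    ordered = sorted(values)
    k = round(len(ordered) * (1 - level) / 2)
    return ordered[k], ordered[len(ordered) - 1 - k]


data = [12, 15, 9, 20, 17, 14, 11, 16]     # hypothetical sample
means = bootstrap_means(data)
print(round(statistics.fmean(means), 2))   # center of the bootstrap distribution
print(percentile_interval(means))          # approximate 95% percentile interval
```

A histogram of `means` is a picture of the bootstrap sampling distribution of the mean.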

Analysis of variance (ANOVA) can determine whether the means of
three or more groups are different. ANOVA uses F-tests to
statistically test the equality of means. In this post, I’ll show
you how ANOVA and F-tests work using a one-way ANOVA example.
But wait a minute...have you ever stopped to wonder why you’d
use an analysis of variance to determine whether
means are different? I'll also show how...
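In a one-way ANOVA, the F-statistic is the ratio of between-group variance to within-group variance: F = MSB / MSW, where MSB = SSB / (k − 1) and MSW = SSW / (N − k) for k groups and N total observations. A large F means the group means are spread out relative to the scatter inside the groups. That's how a test built on variances ends up saying something about means. Here's a minimal Python sketch with made-up groups. The function name `one_way_anova_f` is hypothetical. The p-value would come from the F distribution with (k − 1, N − k) degrees of freedom, which isn't in the standard library.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group SS: how far each group mean sits from the grand mean
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group SS: scatter of observations around their own group mean
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ssb / df_between) / (ssw / df_within)
    return f, df_between, df_within


groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]   # hypothetical data
f, df1, df2 = one_way_anova_f(groups)
print(round(f, 4), df1, df2)                    # 19.0 2 6
```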

In statistics, t-tests are a type of hypothesis test that allows
you to compare means. They are called t-tests because each t-test
boils your sample data down to one number, the t-value. If you
understand how t-tests calculate t-values, you’re well on your way
to understanding how these tests work.
In this series of posts, I'm focusing on concepts rather than
equations to show how t-tests work...

T-tests are handy hypothesis tests in statistics when you want to
compare means. You can compare a sample mean to a hypothesized or
target value using a one-sample t-test. You can compare the means
of two groups with a two-sample t-test. If you have two groups with
paired observations (e.g., before and after measurements), use the
paired t-test.
How do t-tests work? How do t-values fit in? In this...
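Two of the three t-tests can be sketched in a few lines. A paired t-test is a one-sample t-test on the within-pair differences against 0. A two-sample t-value can be computed as (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂) if you don't assume equal variances. That is Welch's version; the pooled-variance version uses a different denominator. Here's a minimal Python sketch of both statistics with made-up numbers. The function names `paired_t` and `welch_t` are hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t-value: one-sample t on the differences (after - before) vs. 0."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n)), n - 1


def welch_t(x1, x2):
    """Two-sample t-value without assuming equal variances (Welch)."""
    se = math.sqrt(statistics.variance(x1) / len(x1) + statistics.variance(x2) / len(x2))
    return (statistics.mean(x1) - statistics.mean(x2)) / se


before = [10, 12, 9, 11, 13]   # hypothetical "before" measurements
after = [12, 13, 11, 12, 15]   # same subjects, measured "after"
print(paired_t(before, after))         # t is about 6.53 with 4 degrees of freedom
print(welch_t([1, 2, 3], [4, 5, 6]))   # about -3.67
```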

About a year ago, a reader asked if I could try to explain
degrees of freedom in statistics. Since then,
I’ve been circling around that request very cautiously, like it’s
some kind of wild beast that I’m not sure I can safely wrestle to
the ground.
Degrees of freedom aren’t easy to explain. They come up in many
different contexts in statistics—some advanced and complicated. In
mathematics, they're...

Allow me to make a confession up front: I won't hesitate to beat
my kids at a game.
My kids are young enough that in pretty much any game that is
predominantly determined by skill and not luck, I can beat them, and
beat them easily. This isn't some macho thing where it makes me
feel good, and I suppose it's only partially based on wanting them to
handle both winning and losing well. It's just how I...

P values have been around for nearly a century and they’ve been
the subject of criticism since their origins. In recent years, the
debate over P values has risen to a fever pitch. In particular,
there are serious fears that the misuse of P values has become so widespread
that it has actually damaged science.
In March 2016, spurred on by the growing concerns, the American
Statistical Association (ASA) did...

There's nothing like a boxplot, aka box-and-whisker diagram, to
get a quick snapshot of the distribution of your data. With a
single glance, you can readily intuit its general shape, central
tendency, and variability.
To easily compare the distribution of data between groups, display
boxplots for the groups side by side. Visually compare the central
value and spread of the distribution for each...
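A basic boxplot's box and whiskers are drawn from a five-number summary: minimum, first quartile, median, third quartile, and maximum. Many boxplot implementations also mark outliers beyond the whiskers, but this sketch skips that. Computing the summary per group is what makes the side-by-side comparison possible. Here's a minimal Python sketch with hypothetical groups. The function name `five_number_summary` is hypothetical. Quartiles come from `statistics.quantiles`, whose default method may not match every package's convention exactly.

```python
import statistics


def five_number_summary(values):
    """(min, Q1, median, Q3, max) for one group."""
    q1, med, q3 = statistics.quantiles(values, n=4)   # default 'exclusive' method
    return min(values), q1, med, q3, max(values)


groups = {                        # hypothetical data for two groups
    "A": [1, 2, 3, 4, 5, 6, 7],
    "B": [3, 5, 7, 9, 11, 13, 15],
}
for name, values in groups.items():
    lo, q1, med, q3, hi = five_number_summary(values)
    # Center (median) and spread (IQR) are what a side-by-side boxplot lets you compare
    print(f"{name}: median={med}, IQR={q3 - q1}, range=({lo}, {hi})")
```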