Data mining can be helpful in the exploratory phase of an
analysis. If you're in the early stages and you're just figuring
out which predictors are potentially correlated with your response
variable, data mining can help you identify candidates. However,
there are problems associated with using data mining to select
In my previous post, we used data mining to settle on
the following... Continue Reading
I watched an old motorcycle flick from the 1960s the other night, and I
was struck by the bikers' slang. They had a language all their own.
Just like statisticians, whose manner of speaking often confounds
those who aren't hep to the lingo of data analysis.
It got me thinking...what if there were an all-statistician
biker gang? Call them the Nulls Angels. Imagine them in their
colors, tearing... Continue Reading
If you were among the 300 people who attended the first-ever
Minitab Insights conference in September, you already know how
powerful it was. Attendees learned how practitioners from a
wide range of industries use data analysis to address a variety of
problems, find solutions, and improve business practices.
In the coming weeks and months, we will share more of the great
insights and guidance shared... Continue Reading
True or false: When comparing a parameter for two sets of
measurements, you should always use a hypothesis test to determine
whether the difference is statistically significant.
The answer? (drumroll...) True!
To understand this paradoxical answer, you need to keep in mind
the difference between samples, populations, and descriptive and
Descriptive Statistics and... Continue Reading
Data mining uses algorithms to explore correlations in data sets. An
automated procedure sorts through large numbers of variables and
includes them in the model based on statistical significance alone.
No thought is given to whether the variables and the signs and
magnitudes of their coefficients make theoretical sense.
We tend to think of data mining in the context of big data, with
its huge... Continue Reading
Today, September 16, is World Ozone Day. You don't hear much about the
ozone layer any more.
In fact, if you’re under 30, you might think this is just
another trivial, obscure observance, along the lines of International Dot Day (yesterday) or National Apple Dumpling Day (tomorrow).
But there’s a good reason that, almost 30 years ago, the United
Nations designated today as a day to raise... Continue Reading
You’ve performed multiple linear regression and have settled on a model
which contains several predictor variables that are statistically
significant. At this point, it’s common to ask, “Which variable is most important?”
This question is more complicated than it first appears. For one
thing, how you define “most important” often depends on your
subject area and goals. For another, how you collect... Continue Reading
There may be huge potential benefits waiting in the data in your
servers. These data may be used for many different purposes. Better
data allows better decisions, of course. Banks, insurance firms,
and telecom companies already own a large amount of data about
their customers. These resources are useful for building a more
personal relationship with each customer.
Some organizations already use... Continue Reading
The college football season is here, and this raises a very
Is Alabama going to be undefeated when they win the national
championship, or will they lose a regular-season game along the
Okay, so it's not a given that Alabama is going to win
the championship this year, but when you've won 4 of the last 7
you're definitely the odds-on favorite.
However, what if we wanted to take... Continue Reading
In 2011 we had solar panels fitted on our property. In the last
few months we have noticed a few problems with the inverter (the
equipment that converts the electricity generated by the panels
from DC to AC, and manages the transfer of unused electricity to the
power company). It was shutting down at various times throughout
the day, typically when it was very sunny, resulting in no
electricity being... Continue Reading
See if this sounds fair to you. I flip a coin.
Heads: You win $1.
Tails: You pay me $1.
You may not like games of chance, but you have to admit it seems
like a fair game. At least, assuming the coin is a normal, balanced
coin, and assuming I’m not a sleight-of-hand magician who can
control the coin.
How about this next
You pay me $2 to play.
I flip a coin over and over until it comes up heads.
Your... Continue Reading
I blogged a few months back about three different Minitab tools
you can use to examine your data over time. Did you know
that you can also use a simple run chart to display how your
process data changes over time? Of course those “changes” could be
evidence of special-cause variation, which a run chart can help you
What’s special-cause variation, and how’s it different from
common-cause... Continue Reading
Design of Experiments (DOE) is the perfect tool to efficiently
determine if key inputs are related to key outputs. Behind the
scenes, DOE is simply a regression analysis. What’s not simple,
however, is all of the choices you have to make when planning your
experiment. What X’s should you test? What ranges should you select
for your X’s? How many replicates should you use? Do you need
center... Continue Reading
In the great 1971 movie Willy Wonka and the Chocolate
Factory, the reclusive owner of the Wonka Chocolate Factory
decides to place golden tickets in five of his famous chocolate
bars, and allow the winners of each to visit his factory with a
guest. Since production restarted after three years of silence, no
one has come in or gone out of the factory. Needless to say, there
is enormous interest in... Continue Reading
In my last post, we took the red pill and dove
deep into the unarguably fascinating and uncompromisingly
compelling world of the matrix plot. I've stuffed this post with
information about a topic of marginal interest...the marginal
Margins are important. Back in my English composition days, I
recall that margins were particularly prized for the inverse linear
relationship they maintained with... Continue Reading
Time series data is proving to be very useful these days in a
number of different industries. However, fitting a specific model
is not always a straightforward process. It requires a good look at
the series in question, and possibly trying several different
models before identifying the best one. So how do we get there? In
this post, I'll take a look at how we can examine our data and get
a feel... Continue Reading
There may not be a situation more perilous than being a character on
Game of Thrones. Warden of the North, Hand of
the King, and apparent protagonist of the entire series? Off with
your head before the end of the first season! Last male heir of a
royal bloodline? Here, have a pot of molten gold poured on your
head! Invited to a wedding? Well, you probably know what happens at
weddings in the show. ... Continue Reading
In part 2 of this series, we used graphs and tables to see
how individual factors affected rates of patient participation
in a cardiac rehabilitation program. This initial look at the data
indicated that ease of access to the hospital was a very important
contributor to patient participation.
Given this revelation, a bus or shuttle service for people who do not
have cars might be a good way to... Continue Reading
Minitab is the leading provider of software and services for quality
improvement and statistics education. More than 90% of Fortune 100 companies
use Minitab Statistical Software, our flagship product, and more students
worldwide have used Minitab to learn statistics than any other package.
Minitab Inc. is a privately owned company headquartered in State College,
Pennsylvania, with subsidiaries in the United Kingdom, France, and
Australia. Our global network of representatives serves more than 40
countries around the world.