Blog posts and articles about regression analysis methods applied to Lean and Six Sigma projects.

In the world of linear models, a hierarchical model contains all
lower-order terms that comprise the higher-order terms that also
appear in the model. For example, a model that includes the
interaction term A*B*C is hierarchical if it includes these terms:
A, B, C, A*B, A*C, and B*C.
Fitting the correct regression model can be as
much of an art as it is a science. Consequently, there's not always
a... Continue Reading
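
The hierarchy rule described in this excerpt can be checked mechanically. The following is a minimal sketch, not something taken from the post itself: the function names `required_lower_order_terms` and `is_hierarchical` are hypothetical, and terms are assumed to be written as factor names joined by `*`.

```python
from itertools import combinations


def _normalize(term):
    """Write a term such as 'B*A' in a canonical order ('A*B')."""
    return "*".join(sorted(f.strip() for f in term.split("*")))


def required_lower_order_terms(term):
    """List every lower-order term that an interaction term is built from."""
    factors = _normalize(term).split("*")
    return [
        "*".join(combo)
        for order in range(1, len(factors))
        for combo in combinations(factors, order)
    ]


def is_hierarchical(model_terms):
    """Return True if every lower-order term of every model term is present."""
    present = {_normalize(t) for t in model_terms}
    return all(
        lower in present
        for term in present
        for lower in required_lower_order_terms(term)
    )


print(required_lower_order_terms("A*B*C"))
print(is_hierarchical(["A", "B", "C", "A*B", "A*C", "B*C", "A*B*C"]))
print(is_hierarchical(["A", "B", "A*B*C"]))
```

For `A*B*C`, the first call lists the same six terms named in the excerpt. The last call reports a non-hierarchical model because C and the two-factor interactions are missing.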

How deeply has statistical content from Minitab blog posts (or
other sources) seeped into your brain tissue? Rather than submit a
biopsy specimen from your temporal lobe for analysis, take this
short quiz to find out. Each question may have more than one
correct answer. Good luck!
Which
of the following are famous figure skating pairs, and which are
methods for testing whether your data follow a... Continue Reading

If you perform linear regression analysis, you might need to
compare different regression lines to see if their constants and
slope coefficients are different. Imagine there is an established
relationship between X and Y. Now, suppose you want to determine
whether that relationship has changed. Perhaps there is a new
context, process, or some other qualitative change, and you want to
determine... Continue Reading

When you work in data analysis, you quickly discover an
irrefutable fact: a lot of people just can't stand
statistics. Some people fear the math, some fear what the data
might reveal, some people find it deadly dull, and others think
it's bunk. Many don't even really know why they hate
statistics—they just do. Always have, probably always
will.
Problem is, that means we who analyze data need to
com... Continue Reading

The
College Football Playoff technically doesn't start until December
31st, but in reality it started Saturday night in Indianapolis. The
winner of the Big Ten Championship Game was in the playoff, while
the loser was out. The stakes couldn't have been higher. So the
competitors need to make sure they gain every advantage they can.
And that's where 4th down decisions come in. With a lot of... Continue Reading

This week is the annual Thanksgiving holiday in the United
States, a period when we are encouraged to eat turkey and
cranberries, then consider the blessings in our lives before
falling into a comfortable pre-football nap. That includes many of
us here at Minitab.
Consequently,
we won't have new posts for you over the next two days. But
one of the things I'm grateful for is having had the... Continue Reading

Did
you ever wonder why statistical analyses and concepts often have
such weird, cryptic names?
One conspiracy theory points to the workings of a secret
committee called the ICSSNN. The International Committee for
Sadistic Statistical Nomenclature and Numerophobia was formed
solely to befuddle and subjugate the masses. Its mission: To select
the most awkward, obscure, and confusing name possible... Continue Reading

By Matthew Barsalou, guest
blogger
A problem must be understood before it can be properly
addressed. A thorough understanding of the problem is critical when
performing a
root cause analysis (RCA), and an RCA is necessary if an
organization wants to implement corrective actions that truly
address the root cause of the problem. An RCA may also be necessary
for process improvement projects; it is... Continue Reading

In Part 5 of our series, we began the analysis of
the experiment data by reviewing analysis of covariance and
blocking variables, two key concepts in the design and
interpretation of your results.
The
250-yard marker at the Tussey Mountain Driving Range, one of the
locations where we conducted our golf experiment. Some of the
golfers drove their balls well beyond this 250-yard marker during a
few of... Continue Reading

In
Part 3 of our series, we decided to test our 4
experimental factors, Club Face Tilt, Ball Characteristics, Club
Shaft Flexibility, and Tee Height in a full factorial design
because of the many advantages of that data collection plan.
In Part 4 we concluded that each golfer
should replicate their half fraction of the full factorial 5 times
in order to have a high enough power to detect... Continue Reading

With
Speaker John Boehner resigning, Kevin McCarthy quitting before the
vote for him to be Speaker, and a possible government shutdown in
the works, the Freedom Caucus has certainly been in the news
frequently! Depending on your political bent, the Freedom Caucus
has caused quite a disruption, for better or for worse.
Who are these politicians? The Freedom Caucus is a group of
approximately 40... Continue Reading

Step
3 in our DOE problem solving methodology is to determine how many
times to replicate the base experiment plan. The discussion in Part 3
ended with the conclusion that our
4 factors could best be studied using all 16 combinations of the
high and low settings for each factor, a full factorial. Each
golfer will perform half of the sixteen possible combinations and
each golfer’s data could stand as... Continue Reading
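
To make the run counts in this excerpt concrete, here is a minimal sketch that enumerates the 16 combinations of high and low settings for 4 factors and splits them into two halves of 8. The excerpt does not say how each golfer's half was chosen. Splitting on the sign of the four-factor product (the defining relation I = ±ABCD) is an assumption made here for illustration, as are the function names.

```python
from itertools import product

# Factor names as listed in the Part 3 summary on this page.
FACTORS = [
    "Club Face Tilt",
    "Ball Characteristics",
    "Club Shaft Flexibility",
    "Tee Height",
]


def full_factorial(n_factors):
    """All combinations of low (-1) and high (+1) settings."""
    return list(product((-1, 1), repeat=n_factors))


def split_into_half_fractions(runs):
    """Split runs by the sign of the product of all factor levels.

    Assumption: this is one standard way to form half fractions; the
    original experiment may have assigned runs differently.
    """
    plus, minus = [], []
    for run in runs:
        sign = 1
        for level in run:
            sign *= level
        (plus if sign == 1 else minus).append(run)
    return plus, minus


runs = full_factorial(len(FACTORS))
half_a, half_b = split_into_half_fractions(runs)
print(len(runs), len(half_a), len(half_b))  # 16 8 8
```

With this split, each factor appears at its high and low setting equally often within each half.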

Step
2 in our DOE problem-solving methodology is to design the data
collection plan you will use to study the factors in your
experiment. Of course, you will have to incorporate blocking and
covariates in your experiment design, as well as calculate the
number of replications of run conditions needed in order to be
confident in your results.
We will address these topics in future posts, but for... Continue Reading

I recently guest lectured for an
applied regression analysis course at Penn State. Now, before you
begin making certain assumptions—because as any statistician will
tell you, assumptions are important in regression—you should know
that I have no teaching experience whatsoever, and I’m not much
older than the students I addressed.
I’m just 5 years removed from my undergraduate days at Virginia
Tech,... Continue Reading

If you use ordinary linear regression with a response of count data,
it may work out fine (Part 1), or you may run into some problems
(Part 2).
Given that a count response could be problematic, why not use a
regression procedure developed to handle a response of counts?
A Poisson regression analysis is designed to analyze a
regression model with a count response.
First, let's try using Poisson... Continue Reading
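
For readers who want to see what a Poisson regression fit involves, below is a minimal sketch with one predictor and a log link, fitted by Newton-Raphson. It is not the procedure or the data from the post: the function name `fit_poisson_regression` is hypothetical and the counts are made up purely for illustration. In practice, a statistics package would do this and also report standard errors and goodness-of-fit measures.

```python
import math


def fit_poisson_regression(x, y, iterations=25):
    """Fit log(mean count) = b0 + b1 * x by Newton-Raphson.

    Each step solves H * delta = g, where g is the gradient of the
    Poisson log-likelihood and H is the information matrix.
    """
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2x2 information matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: a predictor and a count response (illustration only).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 1, 1, 3, 2, 5, 7, 9]

b0, b1 = fit_poisson_regression(x, y)
fitted = [math.exp(b0 + b1 * xi) for xi in x]
print(round(b0, 3), round(b1, 3))
```

Because the model is fitted on the log scale, the fitted means are always positive, which is one reason this approach suits count responses better than a straight line does.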

My previous post showed an example of using
ordinary linear regression to model a count response. For that particular count data, shown by the blue
circles on the dot plot below, the model assumptions for linear
regression were adequately satisfied.
But frequently, count data may contain many values equal to or
close to 0. Also, the distribution of the counts may be
right-skewed. In the quality field,... Continue Reading

Ever use dental floss to cut soft cheese? Or Alka Seltzer to
clean your toilet bowl? You can find a host of nonconventional uses for ordinary objects
online. Some are more peculiar than others.
Ever use ordinary linear regression to evaluate a response
(outcome) variable of counts?
Technically, ordinary linear regression was designed to evaluate
a continuous response variable. A continuous... Continue Reading

In 2007, the Crayola crayon company encountered a problem.
Labels were coming off of their crayons. Up to that point, Crayola
had done little to implement data-driven methodology into the
process of manufacturing their crayons. But that was about to
change. An elementary data analysis showed that the adhesive didn’t
consistently set properly when the labels were dry. Misting crayons
as they went... Continue Reading

In regression
analysis, overfitting a model is a real problem. An overfit model
can cause the regression coefficients, p-values, and R-squared to be misleading. In this post,
I explain what an overfit model is and how to detect and avoid this
problem.
An overfit model is one that is too complicated for your data
set. When this happens, the regression model becomes tailored to
fit the quirks and... Continue Reading

Imagine a multi-million dollar company that released a product without
knowing the probability that it would fail after a certain amount of
time. “We offer a 2 year warranty, but we have no idea what
percentage of our products fail before 2 years.” Crazy, right?
Anybody who wanted to ensure the quality of their product would
perform a statistical analysis to look at the
reliability and survival of... Continue Reading