Blog posts and articles about regression analysis techniques applied to Lean and Six Sigma quality improvement projects.

Technology is very much part of
our lives nowadays. We use our smartphones to have video calls with
our friends and family, and watch our favourite TV shows on
tablets. Technology has also transformed the fitness industry with
the increasing popularity of fitness trackers.
Recently, I got myself a fitness watch and it's becoming my
favourite gadget. It can track how many steps I’ve taken, my... Continue Reading

Suppose you’ve collected data on cycle time, revenue, the
dimension of a manufactured part, or some other metric that’s
important to you, and you want to see what other variables may be
related to it. Now what?
When I graduated from college with my first statistics degree,
my diploma was bona fide proof that I'd endured hours and hours of
classroom lectures on various statistical topics, including
l... Continue Reading

For one reason or another, the response variable in a regression
analysis might not satisfy one or more of
the assumptions of ordinary least squares regression. The
residuals might follow a skewed distribution or the
residuals might curve as the predictions increase. A common
solution when problems arise with the assumptions of ordinary least
squares regression is to transform the response... Continue Reading

I’ve written about R-squared before and I’ve concluded that it’s
not as intuitive as it seems at first glance. It can be a
misleading statistic because a high R-squared is not always good and a low
R-squared is not always bad. I’ve even said that R-squared is overrated and that the standard error of the estimate (S) can be
more useful.
Even though I haven’t always been enthusiastic about... Continue Reading
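To make the comparison between R-squared and S concrete, here is a minimal sketch for a simple one-predictor regression. It is an illustration in plain Python, not Minitab's implementation; the function names `fit_line` and `r_squared_and_s` are my own, and the sketch assumes at least three data points and a predictor that is not constant.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared_and_s(x, y):
    """Return (R-squared, S) for a simple linear regression of y on x.

    R-squared is the share of variation in y explained by the line.
    S is the standard error of the estimate, in the units of y.
    """
    intercept, slope = fit_line(x, y)
    mean_y = sum(y) / len(y)
    # Sum of squared residuals around the fitted line
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares around the mean of y
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - sse / sst
    # Two parameters (intercept and slope) are estimated, hence n - 2
    s = math.sqrt(sse / (len(x) - 2))
    return r2, s
```

Because S is expressed in the units of the response, it tells you roughly how far the data points fall from the fitted line, which R-squared alone does not.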

When running a binary logistic regression and many other
analyses in Minitab, we estimate parameters for a specified model
based on the sample data that has been collected. Most of the time,
we use what is called Maximum Likelihood Estimation. However, depending
on the specifics of your data, this estimation method sometimes
fails. What happens then?
Specifically, during binary logistic regression, an... Continue Reading

What is an interaction? It’s when the effect of one factor
depends on the level of another factor. Interactions are important
when you’re performing ANOVA, DOE, or a regression analysis.
Without them, your model may be missing an important term that
helps explain variability in the response!
For example, let’s consider 3-point shooting in the NBA. We
previously saw that the number of 3-point... Continue Reading
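The definition of an interaction above can be sketched for the simplest case: two factors, each at two levels. The function name `interaction_contrast` and the "low"/"high" level labels are assumptions made for this illustration, and any cell means you feed it here would be hypothetical, not data from the post.

```python
def interaction_contrast(cell_means):
    """Difference between the effect of factor A at the two levels of factor B.

    cell_means maps (a_level, b_level) -> mean response, where each
    level is "low" or "high". A result of 0 means the effect of A is
    the same at both levels of B, i.e. no interaction.
    """
    effect_a_at_low_b = cell_means[("high", "low")] - cell_means[("low", "low")]
    effect_a_at_high_b = cell_means[("high", "high")] - cell_means[("low", "high")]
    return effect_a_at_high_b - effect_a_at_low_b

# Hypothetical cell means: A adds 5 at either level of B, so no interaction
additive = {("low", "low"): 10, ("high", "low"): 15,
            ("low", "high"): 20, ("high", "high"): 25}

# Hypothetical cell means: A adds 5 when B is low but 15 when B is high
interacting = {("low", "low"): 10, ("high", "low"): 15,
               ("low", "high"): 20, ("high", "high"): 35}

print(interaction_contrast(additive))
print(interaction_contrast(interacting))
```

A nonzero contrast is exactly the situation where a model without the A*B term would miss part of the story.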

In statistics, there are things you need to do so you can trust
your results. For example, you should check the sample size, the
assumptions of the analysis, and so on. In regression analysis, I
always urge people to check their residual plots.
In this blog post, I present one more thing you should do so you
can trust your regression results in certain
circumstances—standardize the continuous... Continue Reading

In the world of linear models, a hierarchical model contains every
lower-order term that makes up the higher-order terms
in the model. For example, a model that includes the
interaction term A*B*C is hierarchical if it includes these terms:
A, B, C, A*B, A*C, and B*C.
Fitting the correct regression model can be as
much of an art as it is a science. Consequently, there's not always
a... Continue Reading
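The hierarchy rule above can be checked mechanically. Here is a minimal sketch, assuming terms are written as factor names joined by `*` and that factors always appear in the same order (so `A*B`, never `B*A`). The function names are hypothetical, and the sketch covers interaction terms only, not every kind of higher-order term a statistical package might support.

```python
from itertools import combinations

def required_lower_order_terms(term):
    """List every lower-order term implied by an interaction term."""
    factors = term.split("*")
    required = []
    # Take every subset of the factors that is smaller than the full term
    for order in range(1, len(factors)):
        for combo in combinations(factors, order):
            required.append("*".join(combo))
    return required

def is_hierarchical(model_terms):
    """True if each term's lower-order terms are also in the model."""
    present = set(model_terms)
    return all(
        lower in present
        for term in model_terms
        for lower in required_lower_order_terms(term)
    )
```

For the term `A*B*C`, `required_lower_order_terms` returns A, B, C, A*B, A*C, and B*C, matching the list in the example above.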

If you perform linear regression analysis, you might need to
compare different regression lines to see if their constants and
slope coefficients are different. Imagine there is an established
relationship between X and Y. Now, suppose you want to determine
whether that relationship has changed. Perhaps there is a new
context, process, or some other qualitative change, and you want to
determine... Continue Reading

Did you ever wonder why statistical analyses and concepts often have
such weird, cryptic names?
One conspiracy theory points to the workings of a secret
committee called the ICSSNN. The International Committee for
Sadistic Statistical Nomenclature and Numerophobia was formed
solely to befuddle and subjugate the masses. Its mission: To select
the most awkward, obscure, and confusing name possible... Continue Reading

By Matthew Barsalou, guest blogger
A problem must be understood before it can be properly
addressed. A thorough understanding of the problem is critical when
performing a
root cause analysis (RCA) and an RCA is necessary if an
organization wants to implement corrective actions that truly
address the root cause of the problem. An RCA may also be necessary
for process improvement projects; it is... Continue Reading

As Halloween approaches, you are probably taking the necessary steps to protect
yourself from the various ghosts, goblins, and witches that are prowling
around. Monsters of all sorts are out to get you, unless they’re
sufficiently bribed with candy offerings!
I’m here to warn you about a ghoul that all statisticians and
data scientists need to be aware of: phantom degrees of freedom.
These phantoms... Continue Reading

With Speaker John Boehner resigning, Kevin McCarthy quitting before the
vote for him to be Speaker, and a possible government shutdown in
the works, the Freedom Caucus has certainly been in the news
frequently! Depending on your political bent, the Freedom Caucus
has caused quite a disruption for either good or bad.
Who are these politicians? The Freedom Caucus is a group of
approximately 40... Continue Reading

I was recently asked a couple of
questions about stability studies in Minitab.
Question 1: If I enter a lower and an upper spec in
the Stability Study dialog window, why do I see only one confidence
bound per fitted line on the resulting graph? Shouldn’t there be
two?
You use a stability study to
analyze the stability of a product over time and to determine the
product's shelf life. In order to... Continue Reading

I recently guest lectured for an
applied regression analysis course at Penn State. Now, before you
begin making certain assumptions—because as any statistician will
tell you, assumptions are important in regression—you should know
that I have no teaching experience whatsoever, and I’m not much
older than the students I addressed.
I’m just 5 years removed from my undergraduate days at Virginia
Tech,... Continue Reading

My previous post showed an example of using
ordinary linear regression to model a count response. For that particular count data, shown by the blue
circles on the dot plot below, the model assumptions for linear
regression were adequately satisfied.
But count data frequently contain many values equal to or
close to 0. Also, the distribution of the counts may be
right-skewed. In the quality field,... Continue Reading

Ever use dental floss to cut soft cheese? Or Alka Seltzer to
clean your toilet bowl? You can find a host of nonconventional uses for ordinary objects
online. Some are more peculiar than others.
Ever use ordinary linear regression to evaluate a response
(outcome) variable of counts?
Technically, ordinary linear regression was designed to evaluate
a continuous response variable. A continuous... Continue Reading

In regression analysis, overfitting a model is a real problem. An overfit model
can cause the regression coefficients, p-values, and R-squared to be misleading. In this post,
I explain what an overfit model is and how to detect and avoid this
problem.
An overfit model is one that is too complicated for your data
set. When this happens, the regression model becomes tailored to
fit the quirks and... Continue Reading

Imagine a multi-million dollar company that released a product without
knowing the probability that it will fail after a certain amount of
time. “We offer a 2 year warranty, but we have no idea what
percentage of our products fail before 2 years.” Crazy, right?
Anybody who wanted to ensure the quality of their product would
perform a statistical analysis to look at the
reliability and survival of... Continue Reading

If you want to use data to predict the impact of different variables,
whether it's for business or some personal interest, you need to
create a model based on the best information you have at your
disposal. In this post and subsequent posts throughout the football
season, I'm going to share how I've been developing and applying a
model for predicting the outcomes of 4th down decisions in Big... Continue Reading