Blog posts and articles about regression analysis methods applied to Lean and Six Sigma projects.

"Data! Data! Data! I can't make bricks without clay."
— Sherlock Holmes, in Arthur Conan Doyle's The Adventure
of the Copper Beeches
Whether you're the world's greatest detective trying to crack a
case or a person trying to solve a problem at work, you're going to
need information. Facts. Data, as Sherlock Holmes
says.
But not all data is created equal, especially if you plan to
analyze it as part of...

Stepwise regression and best subsets regression are both
automatic tools that help you identify useful predictors during the
exploratory stages of model building for linear regression. These
two procedures use different methods and present you with different
output.
An obvious question arises. Does one procedure pick the true
model more often than the other? I’ll tackle that question in this
post.
Fi...
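To make the best subsets idea concrete, here is a minimal Python sketch. It is not Minitab's implementation. It fits every combination of candidate predictors by ordinary least squares and keeps the highest-R² subset for each model size. The data, the seed, and the helper names `r_squared` and `best_subsets` are hypothetical. Ranking by R² alone is a simplification. Because R² never decreases as predictors are added, best subsets output usually also reports criteria such as adjusted R² and Mallows' Cp.

```python
import itertools

import numpy as np


def r_squared(X, y):
    """R² of an ordinary least squares fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot


def best_subsets(X, y):
    """For each model size k, return (R², column tuple) for the best k-predictor subset."""
    n_predictors = X.shape[1]
    best = {}
    for k in range(1, n_predictors + 1):
        # Exhaustive search: fit every combination of k predictors.
        scored = [
            (r_squared(X[:, list(cols)], y), cols)
            for cols in itertools.combinations(range(n_predictors), k)
        ]
        best[k] = max(scored)
    return best


# Hypothetical data: only predictors 0 and 2 actually drive the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=1.0, size=200)

for k, (r2, cols) in best_subsets(X, y).items():
    print(f"{k} predictor(s): {cols}  R² = {r2:.3f}")
```

With p candidate predictors, the search fits 2^p − 1 models, so it slows down quickly as p grows. Stepwise regression avoids the full search by adding or removing one predictor at a time.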

Using a sample to estimate the properties of an entire population
is common practice in statistics. For example, the mean from a
random sample estimates the mean of the entire population. In linear
regression analysis, we’re used to the idea that the regression coefficients are estimates of the
true parameters. However, it’s easy to forget that R-squared
(R²) is also an estimate...
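A simulation is one way to see that R² is an estimate. This sketch assumes a hypothetical population in which five predictors are unrelated to the response, so the population R² is 0. Across repeated samples of 30 observations, the average sample R² lands near p/(n − 1), not 0. Adjusted R² averages close to 0. It is used here as one common correction, not necessarily what the full post recommends. The function name `r2_and_adjusted` and all settings are illustrative assumptions.

```python
import numpy as np


def r2_and_adjusted(X, y):
    """Return (R², adjusted R²) for an OLS fit of y on X plus an intercept."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj


# Hypothetical population: predictors and response are independent (true R² = 0).
rng = np.random.default_rng(0)
n, p, trials = 30, 5, 2000
results = np.array([
    r2_and_adjusted(rng.normal(size=(n, p)), rng.normal(size=n))
    for _ in range(trials)
])
mean_r2, mean_adj = results.mean(axis=0)
print(f"mean sample R²:   {mean_r2:.3f}  (p/(n-1) = {p / (n - 1):.3f})")
print(f"mean adjusted R²: {mean_adj:.3f}")
```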

You need to consider many factors when you’re buying a used car.
Once you narrow your choice down to a particular car model, you can
get a wealth of information about individual cars on the market
through the Internet. How do you navigate through it all to find
the best deal? By analyzing the data you have available.
Let's look at how this works using
the Assistant in Minitab 17. With the...

I’ve written about the importance of checking your residual plots when performing
linear regression analysis. If you don’t satisfy the assumptions
for an analysis, you might not be able to trust the results. One of
the assumptions for regression analysis is that the residuals are
normally distributed. Typically, you assess this assumption using
the normal probability plot of the residuals.
Are...

In my previous post, I described how I was asked to weigh in on the
ethics of researchers (DeStefano et al. 2004) who reportedly
discarded data and potentially set scientific knowledge back a
decade. I assessed the study in question and found that no data was
discarded and that the researchers used good statistical
practices.
In this post, I assess a study by Brian S. Hooker that was...

The other day I received a request from a friend to look into a
new study in a peer-reviewed journal that found a link between MMR
vaccinations and an increased risk of autism in African American
boys. To draw this conclusion, the new study reanalyzed data that
was discarded a decade ago by a previous study.
My friend wanted to know whether, from a statistical perspective, it
was unethical for the...

Previously, I showed why there is no R-squared for nonlinear regression. Anyone
who uses nonlinear regression will also notice that there are no P
values for the predictor variables. What’s going on?
Just like there are good reasons not to calculate R-squared for
nonlinear regression, there are also good reasons not to calculate
P values for the coefficients.
Why not—and what to use instead—are the...

I caught the end of Toy Story over the weekend, which is
definitely one of my all-time favorite children’s movies.
Now—unfortunately or fortunately—I can’t get Randy Newman's theme
song, “You’ve Got a Friend in Me,” out of my head!
It's also got me thinking about the nature of friendship, and
how "best friends forever" are supposed to always be there when you
need them. And, not to get too maudlin...

The current Ebola outbreak in Guinea, Liberia, and Sierra Leone is
making headlines around the world, and rightfully so: it's a
frightening disease, and last week the World Health Organization
reported that the disease's spread is outpacing the organization's response. Nearly 900
of the more than 1,600 people infected during this outbreak
have died, including some leading medical professionals trying to
stanch the...

There’s a lot going on in the world, so you might not have noticed that the
Organisation for Economic Co-operation and Development (OECD) released its new set
of health statistics for member nations. On the OECD website, you
can now download the free data series for 2014. (Be aware that “for
2014” means that the organization has a pretty good idea about what
happened in 2012.)
Of course, there’s nothing more fun...

We received the following question via social media
recently:
I am using Minitab 17 for ANOVA.
I calculated the mean and standard deviation for these 15 values,
but the standard deviation is very high. If I delete some values, I
can reduce the standard deviation. Is there an option in Minitab
that will automatically indicate values that are out of range and
delete them so that the standard...

Previously, I’ve written about when to choose nonlinear regression and
how to model curvature with both linear and
nonlinear regression. Since then, I’ve received several
comments expressing confusion about what differentiates nonlinear
equations from linear equations. This confusion is understandable
because both types can model curves.
So, if it’s not the ability to model a curve, what is the...
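The post's own answer is cut off here. A common way to state the distinction is that a regression equation is linear when it is linear in its parameters, even if it is curved in the predictors. Here is a hedged illustration with hypothetical, noise-free data. Ordinary least squares recovers the coefficients of a quadratic curve, because y is still a weighted sum of the columns 1, x, and x².

```python
import numpy as np

# Hypothetical, noise-free curved response.
x = np.linspace(0.0, 10.0, 21)
y = 2.0 + 0.5 * x - 0.3 * x**2

# Design matrix columns: 1, x, x². The curvature comes from the x² column,
# but y is still a linear combination of the parameters b0, b1, b2.
design = np.column_stack([np.ones_like(x), x, x**2])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print(np.round(coef, 6))

# By contrast, y = b0 * exp(b1 * x) is nonlinear in b1: without transforming y,
# no set of design-matrix columns turns it into a weighted sum of b0 and b1.
```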

The 2014 World Cup has gotten off to a high-scoring start. Through the
first week of the tournament, an average of 2.9 goals have been
scored per game, the highest since 1970. And if that average climbs
to over 3 goals per game, this’ll be the highest-scoring World Cup
since 1958!
So is this year’s World Cup actually bucking a trend of the low-scoring
tournaments that came before it, or can we...

There is more than just the P value in a probability plot—the
overall graphical pattern also provides a great deal of useful
information. Probability plots are a powerful tool to better
understand your data.
In this post, I intend to present the main principles of
probability plots and focus on their visual interpretation using
some real data.
In probability plots, the data density distribution...

In regression analysis, you'd like your regression model to have
significant variables and to produce a high R-squared value. This
low P value / high R² combination indicates that changes
in the predictors are related to changes in the response variable
and that your model explains a lot of the response variability.
This combination seems to go together naturally. But what if
your regression model...
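A simulation shows how a significant predictor can go with a low R². In this sketch, a hypothetical response depends on x through a real but small slope buried in a lot of noise. With 10,000 observations, the slope's P value is tiny, yet R² stays low; its population value is about 0.01. The two statistics answer different questions:

- The P value says the relationship is unlikely to be zero.
- R² says how much of the response variability the model explains.

The P value uses a normal approximation to the t distribution. That approximation is an assumption, reasonable here only because the sample is large.

```python
from statistics import NormalDist

import numpy as np

# Hypothetical data: a real but weak effect of x on y, buried in noise.
rng = np.random.default_rng(7)
n = 10_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=5.0, size=n)

# Ordinary least squares fit of y = b0 + b1 * x.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef
r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

# Slope standard error: s / sqrt(sum((x - x_bar)^2)), where s^2 = SSE / (n - 2).
s = np.sqrt(resid @ resid / (n - 2))
se_slope = s / np.sqrt(((x - x.mean()) ** 2).sum())
t_stat = coef[1] / se_slope

# Two-sided P value via a normal approximation to the t distribution
# (an assumption that is only reasonable because n is large).
p_value = 2 * NormalDist().cdf(-abs(t_stat))

print(f"slope = {coef[1]:.3f}, P = {p_value:.2g}, R² = {r2:.4f}")
```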

In Minitab, the Assistant menu is your interactive guide to choosing
the right tool, analyzing data correctly, and interpreting the
results. If you’re feeling a bit rusty with choosing and using a
particular analysis, the Assistant is your friend!
Previously, I’ve written about the new linear model features in Minitab 17. In
this post, I’ll work through a multiple regression analysis example
and...

Last time I posted, I showed you
how to divide a data set into training and validation samples in
Minitab with the promise that next time I would show you a way
to use the validation sample. Regression is a good analysis for
this, because a validation data set can help you to verify that
you’ve selected the best model. I’m going to use a hypothetical
example so that you can see how it works when...
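The training/validation idea can also be sketched outside Minitab. This hypothetical Python example with NumPy works in three steps:

1. Randomly assign 70% of the rows to a training sample.
2. Fit a straight line and a quadratic on the training rows only.
3. Compare each model's root mean squared error (RMSE) on the held-out validation rows.

The data, the 70/30 split, and the RMSE criterion are assumptions, not necessarily what the post uses.

```python
import numpy as np

# Hypothetical data with genuine curvature.
rng = np.random.default_rng(3)
n = 200
x = rng.uniform(-2.0, 2.0, size=n)
y = 1.0 + 2.0 * x + 3.0 * x**2 + rng.normal(scale=1.0, size=n)

# Randomly assign 70% of rows to training and the remaining 30% to validation.
idx = rng.permutation(n)
n_train = int(0.7 * n)
train, valid = idx[:n_train], idx[n_train:]


def validation_rmse(degree):
    """Fit a polynomial on the training rows, then score it on the validation rows."""
    coef = np.polyfit(x[train], y[train], degree)
    pred = np.polyval(coef, x[valid])
    return float(np.sqrt(np.mean((y[valid] - pred) ** 2)))


for degree in (1, 2):
    print(f"degree {degree}: validation RMSE = {validation_rmse(degree):.3f}")
```

The validation rows play no part in fitting. A model that misses real curvature therefore shows a clearly larger validation error.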

The P value is used all over statistics, from t-tests to regression analysis. Everyone knows that you
use P values to determine statistical significance in a hypothesis test. In fact, P values often
determine what studies get published and what projects get
funding.
Despite being so important, the P value is a slippery concept
that people often interpret incorrectly. How do you
interpret P values?
In...
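A P value is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. This sketch computes a two-sided P value for a hypothetical z statistic of 2.1 with `statistics.NormalDist`. It then checks the result by simulating the statistic under the null hypothesis. The z-test setting and the helper name `two_sided_p_value` are assumptions chosen for simplicity.

```python
import random
from statistics import NormalDist


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, i.e. assuming the null hypothesis holds."""
    return 2 * NormalDist().cdf(-abs(z))


z_observed = 2.1  # hypothetical test statistic
p = two_sided_p_value(z_observed)

# Simulation check: with the null hypothesis true, how often does chance alone
# produce a statistic at least as extreme as the observed one?
rng = random.Random(0)
draws = 200_000
extreme = sum(abs(rng.gauss(0.0, 1.0)) >= abs(z_observed) for _ in range(draws))
print(f"analytic P = {p:.4f}, simulated = {extreme / draws:.4f}")
```

The number does not tell you the probability that the null hypothesis is true.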