Regression Analysis

Blog posts and articles about regression analysis techniques applied to Lean and Six Sigma quality improvement projects.

In regression, "sums of squares" are used to represent variation. In this post, we’ll use some sample data to walk through these calculations. The sample data used in this post is available within Minitab by choosing Help > Sample Data, or File > Open Worksheet > Look in Minitab Sample Data folder (depending on your version of Minitab). The dataset is called ResearcherSalary.MTW, and contains data... Continue Reading
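As a rough illustration of how sums of squares partition variation, here is a minimal Python sketch for a simple one-predictor least-squares fit. The function name `sums_of_squares` and the five data points are made up for this example; they are not taken from ResearcherSalary.MTW or from Minitab's output.

```python
def sums_of_squares(x, y):
    """Return (SS total, SS regression, SS error) for a simple linear fit of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares slope and intercept for one predictor.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]

    # Total variation, variation explained by the fit, and leftover variation.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ss_total, ss_regression, ss_error


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ss_total, ss_regression, ss_error = sums_of_squares(x, y)
print(round(ss_total, 4), round(ss_regression, 4), round(ss_error, 4))
```

For this made-up data the regression and error sums of squares come to about 3.6 and 2.4, which add up to the total of 6. That additive split is what an ANOVA table for a regression reports.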
You need to consider many factors when you’re buying a used car. Once you narrow your choice down to a particular car model, you can get a wealth of information about individual cars on the market through the Internet. How do you navigate through it all to find the best deal? By analyzing the data you have available. Let's look at how this works using the Assistant in Minitab 17. With the... Continue Reading

In my last post, we took the red pill and dove deep into the unarguably fascinating and uncompromisingly compelling world of the matrix plot. I've stuffed this post with information about a topic of marginal interest...the marginal plot. Margins are important. Back in my English composition days, I recall that margins were particularly prized for the inverse linear relationship they maintained with... Continue Reading
Technology is very much part of our lives nowadays. We use our smartphones to have video calls with our friends and family, and watch our favourite TV shows on tablets. Technology has also transformed the fitness industry with the increasing popularity of fitness trackers. Recently, I got myself a fitness watch and it's becoming my favourite gadget. It can track how many steps I’ve taken, my... Continue Reading
Suppose you’ve collected data on cycle time, revenue, the dimension of a manufactured part, or some other metric that’s important to you, and you want to see what other variables may be related to it. Now what? When I graduated from college with my first statistics degree, my diploma was bona fide proof that I'd endured hours and hours of classroom lectures on various statistical topics, including l... Continue Reading
For one reason or another, the response variable in a regression analysis might not satisfy one or more of the assumptions of ordinary least squares regression. The residuals might follow a skewed distribution or the residuals might curve as the predictions increase. A common solution when problems arise with the assumptions of ordinary least squares regression is to transform the response... Continue Reading
I’ve written about R-squared before and I’ve concluded that it’s not as intuitive as it seems at first glance. It can be a misleading statistic because a high R-squared is not always good and a low R-squared is not always bad. I’ve even said that R-squared is overrated and that the standard error of the estimate (S) can be more useful. Even though I haven’t always been enthusiastic about... Continue Reading
When running a binary logistic regression and many other analyses in Minitab, we estimate parameters for a specified model based on the sample data that has been collected. Most of the time, we use what is called Maximum Likelihood Estimation. However, based on specifics within your data, sometimes these estimation methods fail. What happens then? Specifically, during binary logistic regression, an... Continue Reading
What is an interaction? It’s when the effect of one factor depends on the level of another factor. Interactions are important when you’re performing ANOVA, DOE, or a regression analysis. Without them, your model may be missing an important term that helps explain variability in the response! For example, let’s consider 3-point shooting in the NBA. We previously saw that the number of 3-point... Continue Reading
In statistics, there are things you need to do so you can trust your results. For example, you should check the sample size, the assumptions of the analysis, and so on. In regression analysis, I always urge people to check their residual plots. In this blog post, I present one more thing you should do so you can trust your regression results in certain circumstances—standardize the continuous... Continue Reading
In the world of linear models, a hierarchical model contains all lower-order terms that comprise the higher-order terms that also appear in the model. For example, a model that includes the interaction term A*B*C is hierarchical if it includes these terms: A, B, C, A*B, A*C, and B*C. Fitting the correct regression model can be as much of an art as it is a science. Consequently, there's not always a... Continue Reading
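The definition of a hierarchical model above can be checked mechanically. The sketch below is only an illustration: the helper names `lower_order_terms` and `is_hierarchical` are hypothetical, and the code assumes terms are written as factor names joined by `*`, as in the example. It is not how Minitab itself validates a model.

```python
from itertools import combinations


def lower_order_terms(term):
    """List every lower-order term implied by a term such as 'A*B*C'."""
    factors = term.split("*")
    return [
        "*".join(combo)
        for size in range(1, len(factors))
        for combo in combinations(factors, size)
    ]


def is_hierarchical(model_terms):
    """Return True if every term's lower-order terms also appear in the model."""
    # Compare terms as sets of factors so 'A*B' and 'B*A' count as the same term.
    present = {frozenset(term.split("*")) for term in model_terms}
    for term in model_terms:
        for lower in lower_order_terms(term):
            if frozenset(lower.split("*")) not in present:
                return False
    return True


print(lower_order_terms("A*B*C"))
print(is_hierarchical(["A", "B", "C", "A*B", "A*C", "B*C", "A*B*C"]))
print(is_hierarchical(["A", "B", "A*B*C"]))
```

The first call lists the six terms named in the example (A, B, C, A*B, A*C, and B*C). The second model is hierarchical, while the third is not because it omits C and all of the two-way interactions.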
If you perform linear regression analysis, you might need to compare different regression lines to see if their constants and slope coefficients are different. Imagine there is an established relationship between X and Y. Now, suppose you want to determine whether that relationship has changed. Perhaps there is a new context, process, or some other qualitative change, and you want to determine... Continue Reading
Did you ever wonder why statistical analyses and concepts often have such weird, cryptic names? One conspiracy theory points to the workings of a secret committee called the ICSSNN. The International Committee for Sadistic Statistical Nomenclature and Numerophobia was formed solely to befuddle and subjugate the masses. Its mission: To select the most awkward, obscure, and confusing name possible... Continue Reading
By Matthew Barsalou, guest blogger. A problem must be understood before it can be properly addressed. A thorough understanding of the problem is critical when performing a root cause analysis (RCA), and an RCA is necessary if an organization wants to implement corrective actions that truly address the root cause of the problem. An RCA may also be necessary for process improvement projects; it is... Continue Reading
As Halloween approaches, you are probably taking the necessary steps to protect yourself from the various ghosts, goblins, and witches that are prowling around. Monsters of all sorts are out to get you, unless they’re sufficiently bribed with candy offerings! I’m here to warn you about a ghoul that all statisticians and data scientists need to be aware of: phantom degrees of freedom. These phantoms... Continue Reading
With Speaker John Boehner resigning, Kevin McCarthy quitting before the vote for him to be Speaker, and a possible government shutdown in the works, the Freedom Caucus has certainly been in the news frequently! Depending on your political bent, the Freedom Caucus has caused quite a disruption for either good or bad. Who are these politicians? The Freedom Caucus is a group of approximately 40... Continue Reading
I was recently asked a couple of questions about stability studies in Minitab. Question 1: If I enter a lower and upper spec in the Stability Study dialog window, why do I see only one confidence bound per fitted line on the resulting graph? Shouldn’t there be two? You use a stability study to analyze the stability of a product over time and to determine the product's shelf life. In order to... Continue Reading
I recently guest lectured for an applied regression analysis course at Penn State. Now, before you begin making certain assumptions—because as any statistician will tell you, assumptions are important in regression—you should know that I have no teaching experience whatsoever, and I’m not much older than the students I addressed. I’m just 5 years removed from my undergraduate days at Virginia Tech,... Continue Reading
My previous post showed an example of using ordinary linear regression to model a count response. For that particular count data, shown by the blue circles on the dot plot below, the model assumptions for linear regression were adequately satisfied. But frequently, count data may contain many values equal or close to 0. Also, the distribution of the counts may be right-skewed. In the quality field,... Continue Reading
Ever use dental floss to cut soft cheese? Or Alka Seltzer to clean your toilet bowl? You can find a host of nonconventional uses for ordinary objects online. Some are more peculiar than others. Ever use ordinary linear regression to evaluate a response (outcome) variable of counts? Technically, ordinary linear regression was designed to evaluate a continuous response variable. A continuous... Continue Reading