
Linear Regression

Blog posts and articles about the statistical method called Linear Regression and its use in real-world quality projects.

For one reason or another, the response variable in a regression analysis might not satisfy one or more of the assumptions of ordinary least squares regression. The residuals might follow a skewed distribution or the residuals might curve as the predictions increase. A common solution when problems arise with the assumptions of ordinary least squares regression is to transform the response... Continue Reading
In my last post, I looked at viewership data for the five seasons of HBO’s hit series Game of Thrones. I created a time series plot in Minitab that showed how viewership rose season by season, and how it varied episode by episode within each season.   My next step is to fit a statistical model to the data, which I hope will allow me to predict the viewing numbers for future episodes.    I am going to... Continue Reading

In this post, I’ll address some common questions we’ve received in technical support about the difference between fitted and data means, where to find each option within Minitab, and how Minitab calculates each. First, let’s look at some definitions. It’s useful to have an example, so I’ll be using the Light Output data set from Minitab’s Data Set Library, which includes a description of the sample... Continue Reading
In the world of linear models, a hierarchical model contains all lower-order terms that comprise the higher-order terms that also appear in the model. For example, a model that includes the interaction term A*B*C is hierarchical if it includes these terms: A, B, C, A*B, A*C, and B*C. Fitting the correct regression model can be as much of an art as it is a science. Consequently, there's not always a... Continue Reading
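The definition of a hierarchical model above can be checked mechanically. The following is a minimal sketch, not Minitab's implementation: the helper names `required_terms` and `is_hierarchical` are hypothetical, and terms are assumed to be written as factor names joined by `*`.

```python
from itertools import combinations


def required_terms(term):
    """List every lower-order term implied by a higher-order term such as 'A*B*C'."""
    factors = sorted(term.split("*"))
    lower = []
    # Orders 1 .. k-1: main effects first, then two-way interactions, and so on.
    for order in range(1, len(factors)):
        for combo in combinations(factors, order):
            lower.append("*".join(combo))
    return lower


def is_hierarchical(model_terms):
    """Return True if every term's lower-order components also appear in the model."""
    # Normalize factor order so that 'B*A' and 'A*B' count as the same term.
    normalized = {"*".join(sorted(t.split("*"))) for t in model_terms}
    return all(req in normalized for t in normalized for req in required_terms(t))


print(required_terms("A*B*C"))
print(is_hierarchical(["A", "B", "C", "A*B", "A*C", "B*C", "A*B*C"]))
print(is_hierarchical(["A", "B", "A*B*C"]))
```

For `A*B*C`, the helper lists exactly the six terms named in the excerpt: A, B, C, A*B, A*C, and B*C.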
How deeply has statistical content from Minitab blog posts (or other sources) seeped into your brain tissue? Rather than submit a biopsy specimen from your temporal lobe for analysis, take this short quiz to find out. Each question may have more than one correct answer. Good luck! Which of the following are famous figure skating pairs, and which are methods for testing whether your data follow a... Continue Reading
If you perform linear regression analysis, you might need to compare different regression lines to see if their constants and slope coefficients are different. Imagine there is an established relationship between X and Y. Now, suppose you want to determine whether that relationship has changed. Perhaps there is a new context, process, or some other qualitative change, and you want to determine... Continue Reading
With Speaker John Boehner resigning, Kevin McCarthy quitting before the vote for him to be Speaker, and a possible government shutdown in the works, the Freedom Caucus has certainly been in the news frequently! Depending on your political bent, the Freedom Caucus has caused quite a disruption for either good or bad.  Who are these politicians? The Freedom Caucus is a group of approximately 40... Continue Reading
Step 3 in our DOE problem solving methodology is to determine how many times to replicate the base experiment plan. The discussion in Part 3 ended with the conclusion that our 4 factors could best be studied using all 16 combinations of the high and low settings for each factor, a full factorial. Each golfer will perform half of the sixteen possible combinations and each golfer’s data could stand as... Continue Reading
Step 2 in our DOE problem-solving methodology is to design the data collection plan you will use to study the factors in your experiment. Of course, you will have to incorporate blocking and covariates in your experiment design, as well as calculate the number of replications of run conditions needed in order to be confident in your results. We will address these topics in future posts, but for... Continue Reading
If you use ordinary linear regression with a response of count data, it may work out fine (Part 1), or you may run into some problems (Part 2). Given that a count response could be problematic, why not use a regression procedure developed to handle a response of counts? A Poisson regression analysis is designed to analyze a regression model with a count response. First, let's try using Poisson... Continue Reading
My previous post showed an example of using ordinary linear regression to model a count response. For that particular count data, shown by the blue circles on the dot plot below, the model assumptions for linear regression were adequately satisfied. But frequently, count data may contain many values equal or close to 0. Also, the distribution of the counts may be right-skewed. In the quality field,... Continue Reading
Ever use dental floss to cut soft cheese? Or Alka Seltzer to clean your toilet bowl? You can find a host of nonconventional uses for ordinary objects online. Some are more peculiar than others. Ever use ordinary linear regression to evaluate a response (outcome) variable of counts? Technically, ordinary linear regression was designed to evaluate a continuous response variable. A continuous... Continue Reading
In regression analysis, overfitting a model is a real problem. An overfit model can cause the regression coefficients, p-values, and R-squared to be misleading. In this post, I explain what an overfit model is and how to detect and avoid this problem. An overfit model is one that is too complicated for your data set. When this happens, the regression model becomes tailored to fit the quirks and... Continue Reading
Previously, I’ve written about how to interpret regression coefficients and their individual P values. I’ve also written about how to interpret R-squared to assess the strength of the relationship between your model and the response variable. Recently I've been asked, how does the F-test of the overall significance and its P value fit in with these other statistics? That’s the topic of this post! In... Continue Reading
In Minitab Statistical Software, putting a regression line on a scatterplot is as easy as choosing a picture with a regression line on a scatterplot: A neat trick is that you can also add calculated lines onto a scatterplot for comparison or other communication purposes. Here’s a demonstration. United States Sentencing Guidelines The United States Sentencing Guidelines say how people who... Continue Reading
In my previous post, I showed you that the coefficients are different when choosing (-1,0,1) vs (1,0) coding schemes for General Linear Model (or Regression).  We used the two different equations to calculate the same fitted values. Here I will focus on showing what the different coefficients represent.  Let's use the data and models from the last blog post: We can display the means for each level... Continue Reading
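The point that two coding schemes give different coefficients but the same fitted values can be illustrated with a small sketch. It assumes the simplest case, a model with a single categorical factor, where the least-squares fitted value for each level is that level's mean. The data, level names, and function names below are hypothetical and are not taken from the post or from Minitab output.

```python
from statistics import mean

# Hypothetical responses for three levels of one categorical factor.
data = {"low": [10.0, 12.0], "mid": [15.0, 17.0], "high": [20.0, 24.0]}
level_means = {level: mean(values) for level, values in data.items()}

# (-1, 0, 1) coding: the constant is the mean of the level means, and each
# coefficient is that level's deviation from the constant. The last level's
# coefficient is implied: it equals minus the sum of the others.
effect_constant = mean(level_means.values())
effect_coefs = {lvl: m - effect_constant for lvl, m in level_means.items()}

# (1, 0) coding: the constant is the reference level's mean, and each
# coefficient is the difference from that reference level.
reference = "low"  # assumed choice of reference level
ref_constant = level_means[reference]
ref_coefs = {lvl: m - ref_constant for lvl, m in level_means.items()}


def fitted_effect(level):
    """Fitted value under (-1, 0, 1) coding."""
    return effect_constant + effect_coefs[level]


def fitted_reference(level):
    """Fitted value under (1, 0) coding."""
    return ref_constant + ref_coefs[level]


for lvl in data:
    print(lvl, effect_coefs[lvl], ref_coefs[lvl], fitted_effect(lvl), fitted_reference(lvl))
```

In this sketch the coefficients differ between the two schemes, while both fitted values for a level match that level's mean up to floating-point rounding.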
Since Minitab 17 Statistical Software launched in February 2014, we've gotten great feedback from many people who have been using the General Linear Model and Regression tools. But in speaking with people as part of Minitab's Technical Support team, I've found many are noticing that there are two coding schemes available with each. We frequently get calls from people asking how the coding scheme you... Continue Reading
By Erwin Gijzen, Guest Blogger In my previous post, we assessed the out-of-spec level for a process with capability analysis and visualized process variability using a control chart. Our goal is to reduce variability, but when a process has a multitude of categorical and continuous variables, identifying root causes can be a huge challenge. Analyzing covariance—using the statistical technique... Continue Reading
By Erwin Gijzen, Guest Blogger People who work in quality improvement know that the root causes of quality issues are hard to find. A typical production process can contain hundreds of potential causes. Additionally, companies often produce products with multiple quality requirements, such as dimensions, surface appearance, and impact resistance. With so many variables, it’s no wonder many companies... Continue Reading
We’ve been pretty excited about March Madness here at Minitab. Kevin Rudy’s been busy creating his regression model and predicting the winners for the 2015 NCAA Men’s Basketball Tournament. But we’re not the only ones. Lots of folks are doing their best analysis to help you plan out your bracket now that the tip-offs for the round of 64 are just a day away. As you ponder your last-minute changes,... Continue Reading