Linear Regression

Blog posts and articles about the statistical method called Linear Regression and its use in real-world quality projects.

If you use ordinary linear regression with a response of count data, it may work out fine (Part 1), or you may run into some problems (Part 2). Given that a count response could be problematic, why not use a regression procedure developed to handle a response of counts? A Poisson regression analysis is designed to analyze a regression model with a count response. First, let's try using Poisson... Continue Reading
My previous post showed an example of using ordinary linear regression to model a count response. For that particular count data, shown by the blue circles on the dot plot below, the model assumptions for linear regression were adequately satisfied. But frequently, count data may contain many values equal or close to 0. Also, the distribution of the counts may be right-skewed. In the quality field,... Continue Reading
Ever use dental floss to cut soft cheese? Or Alka Seltzer to clean your toilet bowl? You can find a host of nonconventional uses for ordinary objects online. Some are more peculiar than others. Ever use ordinary linear regression to evaluate a response (outcome) variable of counts? Technically, ordinary linear regression was designed to evaluate a continuous response variable. A continuous... Continue Reading
In regression analysis, overfitting a model is a real problem. An overfit model can cause the regression coefficients, p-values, and R-squared to be misleading. In this post, I explain what an overfit model is and how to detect and avoid this problem. An overfit model is one that is too complicated for your data set. When this happens, the regression model becomes tailored to fit the quirks and... Continue Reading
Previously, I’ve written about how to interpret regression coefficients and their individual P values. I’ve also written about how to interpret R-squared to assess the strength of the relationship between your model and the response variable. Recently I've been asked, how does the F-test of the overall significance and its P value fit in with these other statistics? That’s the topic of this post! In... Continue Reading
In Minitab Statistical Software, putting a regression line on a scatterplot is as easy as choosing a picture with a regression line on a scatterplot: A neat trick is that you can also add calculated lines onto a scatterplot for comparison or other communication purposes. Here’s a demonstration. United States Sentencing Guidelines The United States Sentencing Guidelines say how people who... Continue Reading
In my previous post, I showed you that the coefficients are different when choosing (-1,0,1) vs (1,0) coding schemes for General Linear Model (or Regression). We used the two different equations to calculate the same fitted values. Here I will focus on showing what the different coefficients represent. Let's use the data and models from the last blog post: We can display the means for each level... Continue Reading
Since Minitab 17 Statistical Software launched in February 2014, we've gotten great feedback from many people who have been using the General Linear Model and Regression tools. But in speaking with people as part of Minitab's Technical Support team, I've found many are noticing that there are two coding schemes available with each. We frequently get calls from people asking how the coding scheme you... Continue Reading
By Erwin Gijzen, Guest Blogger In my previous post, we assessed the out-of-spec level for a process with capability analysis and visualized process variability using a control chart. Our goal is to reduce variability, but when a process has a multitude of categorical and continuous variables, identifying root causes can be a huge challenge. Analyzing covariance—using the statistical technique... Continue Reading
By Erwin Gijzen, Guest Blogger People who work in quality improvement know that the root causes of quality issues are hard to find. A typical production process can contain hundreds of potential causes. Additionally, companies often produce products with multiple quality requirements, such as dimensions, surface appearance, and impact resistance. With so many variables, it’s no wonder many companies... Continue Reading
We’ve been pretty excited about March Madness here at Minitab. Kevin Rudy’s been busy creating his regression model and predicting the winners for the 2015 NCAA Men’s Basketball Tournament. But we’re not the only ones. Lots of folks are doing their best analysis to help you plan out your bracket now that the tip-offs for the round of 64 are just a day away. As you ponder your last-minute changes,... Continue Reading
As someone who has collected and analyzed real data for a living, the idea of using simulated data for a Monte Carlo simulation sounds a bit odd. How can you improve a real product with simulated data? In this post, I’ll help you understand the methods behind Monte Carlo simulation and walk you through a simulation example using Devize. What is Devize, you ask? Devize is Minitab's exciting new,... Continue Reading
Choosing the correct linear regression model can be difficult. After all, the world and how it works is complex. Trying to model it with only a sample doesn’t make it any easier. In this post, I'll review some common statistical methods for selecting models, describe complications you may face, and provide some practical advice for choosing the best regression model. It starts when a researcher wants to... Continue Reading
Stepwise regression and best subsets regression are both automatic tools that help you identify useful predictors during the exploratory stages of model building for linear regression. These two procedures use different methods and present you with different output. An obvious question arises. Does one procedure pick the true model more often than the other? I’ll tackle that question in this post. Fi... Continue Reading
Using a sample to estimate the properties of an entire population is common practice in statistics. For example, the mean from a random sample estimates that parameter for an entire population. In linear regression analysis, we’re used to the idea that the regression coefficients are estimates of the true parameters. However, it’s easy to forget that R-squared (R²) is also an estimate.... Continue Reading
I’ve written about the importance of checking your residual plots when performing linear regression analysis. If you don’t satisfy the assumptions for an analysis, you might not be able to trust the results. One of the assumptions for regression analysis is that the residuals are normally distributed. Typically, you assess this assumption using the normal probability plot of the residuals. Are... Continue Reading
Previously, I showed why there is no R-squared for nonlinear regression. Anyone who uses nonlinear regression will also notice that there are no P values for the predictor variables. What’s going on? Just like there are good reasons not to calculate R-squared for nonlinear regression, there are also good reasons not to calculate P values for the coefficients. Why not—and what to use instead—are the... Continue Reading
Previously, I’ve written about when to choose nonlinear regression and how to model curvature with both linear and nonlinear regression. Since then, I’ve received several comments expressing confusion about what differentiates nonlinear equations from linear equations. This confusion is understandable because both types can model curves. So, if it’s not the ability to model a curve, what is the... Continue Reading
There is more than just the p value in a probability plot—the overall graphical pattern also provides a great deal of useful information. Probability plots are a powerful tool to better understand your data. In this post, I intend to present the main principles of probability plots and focus on their visual interpretation using some real data. In probability plots, the data density distribution... Continue Reading
In regression analysis, you'd like your regression model to have significant variables and to produce a high R-squared value. This low P value / high R² combination indicates that changes in the predictors are related to changes in the response variable and that your model explains a lot of the response variability. This combination seems to go together naturally. But what if your regression model... Continue Reading