Linear Regression

Blog posts and articles about the statistical method called Linear Regression and its use in real-world quality projects.

Using a sample to estimate the properties of an entire population is common practice in statistics. For example, the mean from a random sample estimates that parameter for an entire population. In linear regression analysis, we’re used to the idea that the regression coefficients are estimates of the true parameters. However, it’s easy to forget that R-squared (R²) is also an estimate.... Continue Reading
I’ve written about the importance of checking your residual plots when performing linear regression analysis. If you don’t satisfy the assumptions for an analysis, you might not be able to trust the results. One of the assumptions for regression analysis is that the residuals are normally distributed. Typically, you assess this assumption using the normal probability plot of the residuals. Are... Continue Reading
Previously, I showed why there is no R-squared for nonlinear regression. Anyone who uses nonlinear regression will also notice that there are no P values for the predictor variables. What’s going on? Just like there are good reasons not to calculate R-squared for nonlinear regression, there are also good reasons not to calculate P values for the coefficients. Why not—and what to use instead—are the... Continue Reading
Previously, I’ve written about when to choose nonlinear regression and how to model curvature with both linear and nonlinear regression. Since then, I’ve received several comments expressing confusion about what differentiates nonlinear equations from linear equations. This confusion is understandable because both types can model curves. So, if it’s not the ability to model a curve, what is the... Continue Reading
There is more than just the p value in a probability plot—the overall graphical pattern also provides a great deal of useful information. Probability plots are a powerful tool to better understand your data. In this post, I intend to present the main principles of probability plots and focus on their visual interpretation using some real data. In probability plots, the data density distribution... Continue Reading
In regression analysis, you'd like your regression model to have significant variables and to produce a high R-squared value. This low P value / high R² combination indicates that changes in the predictors are related to changes in the response variable and that your model explains a lot of the response variability. This combination seems to go together naturally. But what if your regression model... Continue Reading
You know what really gets on my nerves? A lot of things. That slow, slinky way that cats walk by. Grrrr. The rude, abrupt arrival of delivery persons in their obnoxiously loud trucks. (Why do they always pull up just as I’m settling down for a nap?) Grrrr. Total strangers who reach down and poke me with fat, clumsy fingers that reek of antibacterial soap. Grrrr. And this one always gets my dander up:... Continue Reading
In Minitab, the Assistant menu is your interactive guide to choosing the right tool, analyzing data correctly, and interpreting the results. If you’re feeling a bit rusty with choosing and using a particular analysis, the Assistant is your friend! Previously, I’ve written about the new linear model features in Minitab 17. In this post, I’ll work through a multiple regression analysis example and... Continue Reading
Nonlinear regression is a very powerful analysis that can fit virtually any curve. However, it's not possible to calculate a valid R-squared for nonlinear regression. This topic gets complicated because, while Minitab statistical software doesn’t calculate R-squared for nonlinear regression, some other packages do. So, what’s going on? Minitab doesn't calculate R-squared for nonlinear models... Continue Reading
By popular demand, Release 17 of Minitab Statistical Software comes with a new graphical analysis called the Bubble Plot. This exploratory tool is great for visualizing the relationships among three variables on a single plot. To see how it works, consider the total medal count by country from the recently completed 2014 Olympic Winter Games. Suppose I want to explore whether there might be a... Continue Reading
We released Minitab 17 Statistical Software a couple of days ago. Certainly every new release of Minitab is a reason to celebrate. However, I am particularly excited about Minitab 17 from a data analyst’s perspective. If you read my blogs regularly, you’ll know that I’ve extensively used and written about linear models. Minitab 17 has a ton of new features that expand and enhance many types of... Continue Reading
R-squared gets all of the attention when it comes to determining how well a linear model fits the data. However, I've stated previously that R-squared is overrated. Is there a different goodness-of-fit statistic that can be more helpful? You bet! Today, I’ll highlight a sorely underappreciated regression statistic: S, or the standard error of the regression. S provides important information that... Continue Reading
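To make the statistic in the teaser above concrete, here is a minimal sketch of how S relates to R-squared for a simple straight-line fit. It assumes NumPy rather than Minitab, the function name `regression_fit_stats` is hypothetical, and the data are made up purely for illustration. S is taken as the square root of the residual sum of squares divided by the residual degrees of freedom, so it is expressed in the units of the response.

```python
import numpy as np

def regression_fit_stats(x, y):
    """Fit y = b0 + b1*x by ordinary least squares; return (S, R-squared)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Design matrix with an intercept column and one predictor column.
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    residuals = y - X @ coef
    sse = float(residuals @ residuals)            # residual sum of squares
    sst = float(((y - y.mean()) ** 2).sum())      # total sum of squares

    # Residual degrees of freedom: observations minus estimated coefficients.
    dof = len(y) - X.shape[1]

    s = float(np.sqrt(sse / dof))                 # standard error of the regression
    r_squared = 1.0 - sse / sst
    return s, r_squared

# Made-up illustrative data (not from any of the posts listed here).
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
s, r2 = regression_fit_stats(x, y)
print(f"S = {s:.3f} (response units), R-squared = {r2:.4f}")
```

Because S is in the units of the response, it can be read as the typical distance between the observed values and the fitted line, which R-squared alone does not convey.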
Just how high should R² be in regression analysis? I hear this question asked quite frequently. Previously, I showed how to interpret R-squared (R²). I also showed how it can be a misleading statistic because a low R-squared isn’t necessarily bad and a high R-squared isn’t necessarily good. Clearly, the answer for “how high should R-squared be” is . . . it depends. In this post, I’ll help you answer... Continue Reading
I’ve written a number of blog posts about regression analysis and I think it’s helpful to collect them in this post to create a regression tutorial. I’ll supplement my own posts with some from my colleagues. This tutorial covers many aspects of regression analysis including: choosing the type of regression analysis to use, specifying the model, interpreting the results, determining how well the... Continue Reading
Face it, you love regression analysis as much as I do. Regression is one of the most satisfying analyses in Minitab: get some predictors that should have a relationship to a response, go through a model selection process, interpret fit statistics like adjusted R² and predicted R², and make predictions. Yes, regression really is quite wonderful. Except when it’s not. Dark, seedy corners of the data... Continue Reading
For one reason or another, the response variable in a regression analysis might not satisfy one or more of the assumptions of ordinary least squares regression. The residuals might follow a skewed distribution or the residuals might curve as the predictions increase. A common solution when problems arise with the assumptions of ordinary least squares regression is to transform the response... Continue Reading
By Laerte de Araujo Lima, guest blogger. A few weeks ago, my football friends and I were talking about the football in the UEFA Champions League (UEFA CL), and what we could expect for the 2013-14 season. Some of us believe that the quality of the football played in the UEFA CL has improved in the last few years, as evidenced by more goals per match, more teams with strategies based in the attack... Continue Reading
Imagine that you’ve studied an empirical problem using linear regression analysis and have settled on a well-specified, actionable model to present to your boss. Or perhaps you’re the boss, using applied regression models to make decisions. In either case, there’s a good chance a costly mistake is about to occur! How regression results are presented can lead decision-makers to make bad choices. Emre... Continue Reading
John Tukey once said, “The best thing about being a statistician is that you get to play in everyone’s backyard.” I enthusiastically agree! I frequently enjoy reading and watching science-related material. This invariably raises questions, involving other "backyards," that I can better understand using statistics. For instance, see my post about the statistical analysis of dolphin sounds. The latest... Continue Reading
We often think of a relationship between two variables as a straight line. That is, if you increase the predictor by 1 unit, the response always increases by X units. However, not all data have a linear relationship, and your model must fit the curves present in the data. This fitted line plot shows the folly of using a line to fit a curved relationship! How do you fit a curve to your data?... Continue Reading
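As a rough illustration of the point in the teaser above, the sketch below fits both a straight line and a quadratic to simulated curved data and compares their residual sums of squares. It assumes NumPy rather than Minitab, the helper `sse_for_degree` is hypothetical, and the data are simulated, not taken from the post. Adding a squared term is only one way to model curvature.

```python
import numpy as np

# Simulated data with a curved (quadratic) relationship plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x**2 + rng.normal(scale=1.0, size=x.size)

def sse_for_degree(x, y, degree):
    """Fit a polynomial of the given degree by least squares; return its SSE."""
    coefs = np.polyfit(x, y, degree)
    fitted = np.polyval(coefs, x)
    return float(np.sum((y - fitted) ** 2))

sse_line = sse_for_degree(x, y, 1)   # straight line: y = b0 + b1*x
sse_quad = sse_for_degree(x, y, 2)   # adds a squared term: y = b0 + b1*x + b2*x^2

print(f"SSE, straight line: {sse_line:.1f}")
print(f"SSE, quadratic:     {sse_quad:.1f}")
```

The quadratic model is still linear in its coefficients, so ordinary least squares fits it; the much smaller residual sum of squares reflects that the straight line cannot follow the curvature in the data.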