# Regression Analysis

Blog posts and articles about regression analysis techniques applied to Lean and Six Sigma quality improvement projects.

Wildfires in California have killed at least 40 people and burned more than 217,000 acres in the past few weeks. Nearly 8,000 firefighters are trying to contain the blazes with the aid of more than 800 firetrucks, 70 helicopters and 30 planes. In remote areas difficult to access by firetruck, smokejumpers may be needed to parachute in to fight the fires. But danger looms before a smokejumper even... Continue Reading
Overfitting a model is a real problem you need to beware of when performing regression analysis. An overfit model results in misleading regression coefficients, p-values, and R-squared statistics. Nobody wants that, so let's examine what overfit models are, and how to avoid falling into the overfitting trap. Put simply, an overfit model is too complex for the data you're analyzing. Rather than... Continue Reading

Maybe you're just getting started with analyzing data. Maybe you're reasonably knowledgeable about statistics, but it's been a long time since you did a particular analysis and you feel a little bit rusty. In either case, the Assistant menu in Minitab Statistical Software gives you an interactive guide from start to finish. It will help you choose the right tool quickly, analyze your data... Continue Reading
In April 2017, overbooking of flight seats hit the headlines when a United Airlines customer was dragged off a flight. A TED talk by Nina Klietsch gives a good but simplistic explanation of why overbooking is so attractive to airlines. Overbooking is not new to the airlines; these strategies were officially sanctioned by The American Civil Aeronautics Board in 1965, and since that time complex... Continue Reading
Previously, I’ve written about when to choose nonlinear regression and how to model curvature with both linear and nonlinear regression. Since then, I’ve received several comments expressing confusion about what differentiates nonlinear equations from linear equations. This confusion is understandable because both types can model curves. So, if it’s not the ability to model a curve, what is the... Continue Reading
One of the biggest pieces of international news last year was the so-called "Brexit" referendum, in which a majority of voters in the United Kingdom cast their ballots to leave the European Union (EU). That outcome shocked the world. Follow-up media coverage has asserted that the younger generation prefers to remain in the EU since that means more opportunities on the continent. The older... Continue Reading
If you regularly perform regression analysis, you know that R² is a statistic used to evaluate the fit of your model. You may even know the standard definition of R²: the percentage of variation in the response that is explained by the model. Fair enough. With Minitab Statistical Software doing all the heavy lifting to calculate your R² values, that may be all you ever need to know. But if you’re... Continue Reading
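
As a rough illustration of the standard definition quoted in the teaser above, R-squared can be written in terms of sums of squares. The symbols below are assumptions introduced here, not notation from the post: $y_i$ are the observed responses, $\hat{y}_i$ the fitted values, and $\bar{y}$ the mean response. The second equality holds for an ordinary least squares model that includes an intercept.

$$
R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
$$

Multiplying by 100 gives the percentage of variation in the response that the model explains.
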
Did you ever wonder why statistical analyses and concepts often have such weird, cryptic names? One conspiracy theory points to the workings of a secret committee called the ICSSNN. The International Committee for Sadistic Statistical Nomenclature and Numerophobia was formed solely to befuddle and subjugate the masses. Its mission: To select the most awkward, obscure, and confusing name possible... Continue Reading
Regardless of who you support in the upcoming U.S. election, we can all agree that it’s been a very bumpy ride! It’s been a particularly chaotic election cycle. Wouldn’t it be nice if we could peek into the future and see potential election results right now? That’s what we'll do in this post! In 2012, I used binary logistic regression to predict that President Obama would be reelected for a second... Continue Reading
Data mining can be helpful in the exploratory phase of an analysis. If you're in the early stages and you're just figuring out which predictors are potentially correlated with your response variable, data mining can help you identify candidates. However, there are problems associated with using data mining to select variables. In my previous post, we used data mining to settle on the following... Continue Reading
Face it, you love regression analysis as much as I do. Regression is one of the most satisfying analyses in Minitab: get some predictors that should have a relationship to a response, go through a model selection process, interpret fit statistics like adjusted R² and predicted R², and make predictions. Yes, regression really is quite wonderful. Except when it’s not. Dark, seedy corners of the data... Continue Reading
Data mining uses algorithms to explore correlations in data sets. An automated procedure sorts through large numbers of variables and includes them in the model based on statistical significance alone. No thought is given to whether the variables and the signs and magnitudes of their coefficients make theoretical sense. We tend to think of data mining in the context of big data, with its huge... Continue Reading
You’ve performed multiple linear regression and have settled on a model which contains several predictor variables that are statistically significant. At this point, it’s common to ask, “Which variable is most important?” This question is more complicated than it first appears. For one thing, how you define “most important” often depends on your subject area and goals. For another, how you collect... Continue Reading
There may be huge potential benefits waiting in the data in your servers. These data may be used for many different purposes. Better data allow better decisions, of course. Banks, insurance firms, and telecom companies already own a large amount of data about their customers. These resources are useful for building a more personal relationship with each customer. Some organizations already use... Continue Reading
In regression, "sums of squares" are used to represent variation. In this post, we’ll use some sample data to walk through these calculations. The sample data used in this post is available within Minitab by choosing Help > Sample Data, or File > Open Worksheet > Look in Minitab Sample Data folder (depending on your version of Minitab). The dataset is called ResearcherSalary.MTW, and contains data... Continue Reading
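
For readers without Minitab at hand, the sums-of-squares idea in the teaser above can be sketched in a few lines of plain Python. This is a minimal sketch, not the post's own walkthrough: the `x` and `y` values are made up for illustration and are not the ResearcherSalary.MTW data, and the helper name `sums_of_squares` is hypothetical. It fits a one-predictor least squares line and splits the total variation into a regression part and an error part.

```python
# Hypothetical data for illustration only; NOT the ResearcherSalary.MTW worksheet.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]


def sums_of_squares(x, y):
    """Fit y = b0 + b1*x by least squares and return (total, regression, error) SS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least squares slope and intercept for a single predictor.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    # Total SS: variation of the responses around their mean.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    # Regression SS: variation of the fitted values around the mean.
    ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
    # Error SS: variation of the responses around the fitted line.
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ss_total, ss_regression, ss_error


ss_total, ss_regression, ss_error = sums_of_squares(x, y)
print(f"Total SS = {ss_total:.4f}")
print(f"Regression SS = {ss_regression:.4f}")
print(f"Error SS = {ss_error:.4f}")
```

Because the model includes an intercept, the regression and error sums of squares add up to the total, apart from floating-point rounding.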
You need to consider many factors when you’re buying a used car. Once you narrow your choice down to a particular car model, you can get a wealth of information about individual cars on the market through the Internet. How do you navigate through it all to find the best deal? By analyzing the data you have available. Let's look at how this works using the Assistant in Minitab Statistical... Continue Reading
In my last post, we took the red pill and dove deep into the unarguably fascinating and uncompromisingly compelling world of the matrix plot. I've stuffed this post with information about a topic of marginal interest...the marginal plot. Margins are important. Back in my English composition days, I recall that margins were particularly prized for the inverse linear relationship they maintained with... Continue Reading
Technology is very much part of our lives nowadays. We use our smartphones to have video calls with our friends and family, and watch our favourite TV shows on tablets. Technology has also transformed the fitness industry with the increasing popularity of fitness trackers. Recently, I got myself a fitness watch and it's becoming my favourite gadget. It can track how many steps I’ve taken, my... Continue Reading
Suppose you’ve collected data on cycle time, revenue, the dimension of a manufactured part, or some other metric that’s important to you, and you want to see what other variables may be related to it. Now what? When I graduated from college with my first statistics degree, my diploma was bona fide proof that I'd endured hours and hours of classroom lectures on various statistical topics, including l... Continue Reading
For one reason or another, the response variable in a regression analysis might not satisfy one or more of the assumptions of ordinary least squares regression. The residuals might follow a skewed distribution or the residuals might curve as the predictions increase. A common solution when problems arise with the assumptions of ordinary least squares regression is to transform the response... Continue Reading