What Are the Effects of Multicollinearity and When Can I Ignore Them?

Multicollinearity is a problem that you can run into when you’re fitting a regression model or another linear model. It refers to predictors that are correlated with other predictors in the model. Unfortunately, the effects of multicollinearity can feel murky and intangible, which makes it unclear whether it’s important to fix.

My goal in this blog post is to bring the effects of multicollinearity to life with real data! Along the way, I’ll show you a simple tool that can remove multicollinearity in some cases.
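To make "predictors that are correlated with other predictors" concrete, here is a minimal Python sketch. It is not taken from the post: the data are synthetic, the helper names are hypothetical, and the excerpt does not say which "simple tool" it has in mind. Centering a predictor before squaring it is shown only as one commonly used technique, an assumption on my part. For a model with exactly two predictors, the variance inflation factor (VIF) reduces to 1 / (1 - r²), where r is the correlation between them.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)

# Synthetic predictor (illustrative values only, not data from the post).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# A squared term built from the raw predictor is strongly correlated with it.
x_squared = [v * v for v in x]
vif_raw = vif_two_predictors(x, x_squared)

# Assumed remedy: subtract the mean first, then square.
mean_x = sum(x) / len(x)
x_centered = [v - mean_x for v in x]
x_centered_squared = [v * v for v in x_centered]
vif_centered = vif_two_predictors(x_centered, x_centered_squared)

print(f"VIF with raw x and x^2:      {vif_raw:.1f}")
print(f"VIF with centered x and x^2: {vif_centered:.1f}")
```

Centering helps here because the correlation comes from how the squared term was constructed. It would not remove correlation between two genuinely related measurements.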



When Should I Use Confidence Intervals, Prediction Intervals, and Tolerance Intervals?

In statistics, we use a variety of intervals to characterize the results. The most well-known of these are confidence intervals. However, confidence intervals are not always appropriate. In this post, we’ll take a look at the different types of intervals that are available in Minitab, their characteristics, and when you should use them.

I’ll cover confidence intervals, prediction intervals, and tolerance intervals. Because tolerance intervals are the least-known, I’ll devote extra time to explaining how they work and when you’d want to use them.
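As a rough illustration of how two of these intervals differ, the Python sketch below computes a confidence interval for a mean and a prediction interval for a single future observation from the same sample. It is not Minitab's calculation: the sample values are made up, the function name is hypothetical, and it uses a normal quantile as an approximation, whereas exact small-sample intervals use Student's t distribution. Tolerance intervals need a separate tolerance factor and are left out.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_intervals(sample, level=0.95):
    """Approximate confidence and prediction intervals from one sample.

    Assumption: a normal quantile stands in for the Student's t quantile,
    so both intervals are somewhat too narrow for small samples.
    """
    n = len(sample)
    m = mean(sample)
    s = stdev(sample)                      # sample standard deviation
    z = NormalDist().inv_cdf(0.5 + level / 2)

    # Confidence interval: uncertainty about the population mean.
    ci_half = z * s / math.sqrt(n)

    # Prediction interval: uncertainty about one new observation, which
    # adds the spread of individual values to the uncertainty of the mean.
    pi_half = z * s * math.sqrt(1 + 1 / n)

    return (m - ci_half, m + ci_half), (m - pi_half, m + pi_half)

# Made-up measurements, for illustration only.
sample = [9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.2]
ci, pi = mean_intervals(sample)

print(f"confidence interval for the mean:   ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"prediction interval for next value: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is always the wider of the two, and it does not shrink toward zero as the sample grows, because individual values keep varying even when the mean is known precisely.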

What are Confidence Intervals?

A confidence...

What Makes Great Presidents and Good Models?

If the title of this post made you think you’d be reading about Abraham Lincoln and Tyra Banks, you’re only half right. 

A few weeks ago, statistician and journalist Nate Silver published an interesting post on how U.S. presidents are ranked by historians. Silver showed that the percentage of electoral votes that a U.S. president receives in his second-term election serves as a rough predictor of his average ranking of greatness.

Here’s the model he came up with, which I’ve duplicated in Minitab using the scatterplot with regression and groups (Graph > Scatterplot):

Silver divided the data into...

Violations of the Assumptions for Linear Regression: Closing Arguments and Verdict

Lionel Loosefit has been hauled to court for violating the assumptions of regression analysis. On the last day of the trial, the prosecution and defense present their closing arguments. And the fate of Mr. Loosefit is decided by judge and jury...

The Prosecution's Summary

Prosecutor: Ladies and gentlemen, we’ve presented a slew of evidence in this trial. You’ve seen, with your own eyes, every possible heinous violation of the assumptions for regression in the defendant’s model. Here’s what we’ve shown, in a nutshell:

Prosecutor: We’ve carefully delineated each violation with specific graphic...

Violations of the Assumptions for Linear Regression: Residuals versus the Fits (Day 3)

Lionel Loosefit has been hauled to court for violating the assumptions of linear regression. On Day 3 of the trial, the court examines the allegation that the residuals in Mr. Loosefit's model exhibit nonconstant variance. The defendant’s mother, Mrs. Lottie Loosefit, has taken the stand on behalf of her son.

Defense Attorney: So, Mrs. Loosefit, from what you’ve described to us, your son, Lionel, appears to have been a model child.

Lottie Loosefit [eyes watering]: He was every mother’s dream. He brushed his teeth every morning and every night, made his bed, folded his socks, picked up all his...

Orthogonal Regression: Testing the Equivalence of Instruments

I recently got a request from one of our Facebook fans to do a post about orthogonal regression, which I admit is not a subject I’m very familiar with. However, with a little help from Minitab’s help resources and by consulting a few Minitab experts, I think I came up with a post that will be useful. I thought it would help to discuss orthogonal regression with an example, but first...

What the Heck Is Orthogonal Regression?

Orthogonal regression is also known as “Deming regression” and examines the linear relationship between two continuous variables. It’s often used to test whether two...
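For readers who want to see the arithmetic, here is a minimal Python sketch of a Deming fit. It is not Minitab's implementation: the function name, the `delta` parameter, and the instrument readings are all hypothetical. `delta` is assumed to be the ratio of the error variance in y to the error variance in x, and setting it to 1 gives the orthogonal case, where both variables are treated as equally noisy.

```python
import math

def deming_fit(x, y, delta=1.0):
    """Return (intercept, slope) of a Deming regression line.

    delta is the assumed ratio of the y error variance to the x error
    variance; delta = 1.0 corresponds to orthogonal regression.
    Requires a nonzero sample covariance between x and y.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return intercept, slope

# Made-up paired readings from two instruments, for illustration only.
instrument_a = [10.1, 12.0, 13.9, 16.2, 18.0, 19.8]
instrument_b = [10.0, 12.2, 14.1, 15.9, 18.1, 20.0]

intercept, slope = deming_fit(instrument_a, instrument_b)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

If two instruments agree, the fitted line should be close to an intercept of 0 and a slope of 1. A formal equivalence check would also need confidence intervals for both estimates, which this sketch does not compute.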

Violations of the Assumptions for Linear Regression (Day 2): Independence of the Residuals

Recap: Lionel Loosefit has been arrested and hauled to court for violating the assumptions of regression analysis. In the previous court session, the prosecution presented evidence to show that the errors in Mr. Loosefit’s model were not normally distributed. Today, the prosecution addresses the second alleged violation: namely, that the errors in the defendant’s regression model are not independent. Dr. Minnie Tabber, a world-renowned statistician, is on the witness stand.

Prosecutor: Let me remind the members of the jury that a residual is simply the difference between the data value...

Violations of the Assumptions for Linear Regression: The Trial of Lionel Loosefit (Day 1)

Bailiff: All Rise. The Honorable Judge Lynn E. R. Peramutter presiding.

Judge: Please be seated. Bailiff, please read the charges.

Bailiff: Your honor, this is the case of the State vs. Lionel Loosefit. The defendant is charged with creating a model that violated the legal requirements for regression. The infractions include:

  • Producing grossly nonnormal errors
  • Producing errors that lack independence
  • Exhibiting nonconstant variance
  • Violating the linearity assumption

Judge: Thank you, bailiff. Let’s hear the opening statement by the prosecutor.

Prosecutor: Your honor, ladies and gentlemen of the jury....

My Work with Minitab

The Minitab Fan section of the Minitab blog is your chance to share with our readers! We always love to hear how you are using Minitab products for quality improvement projects, Lean Six Sigma initiatives, research and data analysis, and more. If our software has helped you, please share your Minitab story, too!

Throughout my 15 years as a Six Sigma Initiative Leader, Consultant, Trainer, Black Belt, and Master Black Belt, I have been enthusiastic about Minitab Statistical Software, starting from release 11 to now.

Minitab's graphics are outstanding in their ability to present messages and...

Giving Thanks for the Regression Menu

Juicy, butter-roasted turkey.

Steaming mashed potatoes.

Tangy cranberry relish.

Delicious candied sweet potatoes.

Creamy green bean casserole.

Sweet and airy corn bread.

Silken pumpkin pie.

The traditional Thanksgiving menu has so many mouth-watering dishes on the table, you don’t know where to start.

If you savor statistics as much as food, you might feel similarly as you gaze at all of the delicious analyses on Minitab’s Regression menu:

How can you decide which regression analysis to choose? In this post, I’ll give you some bite-sized samples of each regression dish to help you decide which one to...

A Tale of a Suspicious Regression Coefficient

In my last post, I used U.S. Census Bureau data and correlation to reveal that people living in colder, more crowded states typically make more money. I’m sure there is some rationale to support this conclusion, but I’ll leave that explanation up to the economists. Meanwhile, let's you and me move on to more statistics…

Once I discovered there was correlation between income and other census data, I decided to use Minitab Statistical Software to find a regression model and further examine the relationship. Easy, right?

That’s what I thought, too.

In fact, I wasn't even planning on writing about...

Using Statistics to Analyze Words: Digging Deeper

In my last blog post, I showed how it’s possible to statistically assess the structure of a message and determine its capacity to convey information. We saw how my own words fit the patterns that are present in communications that are optimized for conveying information. However, these were fairly rough assessments to illustrate the fundamentals of information theory.

In this post, I’ll use more sophisticated analyses to more precisely determine whether my blog content fits the ideal distribution. Along the way, we’ll have some interesting discussions about the vagaries of dolphin, human,...

Testing for Normality: A Tale of Two Samples by Anderson-Darling

With apologies to Charles Dickens, I'd like to begin this post by summing up the Anderson-Darling statistic this way:

It was the best of fits, it was the worst of fits, it was the test of normality, it was the test for non-normality, it was the plot of belief, it was the plot of incredulity, it was the p-value of Light, it was the p-value of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us...

I read and participate in discussions about a broad range of statistical topics daily, and few elicit as much misinformation combined...

What to Do When…Gulp…Data Analysis Isn’t an Option…

I’m a quality engineer, so it probably goes without saying that I like gathering and analyzing data. Minitab and I spend a significant amount of time together. Some might say that our relationship is unhealthy—perhaps even co-dependent. But Minitab and I have been together for almost 20 years. That’s a long time for any relationship, especially one between a software application and an engineer.

It’s that 20-year bond that makes it very difficult for me to acknowledge when I can’t gather data for a particular attribute of quality. Sometimes it’s not an option. Sometimes it’s just not the best...

Statistical Model Predicts The Hunger Games DVD Sales. Real or Not Real?

If you're one of the gazillion people who have read The Hunger Games, then you’re quite familiar with “Real or Not Real?” And if you haven't read it, I'm guessing you've at least heard about this best-selling trilogy.

The Hunger Games movie, like the book, has been a huge success, grossing over $400 million domestically. I recently saw an ad for the DVD to be released on August 18 and it got me thinking. Could I use statistics to predict DVD sales? If my model below is as good as it looks, then the answer is "yes."

Or should I say, “real”?

Creating a Statistical Model: Where to Begin?

I rarely...

Men’s 100m Dash: How have the times changed over the years?

As the 2012 Summer Olympics are now on in full “speed,” one of my favorite events to follow is the 100m dash. I’m a runner myself - although certainly not at the incredible caliber of Olympic athletes - and I’ve often thought about how running and sprinting have changed in the last 100+ years since the modern Olympic Games started.

For instance, the USA has dominated sprinting in the last 100 years, having the majority of 100m male medalists. However, Usain Bolt – a sprinter from the tiny country of Jamaica – has been in the news recently for his record-breaking time of 9.69 seconds, which was...

Statistical Tools Help Unifi Put the Shirt on Your Back

I recently had the opportunity to learn about how synthetic yarn manufacturer Unifi Manufacturing Inc. used Minitab to optimize its false-twist texturing process. Many of you probably have several connections to Unifi yarns that you just don’t know about!

In fact, you've probably found yourself wearing clothing made with Unifi yarns, or sitting on a couch made of stain-resistant fabric that was upholstered with Unifi yarn technologies.

What’s neat about Unifi products is that the company’s manufacturing processes turn raw and recycled materials and fibers into synthetic yarns that behave like...

How to Predict with Minitab: Using BMI to Predict the Body Fat Percentage, Part 1

Wouldn’t it be nice to be able to predict something that is important to you? Sure—and it would be extra nice if you knew how accurate your predictions will be. Well, you don’t have to be a psychic to have these powers because Minitab Statistical Software gives them to you! You can find these predictive powers in regression, general linear model (ANOVA), and the design of experiments (DOE).

Prediction with Regression Analysis

We’ll explore prediction with regression analysis by using a person’s body mass index (BMI) to predict their percentage of body fat. For extra fun, we’ll compare Minitab’s...

Data Analysis and the Mystery of the Confounded Calcium

In my previous blog post, I showed how omitting a confounding predictor from a linear regression model obscured the significance of another predictor variable. Confounding variables can be insidious because you don’t always know about them, and you may have to deduce their existence. 

In that vein, this post is like a mystery story. I’ll set up the mystery and include the clues. You put on your Sherlock Holmes cap, use your knowledge of confounding variables, and see if you can come up with your own theories about how one or more confounding predictors are most likely involved. 

The Study

For...

Patterns in Your Data: How To Win Millions of Dollars Instantly?

If you’re a quality improvement expert, you already know that statistics can be a powerful tool in your quest to reduce the cost of poor quality and save money. But statistics might be even more powerful than you think—it may actually have helped someone win a lottery jackpot.

According to Business Insider, a former mathematics professor with a Ph.D. in statistics has won the Texas lottery four times, with multi-million dollar payouts each time. The article speculates whether she’s super lucky, or whether she’s used public information and her knowledge of statistics to identify patterns that...