
Violations of the Assumptions for Linear Regression: Residuals versus the Fits (Day 3)

Lionel Loosefit has been hauled to court for violating the assumptions of linear regression. On Day 3 of the trial, the court examines the allegation that the residuals in Mr. Loosefit's model exhibit nonconstant variance. The defendant’s mother, Mrs. Lottie Loosefit, has taken the stand on behalf of her son.

Defense Attorney: So, Mrs. Loosefit, from what you’ve described to us, your son, Lionel, appears to have been a model child.

Lottie Loosefit [eyes watering]: He was every mother’s dream. He brushed his teeth every morning and every night, made his bed, folded his socks, picked up all his toys…

Defense Attorney: It sounds like he satisfied all of your assumptions.

Lottie Loosefit: Every single one. Would you like to see more baby pictures?

Defense Attorney: Thank you, Mrs. Loosefit, I think the evidence you’ve given is sufficient. No further questions, your Honor.

Judge [to prosecutor]: You may cross-examine the witness.

Prosecutor: A mother’s love is always touching, Mrs. Loosefit. So unconditional. You must have raised your son well. Certainly you provided for him well. You run a successful business, don’t you, Mrs. Loosefit?

Lottie Loosefit: I certainly do. I started out as a seamstress. Today I am sole owner and CEO of Loosefit Garments, Inc.

Prosecutor: Kudos to you. We all have a pair of Loosefits in our closet. “If you can’t bend over backwards, it’s not a Loosefit.” Catchy jingle.

Lottie Loosefit: Thank you. You don’t get to the top sitting on your bottom.

Prosecutor: Mrs. Loosefit, suppose you’re making a sleeve for a customer. You wouldn’t want the material to be extremely tight at the elbows, yet sag and droop at other parts of the sleeve, would you?

Lottie Loosefit: Of course not! I’d be a laughingstock.

Prosecutor: In that case, without having any background in statistics, you can understand how a regression model should fit. Take a look at the illustration below:

[Figure: residuals and fits]

Prosecutor: Here, the blue line shows the regression model. The red circles are the observed values in the data set. The black squares show the fitted value estimated by the model for each observed value.

Lottie Loosefit: I wouldn’t know about all that rigamarole.

Prosecutor: No, Mrs. Loosefit, why would you? But think of it this way. Just as a sleeve should fit the arm consistently, the distances of the observed values from the fitted values of the model (the dotted lines, known as residuals) should vary consistently across the fitted values of the model.
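For readers who want to see the prosecutor's dotted lines as numbers, here is a minimal Python sketch. It assumes a single predictor and uses made-up data; the function names `fit_line` and `fits_and_residuals` are hypothetical and are not Minitab commands. Each residual is simply the observed value minus the fitted value.

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def fits_and_residuals(x, y):
    """Return the fitted value and the residual for every observation."""
    intercept, slope = fit_line(x, y)
    fits = [intercept + slope * xi for xi in x]
    # Residual = observed value minus fitted value (the dotted line).
    residuals = [yi - fi for yi, fi in zip(y, fits)]
    return fits, residuals


# Made-up data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

fits, residuals = fits_and_residuals(x, y)
for f, r in zip(fits, residuals):
    print(f"fit = {f:6.2f}   residual = {r:+.2f}")
```

Plotting `residuals` against `fits` gives the kind of residuals versus fits plot discussed in the rest of the testimony.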

Lottie Loosefit: Our clothes fit all our models very well, thank you.

Prosecutor: I’m sure they do. You probably use a measuring tape to make sure the fit is consistent, don’t you? But in Minitab, you use a plot of the residuals vs the fitted values. If the model errors vary consistently across the fitted values, as they should, the plot will look something like this:

[Figure: residuals versus fits]

Lottie Loosefit: Hmppph. Who ever laid eyes on an arm that bony and that straight? And I'm no fan of red polka dots either.

Prosecutor: Mrs. Loosefit, do you ever get complaints from customers about fit?

Lottie Loosefit: Cripes, yes. “Oh, this is too puffy at the chest.” “Oh, this is too tight in the rear.” “Oh this” and “Oh that!” People whine if their ice cream is cold, if you ask me.

Prosecutor: We can all sympathize, Mrs. Loosefit. Total customer satisfaction is no cakewalk, after all. Unfortunately, when it comes to the assumption of constant variance in regression, several potential problems with fit can arise as well:

[Figure: fit problems]
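One common form of nonconstant variance is residuals that fan out as the fitted values grow. As a rough numeric companion to eyeballing the plot, the Python sketch below compares the size of the residuals in the upper half of the fitted values with the size in the lower half. The helper `spread_ratio` and both data sets are hypothetical, and this is a crude illustration rather than a formal test.

```python
def spread_ratio(fits, residuals):
    """Mean squared residual in the upper half of the fits divided by
    the mean squared residual in the lower half."""
    pairs = sorted(zip(fits, residuals))  # order observations by fitted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]

    def mean_square(values):
        return sum(v * v for v in values) / len(values)

    return mean_square(high) / mean_square(low)


# Made-up fitted values and residuals, for illustration only.
fits = [1, 2, 3, 4, 5, 6, 7, 8]
even_residuals = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
fanning_residuals = [0.1, -0.1, 0.2, -0.2, 0.8, -0.8, 1.0, -1.0]

even_ratio = spread_ratio(fits, even_residuals)
fanning_ratio = spread_ratio(fits, fanning_residuals)
print("even spread ratio:   ", even_ratio)
print("fanning spread ratio:", fanning_ratio)
```

A ratio near 1 is consistent with a sleeve that fits evenly; a ratio far from 1 suggests the spread changes along the fitted values and the plot deserves a closer look.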

Lottie Loosefit: So what? You go looking for problems, you’ll find ‘em! What’s that got to do with my Lionel?

Prosecutor: Well, Mrs. Loosefit, take a look at Exhibit F, the plot of the residuals vs fits for your son’s regression model.

[Figure: Exhibit F]

Ph.D. statistician [shouts from gallery]: Monster! Heteroscedastic fiend!

Judge: Order!

Prosecutor: Who can blame such righteous indignation? The plot shows a pathetic mishmash of all the fit problems that we just saw, doesn’t it? Somehow your son managed to break every rule in the book, Mrs. Loosefit. And the small data set he used only served to amplify the hideous nature of his transgressions.

Lottie Loosefit [turns to jury box]: My Lionel never done these awful things! I know my boy. This is somebody else’s doing!

Prosecutor: Sorry, Mrs. Loosefit. We’ve clicked the History folder in your son’s Minitab project. The folder automatically records all the commands a user runs in each Minitab session. The evidence is irrefutable:

[Figure: History folder]

Prosecutor [smiles and turns to defendant]: You didn’t know about the History folder, did you, Mr. Loosefit? You thought you didn’t leave a trail. But in fact, if we wanted to, we could actually use the commands in that folder to recreate your crime, over and over again. Not that that's something we’d ever want to automate.

[Walks over to jury box]

Prosecutor: Notice, ladies and gentlemen, that in the History window, you don't see a command to display graphs with the regression analysis. That proves that the defendant didn’t even bother to display residual plots to check the assumptions of the analysis!

Lottie Loosefit: No, you’re wrong! Lionel was at home with me the whole time it happened…

Prosecutor: Nice try, Mrs. Loosefit. But we’ve also got a record of the time and date when the crimes occurred, at the top of the Session window in your son’s Minitab project.

[Figure: Session window]

Prosecutor: According to witnesses, you were on the floor of the Loosefit Garment factory at that date and time. Also note that the Session window shows that the Minitab project was checked out directly from your son’s desktop.

[Oohs and ahs emanating from the gallery]

Spectator 1 [whispers]: Things ain’t lookin’ none too good for Loosefit…

Spectator 2 [whispers]: Aye, not even a Box-Cox transformation can save him now…
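For readers wondering what the spectator has in mind: the Box-Cox transformation is a family of power transformations of a positive response, often tried when the spread of the residuals changes with the fitted values. The Python sketch below shows only the standard formula; the function name `box_cox` is hypothetical, and how to choose the parameter `lam` is not covered here.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a single positive value y with parameter lam.

    (y**lam - 1) / lam when lam is not zero, and log(y) when lam is zero.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires a positive response value")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1) / lam


# Made-up response values, for illustration only.
response = [1.0, 4.0, 9.0, 16.0]
transformed = [box_cox(y, 0.5) for y in response]  # lam = 0.5: square-root scale
print(transformed)
```

After transforming the response, the regression is refit and the residuals versus fits plot is checked again.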

Next time: The dramatic conclusion of the trial, as the jury delivers its verdict.

 

Comments

Name: Alex • Monday, February 4, 2013

Hi, Patrick! Looking through the Diagnostic Report generated by the Minitab Assistant for Regression, one can see examples of patterns that may indicate problems with the fit of the model: unequal variation, clusters, strong curvature, large residuals. So why don't you use the same language in your post? Does it make sense? What about brushing (deleting) extremely large residuals?


Name: Patrick • Monday, February 4, 2013

Excellent points, Alex! You are absolutely right that those phrases in the Assistant Diagnostic Report (which is a coming attraction in Day 4 of the trial, by the way!) refer to these same problems with the residuals. One thing that I sometimes like to do in my posts is to introduce the formal technical terms used in the field and then show, by example or explanation or metaphor, that they're really not as complex or scary as they sound. Check out my earlier post on homoscedasticity: http://blog.minitab.com/blog/statistics-and-quality-data-analysis/dont-be-a-victim-of-statistical-hippopotomonstrosesquipedaliophobia

As you astutely observed, the Minitab Assistant reports are deliberately phrased to avoid the commonly used technical jargon in the field, which is one reason they're so popular and user-friendly.

And I'm glad you brought up brushing outliers. Readers who don't know about this feature should check it out. A great example of brushing is given in this blog post by Minitab trainer Redouane Kouiden:
http://blog.minitab.com/blog/statistics-for-lean-six-sigma/the-great-minitab-mpg-project-for-those


Thanks for reading and commenting!

