Violations of the Assumptions for Linear Regression (Day 2): Independence of the Residuals

Recap: Lionel Loosefit has been arrested and hauled to court for violating the assumptions of regression analysis. In the previous court session, the prosecution presented evidence to show that the errors in Mr. Loosefit’s model were not normally distributed. Today, the prosecution addresses the second alleged violation: namely, that the errors in the defendant’s regression model are not independent. Dr. Minnie Tabber, a world-renowned statistician, is on the witness stand.

Prosecutor: Let me remind the members of the jury that a residual is simply the difference between the data value...
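For readers who like to see the idea in code, here is a minimal sketch of what "independent residuals" can mean in practice. It is not the analysis from the courtroom series: the observed and fitted values are made up, and the Durbin-Watson statistic is just one common check for first-order autocorrelation.

```python
def residuals(observed, fitted):
    """A residual is the observed data value minus the fitted value."""
    return [y - f for y, f in zip(observed, fitted)]


def durbin_watson(resid):
    """Sum of squared successive differences divided by sum of squared residuals."""
    numerator = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    denominator = sum(e * e for e in resid)
    return numerator / denominator


# Hypothetical data, listed in the order the observations were collected.
observed = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]
fitted = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

resid = residuals(observed, fitted)
dw = durbin_watson(resid)
```

The statistic always falls between 0 and 4. Values near 2 are consistent with independent residuals, while values toward 0 or 4 point to positive or negative autocorrelation. The made-up residuals here alternate in sign, so the statistic lands well above 2.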

Monte Carlo Is Not as Difficult as You Think

Before I started studying statistics, references to a mysterious "Monte Carlo Method" made it seem like the most cryptic thing in the data-analysis universe. People were developing programs dedicated solely to Monte Carlo, and offering special workshops and seminars. It seemed so great and terrible that someone like me—mere mortal that I am—would never be able to understand it. 

Fast-forward a few years, and now that I have some experience with it, I'm wondering why Monte Carlo has the reputation it does. The fact of the matter is, at least from a data analysis perspective, Monte Carlo...
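To give a flavor of how little machinery is involved, here is a small sketch of a classic Monte Carlo exercise: estimating pi from random points. It is an illustration chosen for this excerpt, not the example from the full post, and the function name and seed are arbitrary.

```python
import random


def estimate_pi(n_points, seed=0):
    """Estimate pi from the share of random points that land inside a quarter circle."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    inside = 0
    for _ in range(n_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / n_points


pi_hat = estimate_pi(100_000)
```

The whole method is "simulate many random cases, then summarize them." More points give a tighter estimate, at the cost of more computing time.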

Shakespeare and Best Subsets Regression

  Orlando:  And wilt thou have me?

  Rosalind:  Ay, and twenty such. 

  Orlando:  What sayest thou? 

  Rosalind:  Are you not good? 

  Orlando:  I hope so. 

  Rosalind:  Why then, can one desire too much of a good thing?

William Shakespeare, As You Like It, Act IV, Scene I.

When looking at best subsets regression, Shakespeare’s question about whether one can desire too much of a good thing becomes immediately important. With the power of best subsets regression, you can quickly explore models with twenty such. For the gummi bear data, you could even try models with thirty-such! And as...

Holiday Baking: Using DOE to Bake a Better Cookie

It’s the most wonderful time of the year – the time for holiday bakers and cookie monsters to unite! So what’s a quality improvement professional to do when his favorite sugar cookie recipe produces cookies that fail to hold their festive holiday shapes after baking? Run a Design of Experiments (DOE) study, of course!

A Fractional Factorial Experiment

Bill Howell, an avid baker and quality professional, used Minitab’s DOE tools to get to the bottom of his sugar cookie shape faux pas.

Howell planned to design an experiment that would allow him to screen many factors, determine which were most...
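As a rough illustration of what a fractional factorial looks like, the sketch below builds a half fraction for four two-level factors. The factor names A through D and the generator D = ABC are assumptions for the example. The excerpt does not say which factors or which fraction Howell used.

```python
from itertools import product


def half_fraction(base_factors, generated_factor):
    """Full factorial in the base factors; the extra factor is their product."""
    runs = []
    for combo in product((-1, 1), repeat=len(base_factors)):
        run = dict(zip(base_factors, combo))
        sign = 1
        for value in combo:
            sign *= value
        run[generated_factor] = sign  # e.g. D = A * B * C
        runs.append(run)
    return runs


# Four factors in 8 runs instead of the 16 a full factorial would need.
design = half_fraction(["A", "B", "C"], "D")
```

The saving in runs is why this kind of design suits screening. The price is that some effects are confounded with one another.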

What’s in a Name? Holy Toledo, Apparently Everything!

College football bowl season is upon us! And to make the multitude of odd bowl games involving teams we haven’t watched all season more entertaining, I’m sure you’ve filled out a Bowl Pool or two. There are 35 bowl games, which means you have 35 winners to pick. That’s quite a few decisions to make!

When making decisions, we usually perform some sort of data analysis to help us make the choice. A farmer might use design of experiments to determine which fertilizer yields the most growth for his crops. A dietician might use hypothesis testing to determine whether a particular...

What the Heck Is Best Subsets Regression, and Why Would I Want It?

Last time, we used stepwise regression to come up with models for the gummi bear data. Stepwise regression is a great tool, but it has a downside: when we use stepwise selection in design of experiments, especially if we focus on only the last step, we can miss interesting models that might be useful.

One way to look at more models is to use Minitab’s Best Subsets feature. Instead of identifying a single model based on statistical significance, Minitab’s Best Subsets feature shows a number of different models, as well as some statistics to help us compare those models.

To get the idea, let’s...
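The general idea can be sketched in a few lines of plain Python. This is not Minitab's algorithm or output, only an illustration that fits every subset of predictors by least squares and keeps the top models of each size ranked by R-squared. The data are made up, and real software typically reports more statistics than this.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def r_squared(columns, y):
    """Fit y on an intercept plus the given predictor columns; return R-squared."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def best_subsets(predictors, y, keep=2):
    """For each subset size, keep the `keep` subsets with the highest R-squared."""
    results = {}
    names = sorted(predictors)
    for size in range(1, len(names) + 1):
        scored = [
            (r_squared([predictors[n] for n in subset], y), subset)
            for subset in combinations(names, size)
        ]
        scored.sort(reverse=True)
        results[size] = scored[:keep]
    return results


# Made-up data: three coded predictors and one response.
predictors = {
    "A": [-1, 1, -1, 1, -1, 1, -1, 1],
    "B": [-1, -1, 1, 1, -1, -1, 1, 1],
    "C": [-1, -1, -1, -1, 1, 1, 1, 1],
}
y = [10.2, 14.1, 11.0, 15.3, 9.8, 13.9, 11.1, 14.8]

table = best_subsets(predictors, y)
```

Because R-squared never goes down when a term is added, it cannot pick the model size by itself. That is why a comparison like this is usually read alongside statistics that penalize extra terms.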

Does Design of Experiments Explain Contradictory Research Results?

Design of experiments, experimental design, or just "gathering some data." Whatever you want to call it, your approach to doing it will affect the results you get.

Have you ever wondered about all those contradictory studies in the news, especially regarding what's good and bad for you? Coffee is good for you, one headline says. It's bad for you, says the next. And if you read beyond the headlines, each study seems to have been conducted in a reasonable manner. 

Experimental design may be the explanation. 

Designing Experiments Begins with Questions

Science and health writer Emily Anthes recently...

Gummi Bear DOE: Stepwise Model Selection

We’ve used design of experiments to look at the data. We’ve seen that the center points are statistically insignificant. We’ve seen that blocks help account for the unstable conditions during the collection of the data. Now for the exciting part: let’s choose a model to use to predict where the gummi bears will land when we launch them.

Various criteria exist for how to choose a model, so we’re not going to settle on a single model right away. We’ll do three steps:

  • Come up with some candidate models.
  • Check for reasons to discard the candidate models.
  • Check how the models perform when we go back...

Gummi Bear DOE: What Do the Blocks Mean?

Last time I used design of experiments to look at the gummi bear data, I interpreted the center point data. The data say that I won’t need any square or cubic terms to get a good fit to the data. Traditionally, the next effect to look at in design of experiments is the block effect.

I was worried that there would be a wearout effect acting on my catapult, so I changed popsicle sticks and rubber bands periodically. I also simply didn’t have time to collect all of my data at the same time, so the blocks represent different days. Moreover, I collected the data for the third block in a...

Predicting the U.S. Presidential Election: Evaluating Two Models (Part One)

You may have read about statistical models that claim to predict the outcome of the upcoming Presidential election. It’s easy to imagine that these models are complicated and contain many demographic, sociological, economic, and political factors. However, I was surprised to read in this article that two simple models supposedly generate accurate predictions.

Both of these models use stock market data. One model is based on the Dow Jones and the other on the S&P 500. Statistics are best when they are a hands-on experience, so while neither study included the data, I obtained both the stock...

How to Be a Ghost Hunter with a Statistical Mindset

I’m very much a data, empirical, science type of guy. So, it might be a surprise to learn that I’ve gone ghost hunting a half-dozen times over the past 3 years. Now, I’m not a paranormal enthusiast. I’m definitely a skeptic. However, in my view, being skeptical about something does not preclude collecting data about it. I also have friends I trust completely who are sure they’ve experienced paranormal activity. Plus, I don’t need much of an excuse to try something new and unusual!

Three of us skeptical ghost hunters have spent the night by ourselves in a variety of supposedly haunted prisons...

Gummi Bear DOE: What Do the Center Points Show?

When I chose a full factorial design for my gummi bear experiment, I was using traditional design of experiments practice to try to learn the most from the least amount of data. I wanted to see if I could save myself the 10 or more data points I would need to add to the design to estimate nonlinear effects. Now that I have some data, the first thing I’m going to learn is: Do I need to collect more data?

I hope I don't, because I would have to go buy more gummi bears. I already ate the bears I didn’t throw away.

I talked about the role of center points in design of experiments earlier. When we...
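As a rough sketch of the comparison behind that question, the code below contrasts the average response at the center points with the average response at the factorial corner points. The numbers are invented, not the gummi bear measurements, and the function name is only illustrative.

```python
def curvature_estimate(factorial_responses, center_responses):
    """Mean response at the center points minus mean response at the corners."""
    mean_factorial = sum(factorial_responses) / len(factorial_responses)
    mean_center = sum(center_responses) / len(center_responses)
    return mean_center - mean_factorial


# Hypothetical distances: four corner runs and three center-point runs.
corner_runs = [10.0, 14.0, 12.0, 16.0]
center_runs = [13.4, 13.6, 13.5]

curvature = curvature_estimate(corner_runs, center_runs)
```

If the response were purely linear in the factors, the two averages would agree apart from noise. A formal test judges the gap against the run-to-run variation, which this sketch leaves out.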

Gummi Bear DOE: Mystery Effects

Back when I chose the factors to study for my gummi bear design of experiments, I was thinking about the fact that something like the position of the gummi bear and the position of the fulcrum would probably interact. When I finished collecting the data, I was eager to see if that effect showed up in my analysis.

Before we look at the distance parallel to the catapult, let's look at the distance perpendicular to the catapult. I didn’t change any factors with the express purpose of making the gummi bear go left or right, so I was hoping all of these factors would be statistically insignificant....

Gummi Bear DOE: First Lessons from Data Collection

I collected my first block of data for the gummi bear design of experiments this week. Why not all of it? Well, there’s lots you can learn when you start collecting data for real. Here are some of my thoughts:

Enter data quickly and accurately for design of experiments

If you’re going to do anything with your data, it’s a lot easier to have it in Minitab. If you followed my lead for doing design of experiments, you have a piece of paper that looks like this:

Accuracy will be much easier if the same person who wrote the data also enters it in the computer, so they can figure out if that number in...

Design of Experiments for an Avid Gamer

The Minitab Fan section of the Minitab blog is your chance to share with our readers! We always love to hear how you are using Minitab products for quality improvement projects, Lean Six Sigma initiatives, research and data analysis, and more. Today we learn how an avid gamer used design of experiments to boost his performance in his favorite driving game.

If our software has helped you, please share your Minitab story, too!

I've used Minitab Statistical Software pretty much every work day since becoming a Black Belt, but I've also used it at home. Back before the needs of family life took over,...

Optimizing Attribute Responses using Design of Experiments (DOE), Part 2

by Manikandan Jayakumar, guest blogger

In an earlier post, I discussed how to collect data in a Design of Experiments (DOE) to optimize the value of an attribute or categorical response (Pass/Fail, Accept/Reject, etc.). I then showed how to convert the collected data into proportions and apply the arcsine transformation using the built-in calculator in Minitab Statistical Software. Now we’re ready to analyze the data to see what effect our three factors have on our attribute response!
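For readers without the earlier post at hand, here is a minimal sketch of that conversion outside Minitab. It assumes the common arcsine square-root form of the transformation, which may differ in scaling from the formula used in the original post. The defect counts and sample size are hypothetical.

```python
import math


def arcsine_sqrt(p):
    """Arcsine square-root transform of a proportion between 0 and 1 (radians)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("proportion must be between 0 and 1")
    return math.asin(math.sqrt(p))


# Hypothetical results: number of rejected units out of 50 in each run.
rejects = [3, 10, 0, 25]
sample_size = 50

proportions = [count / sample_size for count in rejects]
transformed = [arcsine_sqrt(p) for p in proportions]
```

The transform is used because the spread of a raw proportion depends on its value, which sits badly with the constant-variance assumption behind the usual factorial analysis.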

Initial Model and Interaction Plot of Attribute DOE Results

We’ll do this by choosing Stat > DOE > Factorial...

Gummi Bear DOE: Printing a Data Collection Sheet

Recently, we’ve discussed how to do the design and factor setup for design of experiments in Minitab Statistical Software. We’re almost ready to launch some gummi bears. But there’s something else to consider. When we produce the data for design of experiments, how does the data get from the measuring device to Minitab?

If you’re lucky, you have an electronic thingamajig that takes measurements and beams them directly to a desktop computer where they’re stored in an analyzable format. But that’s Joan-Ginther-or-Jim-Frost-lucky. The rest of us are probably going to have to use a classic...

Optimizing Attribute Responses using Design of Experiments (DOE), Part 1

by Manikandan Jayakumar, guest blogger

We use Design of Experiments (DOE) to optimize the value of a response (Y) by simultaneously changing the values of several factors (X’s). The response will often be a continuous variable, but in some scenarios you need to optimize an attribute or categorical response (Pass/Fail, Accept/Reject, etc.). 

Collecting the Data for an Attribute Response DOE

Let’s see how we can use DOE to optimize an attribute response, using data from the manufacturing sector. We are looking at 3 factors with 2 levels each, which are coded as -1 and +1. The total number of...
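As an illustration of the coding scheme, the sketch below lists every combination of 3 two-level factors in coded units. It assumes a full factorial with one run per combination. The excerpt is cut off before the post gives its own run count, and the factor names A, B, and C are placeholders.

```python
from itertools import product

factors = ["A", "B", "C"]  # placeholder names, not the post's factors
levels = (-1, 1)           # coded low and high levels

# One run for every combination of low/high across the three factors.
runs = [dict(zip(factors, combo)) for combo in product(levels, repeat=len(factors))]

for run in runs:
    print(run)
```

Under that assumption there are 2 x 2 x 2 = 8 distinct combinations, and each factor sits at its low level in half of them and at its high level in the other half.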

Gummi Bear DOE: Choosing Factor Levels

Now that we've explored all of the DOE design choices in Minitab Statistical Software, it's time to think about the levels of the factors. I chose these 5 factors previously:

  • Position of catapult on the launch ramp
  • Angle of catapult
  • Number of rubber band windings
  • Position of gummi bear on the catapult
  • Position of fulcrum in the catapult

What Is an Effect in DOE?

In DOE, we're trying to detect the effect of changing each variable from the low to high level. Usually, the way to make the difference most obvious is to make the levels as far apart as possible. Let me borrow some data from Carly Barry to...
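In code, a main effect is just a difference of two averages. The sketch below uses invented numbers rather than the data borrowed in the full post, and the variable names are only illustrative.

```python
def main_effect(coded_levels, response):
    """Average response at the high level minus average response at the low level."""
    high = [y for x, y in zip(coded_levels, response) if x == 1]
    low = [y for x, y in zip(coded_levels, response) if x == -1]
    return sum(high) / len(high) - sum(low) / len(low)


# Hypothetical runs: coded catapult angle and the distance each launch traveled.
angle = [-1, 1, -1, 1]
distance = [20.0, 31.0, 22.0, 35.0]

effect = main_effect(angle, distance)  # (31 + 35) / 2 - (20 + 22) / 2
```

With these numbers, moving the factor from low to high changes the average distance by 12 units. Spreading the levels farther apart tends to make that difference larger relative to the noise.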

The Veepstakes: Using the Solution Desirability Matrix to Help Mitt Romney Choose the VP Candidate

The GOP vice-presidential sweepstakes, or veepstakes, is heating up. Rumors are swirling about whether Mitt Romney has picked a running mate and when he’ll announce it. Will he pick a more exciting but riskier candidate? Or, will he play it safe?

Have you ever wondered how a candidate like Romney might go about evaluating all of the possibilities? There are many potential VP candidates and many criteria. Wouldn’t it be nice if there were a tool to help sort this out? There is! And it comes from the world of Six Sigma and process improvement.

People associate Six Sigma with statistics, and it...