
Regression Analysis

Blog posts and articles about regression analysis methods applied to Lean and Six Sigma projects.

One of the most common questions I get as a Minitab trainer is, "What should I do when my data isn’t normal?" A large number of statistical tests are based on the assumption of normality, so data that aren't normally distributed often cause a lot of worry. Many practitioners suggest that if your data are not normal, you should do a nonparametric version of...
A few times a year, the Bureau of Labor Statistics (BLS) publishes a Spotlight on Statistics article. The first such article of 2015 recently arrived, providing an analysis of trends in long-term unemployment. It's certainly an interesting read on its own, but some of the included data also give us a good opportunity to look at how careful thought can improve your regression analysis. Fortunately, Minitab Statistical...
In this series of posts, I show how hypothesis tests and confidence intervals work by focusing on concepts and graphs rather than equations and numbers. Previously, I used graphs to show what statistical significance really means. In this post, I’ll explain both confidence intervals and confidence levels, and how they’re closely related to P values and significance levels. How to Correctly...
Imagine that you are watching a race and that you are located close to the finish line. When the first and fastest runners complete the race, the differences in times between them will probably be quite small. Now wait until the last runners arrive and consider their finishing times. For these slowest runners, the differences in completion times will be extremely large. This is due to the fact that...
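The finish-line analogy above appears to be heading toward nonconstant variance (heteroscedasticity), where the scatter around a regression line grows as the response grows. A minimal sketch, assuming made-up simulated data rather than anything from the post: it generates points whose noise standard deviation is proportional to x, fits a straight line by ordinary least squares using only the Python standard library, and compares the residual spread at the low and high ends of x. The model `y = 2 + 3x`, the noise scale `0.5 * x`, and the cutoffs at 4 and 7 are all hypothetical choices for illustration.

```python
import random
import statistics

def simulate(n=2000, seed=1):
    """Hypothetical data: y = 2 + 3x plus noise whose SD grows with x."""
    rng = random.Random(seed)
    xs = [rng.uniform(1, 10) for _ in range(n)]
    ys = [2 + 3 * x + rng.gauss(0, 0.5 * x) for x in xs]
    return xs, ys

def fit_line(xs, ys):
    """Ordinary least squares estimates of intercept and slope."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

xs, ys = simulate()
b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Compare the residual spread at the low end and the high end of x
low = [r for x, r in zip(xs, residuals) if x < 4]
high = [r for x, r in zip(xs, residuals) if x > 7]
print(f"fitted line: y = {b0:.2f} + {b1:.2f}x")
print(f"residual SD for x < 4: {statistics.stdev(low):.2f}")
print(f"residual SD for x > 7: {statistics.stdev(high):.2f}")
```

Like the slowest runners, the points with large x spread out much more than those with small x. A residuals-versus-fits plot would show this as a fan shape; the sketch simply summarizes it with two standard deviations.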
We’ve been pretty excited about March Madness here at Minitab. Kevin Rudy’s been busy creating his regression model and predicting the winners for the 2015 NCAA Men’s Basketball Tournament. But we’re not the only ones. Lots of folks are doing their best analysis to help you plan out your bracket now that the tip-offs for the round of 64 are just a day away. As you ponder your last-minute changes,...
The NCAA Tournament is right around the corner, and you know what that means: It’s time to start thinking about how you’re going to fill out your bracket! For the last two years I’ve used the Sagarin Predictor Ratings to predict the tournament. However, there is a problem with that strategy this year. The old method uses a regression model that calculates the probability one team has of beating...
In England, with only a few months left, the Barclays Premier League is about to enter the final run-in of the season. While the top two spots seem pretty locked up, with Chelsea and Manchester City showing their class, the fight for the other two spots in the coveted top 4 promises to entertain until the very last weekend. This is key, because only the top 4 finishers qualify for next...
by Lion "Ari" Ondiappan Arivazhagan, guest blogger. An alarming number of borewell accidents, especially ones involving young children, have occurred across India in the recent past. This is the second in a series of articles on borewell accidents in India. In the first installment of the series, I used the G-chart in Minitab Statistical Software to predict the probabilities of innocent children...
In part 1 of this post, I covered how Six Sigma students at Rose-Hulman Institute of Technology cleaned up and prepared project data for a regression analysis. Now we're ready to start our analysis. We’ll detail the steps in that process and what we can learn from our results. What Factors Are Important? We collected data about 11 factors we believe could be significant: Whether the date of...
By Peter Olejnik, guest blogger. Previous posts on the Minitab Blog have discussed the work of the Six Sigma students at Rose-Hulman Institute of Technology to reduce the quantities of recyclables that wind up in the trash. Led by Dr. Diane Evans, these students continue to make an important impact on their community. As with any Six Sigma process, the results of the work need to be evaluated. A...
If you wanted to figure out the probability that your favorite football team will win their next game, how would you do it? My colleague Eduardo Santiago and I recently looked at this question, and in this post we'll share how we approached the solution. Let’s start by breaking down this problem: There are only two possible outcomes: your favorite team wins, or they lose. Ties are a possibility,...
The Minitab Fan section of the Minitab blog is your chance to share with our readers! We always love to hear how you are using Minitab products for quality improvement projects, Lean Six Sigma initiatives, research and data analysis, and more. If our software has helped you, please share your Minitab story, too! My LSS coach suggested that I regularly conduct data analysis to refresh my Minitab...
Recently, Minitab’s Joel Smith posted about his vacation and being pooped on twice by birds. Then guest blogger Matthew Barsalou wrote a wonderful follow-up on the chances of Joel being pooped on a third time. While I cannot comment on how Joel has handled this situation psychologically so far, I can say that if I had been pooped on twice in a short amount of time, I would be wary of our...
As someone who has collected and analyzed real data for a living, I find the idea of using simulated data for a Monte Carlo simulation a bit odd. How can you improve a real product with simulated data? In this post, I’ll help you understand the methods behind Monte Carlo simulation and walk you through a simulation example using Devize. What is Devize, you ask? Devize is Minitab's exciting new,...
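The basic idea behind Monte Carlo simulation can be sketched without any particular product: draw many random values for each uncertain input, push each draw through a model of the process, and summarize the distribution of outputs. The sketch below is plain Python and does not use or imitate Devize; the three-part assembly, its means and standard deviations, and the spec limits of 44.7 to 45.3 mm are all hypothetical.

```python
import random

def assembly_length(rng):
    """Hypothetical transfer function: total length of three stacked parts (mm)."""
    a = rng.gauss(20.0, 0.10)
    b = rng.gauss(15.0, 0.08)
    c = rng.gauss(10.0, 0.05)
    return a + b + c

def monte_carlo(trials=100_000, lsl=44.7, usl=45.3, seed=42):
    """Estimate the mean output and the fraction falling outside the spec limits."""
    rng = random.Random(seed)
    results = [assembly_length(rng) for _ in range(trials)]
    out_of_spec = sum(1 for r in results if r < lsl or r > usl)
    return sum(results) / trials, out_of_spec / trials

mean_len, defect_rate = monte_carlo()
print(f"simulated mean length: {mean_len:.3f} mm")
print(f"estimated fraction out of spec: {defect_rate:.2%}")
```

Because the inputs here are independent normal variables, the exact answer is also available (the total standard deviation is the square root of 0.10² + 0.08² + 0.05², about 0.137 mm), which makes this toy model a convenient sanity check. Simulation earns its keep when the transfer function is nonlinear or the inputs are not normal, so no closed form exists.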
In my recent meetings with people from various companies in the service industries, I realized that one of the problems they face is that they collect large amounts of "qualitative" data: product types, customer profiles, different subsidiaries, various customer requirements, and so on. As I discussed in my previous post, one way to look at qualitative data is to use different types of...
Choosing the correct linear regression model can be difficult. After all, the world and how it works are complex, and trying to model it with only a sample doesn’t make it any easier. In this post, I'll review some common statistical methods for selecting models and the complications you may face, and I'll provide some practical advice for choosing the best regression model. It starts when a researcher wants to...
Last fall I had a birthday. It wasn’t one of those tougher birthdays where the number ends in a zero. Still, the birthday got me thinking. In response, I told myself, age is just a number. Then I did a mental double-take. Can a statistician say that? After all, numbers are how I understand the world and the way it works. Can age just be a number? After some musing, I concluded that age is just a...
"Data! Data! Data! I can't make bricks without clay." — Sherlock Holmes, in Arthur Conan Doyle's The Adventure of the Copper Beeches. Whether you're the world's greatest detective trying to crack a case or a person trying to solve a problem at work, you're going to need information. Facts. Data, as Sherlock Holmes says. But not all data is created equal, especially if you plan to analyze it as part of...
Stepwise regression and best subsets regression are both automatic tools that help you identify useful predictors during the exploratory stages of model building for linear regression. These two procedures use different methods and present you with different output. An obvious question arises. Does one procedure pick the true model more often than the other? I’ll tackle that question in this post. Fi...
Using a sample to estimate the properties of an entire population is common practice in statistics. For example, the mean from a random sample estimates that parameter for an entire population. In linear regression analysis, we’re used to the idea that the regression coefficients are estimates of the true parameters. However, it’s easy to forget that R-squared (R²) is also an estimate....
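One quick way to see that R² is an estimate is to draw many samples from the same population and watch the sample R² move around. A minimal sketch, assuming a hypothetical population y = x + e with x and e both standard normal, so the population R² is 0.5 (x and the noise each contribute variance 1 to y). The sample size of 20, the 1,000 repetitions, and the seed are arbitrary illustration choices, and only the Python standard library is used.

```python
import random
import statistics

def draw_sample(n, rng):
    """One sample from a hypothetical population: y = x + e, with x, e ~ N(0, 1)."""
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [x + rng.gauss(0, 1) for x in xs]
    return xs, ys

def r_squared(xs, ys):
    """R² of a simple linear regression, i.e. the squared Pearson correlation."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

rng = random.Random(7)
# Population R² is 0.5; each sample of 20 gives its own estimate
r2s = [r_squared(*draw_sample(20, rng)) for _ in range(1000)]
print(f"sample R² over 1000 samples of n = 20: mean {statistics.fmean(r2s):.3f}, "
      f"min {min(r2s):.3f}, max {max(r2s):.3f}")
```

Every sample comes from a population with exactly the same R², yet individual estimates land well below and well above 0.5, which is the sampling variability a single regression's R² carries.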