# Statistics Help

Blog posts and articles that offer tips about the statistics used in Lean and Six Sigma quality improvement projects.

In its industry guidance to companies that manufacture drugs and biological products for people and animals, the Food and Drug Administration (FDA) recommends three stages for process validation. While my last post covered statistical tools for the Process Design stage, here we will focus on the statistical techniques typically utilized for the second stage, Process Qualification. Stage 2: Process... Continue Reading
'Twas the season for toys recently, and Christmas day found me playing around with a classic, the Etch-a-Sketch. As I noodled with the knobs, I had a sudden flash of recognition: my drawing reminded me of the Empirical CDF Plot in Minitab Statistical Software. Did you just ask, "What's a CDF plot? And what's so empirical about it?" Both very good questions. Let's start with the first, and we'll... Continue Reading

The language of statistics is a funny thing, but there usually isn't much to laugh at in the consequences that can follow when misunderstandings occur between statisticians and non-statisticians. We see these consequences frequently in the media, when new studies—that usually contradict previous ones—are breathlessly related, as if their findings were incontrovertible facts. Similar, though less... Continue Reading
The line plot is an incredibly agile but frequently overlooked tool in the quest to better understand your processes. In any process, whether it's baking a cake or processing loan forms, many factors have the potential to affect the outcome. Changing the source of raw materials could affect the strength of plywood a factory produces. Similarly, one method of gluing this plywood might be better... Continue Reading
If you’re familiar with Lean Six Sigma, then you’re familiar with DMAIC. DMAIC is the acronym for Define, Measure, Analyze, Improve and Control. This proven problem-solving strategy provides a structured 5-phase framework to follow when working on an improvement project. This is the first post in a five-part series that focuses on the tools available in Minitab Statistical Software that are most... Continue Reading
Dear Readers, As 2016 comes to a close, it’s time to reflect on the passage of time and changes. As I’m sure you’ve guessed, I love statistics and analyzing data! I also love talking and writing about it. In fact, I’ve been writing statistical blog posts for over five years, and it’s been an absolute blast. John Tukey, the renowned statistician, once said, “The best thing about being a statistician... Continue Reading
This week we’re celebrating the annual Thanksgiving holiday in the United States, which is not only a good time to reflect on the things we’re grateful for, but it’s also a good time to stuff yourself with turkey, mashed potatoes, green bean casserole, and the usual suspects that find their way to the Thanksgiving table! While I’m of course very thankful for my family, friends, home, etc., I’m also... Continue Reading
In this day and age, data sets are often so large that finding and correcting data entry errors by hand is impractical. Fortunately, Minitab includes tools that make it easy to get your data into shape, so that you can proceed to getting the answers you need. Let’s say, for example, that you were going to look at the Global Wood Density Database. It’s... Continue Reading
At the inaugural Minitab Insights Conference in September, presenters Benjamin Turcan and Jennifer Berner discussed how to present data effectively. Among the considerations they discussed was choosing the right graph. Different graphs are good for different things. Of course, opinions about which graph is best can, and do, differ. Dotplot devotees might decide that they are demonstrably... Continue Reading
In Part 1 of this blog series, I wrote about how statistical inference uses data from a sample of individuals to reach conclusions about the whole population. That’s a very powerful tool, but you must check your assumptions when you make statistical inferences. Violating any of these assumptions can result in false positives or false negatives, thus invalidating your results. The common data... Continue Reading
Statistical inference uses data from a sample of individuals to reach conclusions about the whole population. It’s a very powerful tool. But as the saying goes, “With great power comes great responsibility!” When attempting to make inferences from sample data, you must check your assumptions. Violating any of these assumptions can result in false positives or false negatives, thus invalidating... Continue Reading
Data mining can be helpful in the exploratory phase of an analysis. If you're in the early stages and you're just figuring out which predictors are potentially correlated with your response variable, data mining can help you identify candidates. However, there are problems associated with using data mining to select variables. In my previous post, we used data mining to settle on the following... Continue Reading
On the Minitab Blog, we’ve often discussed getting data into Minitab from Excel. Here's a small sampling, in case you currently have data in Excel: "Minitab and Excel: Making the (Data) Connection," "Linking Minitab to Excel to Get Answers Fast," and "3 Tips for Importing Excel Data into Minitab." But if your data is not in Excel to begin with, taking it into Excel to prepare it for entry into Minitab isn’t... Continue Reading
The ultimate goal of most quality improvement projects is clear: reducing the number of defects, improving a response, or making a change that benefits your customers. We often want to jump right in and start gathering and analyzing data so we can solve the problems. Checking your measurement systems first, with methods like attribute agreement analysis or Gage R&R, may seem like a needless waste... Continue Reading
I watched an old motorcycle flick from the 1960s the other night, and I was struck by the bikers' slang. They had a language all their own. Just like statisticians, whose manner of speaking often confounds those who aren't hep to the lingo of data analysis. It got me thinking...what if there were an all-statistician biker gang? Call them the Nulls Angels. Imagine them in their colors, tearing... Continue Reading
Data mining uses algorithms to explore correlations in data sets. An automated procedure sorts through large numbers of variables and includes them in the model based on statistical significance alone. No thought is given to whether the variables and the signs and magnitudes of their coefficients make theoretical sense. We tend to think of data mining in the context of big data, with its huge... Continue Reading
You’ve performed multiple linear regression and have settled on a model which contains several predictor variables that are statistically significant. At this point, it’s common to ask, “Which variable is most important?” This question is more complicated than it first appears. For one thing, how you define “most important” often depends on your subject area and goals. For another, how you collect... Continue Reading
If you’re in the market for statistical software, there are many considerations and more than a few options for you to evaluate. Check out these seven questions to ask yourself before choosing statistical software—your answers should help guide you towards the best solution for your needs! 1. Who uses statistical software in your organization? Are they expert statisticians, novices, or a mix of both?... Continue Reading
In regression, "sums of squares" are used to represent variation. In this post, we’ll use some sample data to walk through these calculations. The sample data used in this post is available within Minitab by choosing Help > Sample Data, or File > Open Worksheet > Look in Minitab Sample Data folder (depending on your version of Minitab). The dataset is called ResearcherSalary.MTW, and contains data... Continue Reading
So the data you nurtured, that you worked so hard to format and make useful, failed the normality test. Time to face the truth: despite your best efforts, that data set is never going to measure up to the assumption you may have been trained to fervently look for. Your data's lack of normality seems to make it poorly suited for analysis. Now what? Take it easy. Don't get uptight. Just let your data... Continue Reading