Tips and Techniques for Statistics and Quality Improvement

Blog posts and articles about using Minitab software in quality improvement projects, research, and more.

You run a capability analysis and your Cpk is bad. Now what? First, let’s start by defining what “bad” is. In simple terms, the smaller the Cpk, the more defects you have. So the larger your Cpk is, the better. Many practitioners use a Cpk of 1.33 as the gold standard, so we’ll treat that as the gold standard here, too. Suppose we collect some data and run a capability analysis using Minitab St... Continue Reading
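For readers who want to see the arithmetic behind the index, here is a minimal Python sketch of the usual textbook definition, Cpk = min(USL - mean, mean - LSL) / (3 * sigma). The function name, the measurements, and the spec limits are hypothetical, and the overall sample standard deviation is used for simplicity. Minitab bases Cpk on a within-subgroup estimate of sigma, so its result can differ from this sketch.

```python
import statistics


def cpk(data, lsl, usl):
    """Textbook Cpk: distance from the mean to the nearer spec limit, in units of 3 sigma.

    Assumption: sigma is estimated with the overall sample standard deviation.
    """
    mean = statistics.mean(data)
    sigma = statistics.stdev(data)
    return min(usl - mean, mean - lsl) / (3 * sigma)


# Hypothetical measurements and spec limits, for illustration only.
measurements = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
value = cpk(measurements, lsl=9.0, usl=11.0)
print(f"Cpk = {value:.2f}; meets the 1.33 benchmark: {value >= 1.33}")
```

A process that drifts toward one spec limit lowers Cpk even when its spread is unchanged, because only the nearer limit enters the calculation.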
You know what the big thing is in the data analysis world—"Big Data." Big, big, big, very big data. Massive data. ENORMOUS data. Data that is just brain-bendingly big. Data so big that we need globally interconnected supercomputers that haven't even been built yet just to contain one one-billionth of it. That's the kind of big data everybody's so excited about.  Whatever. There's no denying that... Continue Reading
I recently guest lectured for an applied regression analysis course at Penn State. Now, before you begin making certain assumptions—because as any statistician will tell you, assumptions are important in regression—you should know that I have no teaching experience whatsoever, and I’m not much older than the students I addressed. I’m just 5 years removed from my undergraduate days at Virginia Tech,... Continue Reading
As we broke for lunch, two participants in the training class began to discuss, debate, and finally fight over a fundamental task in golf—how to drive the ball the farthest off the tee. Both were avid golfers and had spent a great deal of time and money on professional instruction and equipment, so the argument continued through the lunch hour, with neither arguer stopping to eat. Several other... Continue Reading
This summer, I created a model to determine the correct 4th down decision. But whether it’s for business or some personal interest, creating a model is just the starting point. The real benefits come from applying your model. And for the Big Ten 4th down calculator, the time to apply the model is now! On Saturday night, Penn State and Rutgers officially kicked off conference play for the 2015 Big... Continue Reading
Repeated measures designs don’t fit our impression of a typical experiment in several key ways. When we think of an experiment, we often think of a design that has a clear distinction between the treatment and control groups. Each subject is in one, and only one, of these non-overlapping groups. Subjects who are in a treatment group are exposed to only one type of treatment. This is the... Continue Reading
When I started out on the blog, I spent some time showing some data sets that would make it easy to illustrate statistical concepts. It’s easier to show someone how something works with something familiar than with something they’ve never thought about before. Need a quick illustration to share with someone about how to summarize a variable in Minitab? See if they have a magazine on their desk, and... Continue Reading
Whatever industry you're in, you're going to need to buy supplies. If you're a printer, you'll need to purchase inks, various types of printing equipment, and paper. If you're in manufacturing, you'll need to obtain parts that you don't make yourself.  But how do you know you're making the right choice when you have multiple suppliers vying to fulfill your orders?  How can you be sure you're... Continue Reading
It sometimes may be prohibitively expensive or time-consuming to gather data for all runs for a designed experiment (DOE). For example, a 6-factor, 2-level factorial design can entail 64 experimental runs, which may be too high a number for your particular situation. We have seen how to handle some of these situations in previous posts, such as Design of Experiments: "Fractionating" and... Continue Reading
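The run count quoted above is 2 raised to the number of factors. The following Python sketch enumerates the runs under the common -1/+1 coding of the two levels. The function name is hypothetical, and the half fraction shown (keeping runs whose coded levels multiply to +1) is just one standard way to halve the design, not necessarily the approach taken in the posts referenced here.

```python
from itertools import product
from math import prod


def full_factorial(n_factors, levels=(-1, 1)):
    """Return every combination of levels across n_factors factors."""
    return list(product(levels, repeat=n_factors))


runs = full_factorial(6)
print(len(runs))  # 64 runs for 6 factors at 2 levels

# One possible half fraction: keep runs whose coded levels multiply to +1.
half_fraction = [run for run in runs if prod(run) == 1]
print(len(half_fraction))  # 32 runs
```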
Variance is a measure of how much the data are scattered about their mean. Usually we want to minimize it as much as possible. A manufacturer of screws wants to minimize the variation in the length of the screws. A restaurant owner doesn't want the taste of the same meal to vary from one day to the next. And they might not know it, but most football coaches choose a low variance strategy when they... Continue Reading
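As a small illustration of "scattered about their mean," this Python sketch computes the sample variance by hand and checks it against the standard library. The screw lengths are made-up values, and the function name is hypothetical.

```python
import statistics


def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / (len(data) - 1)


# Hypothetical screw lengths in millimeters, for illustration only.
lengths = [25.1, 24.9, 25.0, 25.2, 24.8, 25.0]
print(sample_variance(lengths))
print(statistics.variance(lengths))  # agrees up to floating-point rounding
```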
If you use ordinary linear regression with a response of count data, it may work out fine (Part 1), or you may run into some problems (Part 2). Given that a count response could be problematic, why not use a regression procedure developed to handle a response of counts? A Poisson regression analysis is designed to analyze a regression model with a count response. First, let's try using Poisson... Continue Reading
My previous post showed an example of using ordinary linear regression to model a count response. For that particular count data, shown by the blue circles on the dot plot below, the model assumptions for linear regression were adequately satisfied. But frequently, count data may contain many values equal or close to 0. Also, the distribution of the counts may be right-skewed. In the quality field,... Continue Reading
Ever use dental floss to cut soft cheese? Or Alka Seltzer to clean your toilet bowl? You can find a host of nonconventional uses for ordinary objects online. Some are more peculiar than others. Ever use ordinary linear regression to evaluate a response (outcome) variable of counts? Technically, ordinary linear regression was designed to evaluate a continuous response variable. A continuous... Continue Reading
In 2007, the Crayola crayon company encountered a problem. Labels were coming off of their crayons. Up to that point, Crayola had done little to implement data-driven methodology into the process of manufacturing their crayons. But that was about to change. An elementary data analysis showed that the adhesive didn’t consistently set properly when the labels were dry. Misting crayons as they went... Continue Reading
In regression analysis, overfitting a model is a real problem. An overfit model can cause the regression coefficients, p-values, and R-squared to be misleading. In this post, I explain what an overfit model is and how to detect and avoid this problem. An overfit model is one that is too complicated for your data set. When this happens, the regression model becomes tailored to fit the quirks and... Continue Reading
Rare events inherently occur in all kinds of processes. In hospitals, there are medication errors, infections, patient falls, ventilator-associated pneumonias, and other rare, adverse events that cause prolonged hospital stays and increase healthcare costs.  But rare events happen in many other contexts, too. Software developers may need to track errors in lines of programming code, or a quality... Continue Reading
Newsweek's recent article, The Environmental Disaster in Your Closet, led me (through Greenpeace's Detox Catwalk) to an interesting new data set on the web. Since I like public data, I thought I'd share some graphs I made from the global online platform of the Chinese Institute for Public and Environmental Affairs (IPE). The IPE website states that its goal is "to expand environmental information... Continue Reading
Imagine a multi-million dollar company that released a product without knowing the probability that it will fail after a certain amount of time. “We offer a 2 year warranty, but we have no idea what percentage of our products fail before 2 years.” Crazy, right? Anybody who wanted to ensure the quality of their product would perform a statistical analysis to look at the reliability and survival of... Continue Reading
To make objective decisions about the processes that are critical to your organization, you often need to examine categorical data. You may know how to use a t-test or ANOVA when you’re comparing measurement data (like weight, length, revenue, and so on), but do you know how to compare attribute or counts data? It’s easy to do with statistical software like Minitab. One person may look at this bar... Continue Reading
There's more data available today than ever before, and with statistical software such as Minitab it only takes a couple of seconds to get some significant insights, whether it concerns how to make your business run better or national politics.  For instance, if we look back at the last 9 presidential elections (1980 to 2012), there are some interesting correlations between the percent of state... Continue Reading