Statistics Help

Blog posts and articles that offer tips about the statistics used in lean and six sigma quality improvement projects.

Design of Experiments is an extremely powerful statistical method, and we added a DOE tool to the Assistant in Minitab to make it more accessible to more people. Since it's summer grilling season, I'm applying the Assistant's DOE tool to outdoor cooking. Earlier, I showed you how to set up a designed experiment that will let you optimize how you grill steaks.  If you're not already using it and you... Continue Reading
Earlier this month, PLOS.org published an article titled "Ten Simple Rules for Effective Statistical Practice." The 10 rules are good reading for anyone who draws conclusions and makes decisions based on data, whether you're trying to extend the boundaries of scientific knowledge or make good decisions for your business.  Carnegie Mellon University's Robert E. Kass and several co-authors devised... Continue Reading

7 Deadly Statistical Sins Even the Experts Make

Do you know how to avoid them?

Get the facts >
You often hear the data being blamed when an analysis is not delivering the answers you wanted or expected. I was recently reminded that the data chosen or collected for a specific analysis is determined by the analyst, so there is no such thing as bad data—only bad analysis.  This made me think about the steps an analyst can take to minimise the risk of producing analysis that fails to answer... Continue Reading
An outlier is an observation in a data set that lies a substantial distance from other observations. These unusual observations can have a disproportionate effect on statistical analysis, such as the mean, which can lead to misleading results. Outliers can provide useful information about your data or process, so it's important to investigate them. Of course, you have to find them first.  Finding... Continue Reading
Technology is very much part of our lives nowadays. We use our smartphones to have video calls with our friends and family, and watch our favourite TV shows on tablets. Technology has also transformed the fitness industry with the increasing popularity of fitness trackers. Recently, I got myself a fitness watch and it's becoming my favourite gadget. It can track how many steps I’ve taken, my... Continue Reading
Businesses are getting more and more data from existing and potential customers: whenever we click on a web site, for example, it can be recorded in the vendor's database. And whenever we use electronic ID cards to access public transportation or other services, our movements across the city may be analyzed. In the very near future, connected objects such as cars and electrical appliances will... Continue Reading
The last thing you want to do when you purchase a new piece of software is spend an excessive amount of time getting up and running. You’ve probably been ready to the use the software since, well, yesterday. Minitab has always focused on making our software easy to use, but many professional software packages do have a steep learning curve. Whatever package you’re using, here are three things you... Continue Reading
Suppose you’ve collected data on cycle time, revenue, the dimension of a manufactured part, or some other metric that’s important to you, and you want to see what other variables may be related to it. Now what? When I graduated from college with my first statistics degree, my diploma was bona fide proof that I'd endured hours and hours of classroom lectures on various statistical topics, including l... Continue Reading
Do you recall my “putting the cart before the horse” analogy in part 1 of this blog series? The comparison is simple. We all, at times, put the cart before the horse in relatively innocuous ways, such as eating your dessert before you’ve eaten your dinner, or deciding what to wear before you’ve been invited to the party. But performing some tasks in the wrong order, such as running a statistical... Continue Reading
Analysis of variance (ANOVA) can determine whether the means of three or more groups are different. ANOVA uses F-tests to statistically test the equality of means. In this post, I’ll show you how ANOVA and F-tests work using a one-way ANOVA example. But wait a minute...have you ever stopped to wonder why you’d use an analysis of variance to determine whether means are different? I'll also show how... Continue Reading
In statistics, t-tests are a type of hypothesis test that allows you to compare means. They are called t-tests because each t-test boils your sample data down to one number, the t-value. If you understand how t-tests calculate t-values, you’re well on your way to understanding how these tests work. In this series of posts, I'm focusing on concepts rather than equations to show how t-tests work.... Continue Reading
Did you know about the Minitab Network group on LinkedIn? It’s the one managed by Eston Martz, who also edits the Minitab blog. I like to see what the members are talking about, which recently got me into some discussions about Raman spectroscopy data. Not having much experience with Raman spectroscopy data, I thought I’d learn more about it and found the RRUFFTM Project. The idea is that if you... Continue Reading
T-tests are handy hypothesis tests in statistics when you want to compare means. You can compare a sample mean to a hypothesized or target value using a one-sample t-test. You can compare the means of two groups with a two-sample t-test. If you have two groups with paired observations (e.g., before and after measurements), use the paired t-test. How do t-tests work? How do t-values fit in? In this... Continue Reading
Five-point Likert scales are commonly associated with surveys and are used in a wide variety of settings. You’ve run into the Likert scale if you’ve ever been asked whether you strongly agree, agree, neither agree or disagree, disagree, or strongly disagree about something. The worksheet to the right shows what five-point Likert data look like when you have two groups. Because Likert item data are... Continue Reading
You have a column of categorical data. Maybe it’s a column of reasons for production downtime, or customer survey responses, or all of the reasons airlines give for those riling flight delays. Whatever type of qualitative data you may have, suppose you want to find the most common categories. Here are three different ways to do that: 1. Pareto Charts Pareto Charts easily help you separate the vital... Continue Reading
I’ve written about R-squared before and I’ve concluded that it’s not as intuitive as it seems at first glance. It can be a misleading statistic because a high R-squared is not always good and a low R-squared is not always bad. I’ve even said that R-squared is overrated and that the standard error of the estimate (S) can be more useful. Even though I haven’t always been enthusiastic about... Continue Reading
In statistics, there are things you need to do so you can trust your results. For example, you should check the sample size, the assumptions of the analysis, and so on. In regression analysis, I always urge people to check their residual plots. In this blog post, I present one more thing you should do so you can trust your regression results in certain circumstances—standardize the continuous... Continue Reading
If you need to assess process performance relative to some specification limit(s), then process capability is the tool to use. You collect some accurate data from a stable process, enter those measurements in Minitab, and then choose Stat > Quality Tools > Capability Analysis/Sixpack or Assistant > Capability Analysis. Now, what about sorting the data? I’ve been asked “why does Cpk change when I... Continue Reading
In the world of linear models, a hierarchical model contains all lower-order terms that comprise the higher-order terms that also appear in the model. For example, a model that includes the interaction term A*B*C is hierarchical if it includes these terms: A, B, C, A*B, A*C, and B*C. Fitting the correct regression model can be as much of an art as it is a science. Consequently, there's not always a... Continue Reading
If you perform linear regression analysis, you might need to compare different regression lines to see if their constants and slope coefficients are different. Imagine there is an established relationship between X and Y. Now, suppose you want to determine whether that relationship has changed. Perhaps there is a new context, process, or some other qualitative change, and you want to determine... Continue Reading