
Statistics Help

Blog posts and articles that offer tips about the statistics used in lean and six sigma quality improvement projects.

When you work in data analysis, you quickly discover an irrefutable fact: a lot of people just can't stand statistics. Some people fear the math, some fear what the data might reveal, some people find it deadly dull, and others think it's bunk. Many don't even really know why they hate statistics—they just do. Always have, probably always will.  Problem is, that means we who analyze data need to com... Continue Reading
Back when I was an undergrad in statistics, I unfortunately spent an entire semester of my life taking a class, diligently crunching numbers with my TI-82, before realizing 1) that I was actually in an Analysis of Variance (ANOVA) class, 2) why I would want to use such a tool in the first place, and 3) that ANOVA doesn’t necessarily tell you a thing about variances. Fortunately, I've had a lot more... Continue Reading

Control charts are a fantastic tool. These charts plot your process data to identify common cause and special cause variation. By identifying the different causes of variation, you can take action on your process without over-controlling it. Assessing the stability of a process can help you determine whether there is a problem and identify the source of the problem. Is the mean too high, too low,... Continue Reading
As Halloween approaches, you are probably taking the necessary steps to protect yourself from the various ghosts, goblins, and witches that are prowling around. Monsters of all sorts are out to get you, unless they’re sufficiently bribed with candy offerings! I’m here to warn you about a ghoul that all statisticians and data scientists need to be aware of: phantom degrees of freedom. These phantoms... Continue Reading
In Part 3 of our series, we decided to test our 4 experimental factors, Club Face Tilt, Ball Characteristics, Club Shaft Flexibility, and Tee Height in a full factorial design because of the many advantages of that data collection plan. In Part 4 we concluded that each golfer should replicate their half fraction of the full factorial 5 times in order to have a high enough power to detect... Continue Reading
I read trade publications that cover everything from banking to biotech, looking for interesting perspectives on data analysis and statistics, especially where it pertains to quality improvement. Recently I read a great blog post from Tony Taylor, an analytical chemist with a background in pharmaceuticals. In it, he discusses the implications of the FDA's updated guidance for industry analytical... Continue Reading
An exciting new study sheds light on the relationship between P values and the replication of experimental results. This study highlights issues that I've emphasized repeatedly—it is crucial to interpret P values correctly, and significant results must be replicated to be trustworthy. The study also supports my disagreement with the decision by the Journal of Basic and Applied Social Psychology to b... Continue Reading
Repeated measures designs don’t fit our impression of a typical experiment in several key ways. When we think of an experiment, we often think of a design that has a clear distinction between the treatment and control groups. Each subject is in one, and only one, of these non-overlapping groups. Subjects who are in a treatment group are exposed to only one type of treatment. This is the... Continue Reading
When I started out on the blog, I spent some time showing data sets that make it easy to illustrate statistical concepts. It’s easier to show someone how something works with something familiar than with something they’ve never thought about before. Need a quick illustration to share with someone about how to summarize a variable in Minitab? See if they have a magazine on their desk, and... Continue Reading
Whatever industry you're in, you're going to need to buy supplies. If you're a printer, you'll need to purchase inks, various types of printing equipment, and paper. If you're in manufacturing, you'll need to obtain parts that you don't make yourself.  But how do you know you're making the right choice when you have multiple suppliers vying to fulfill your orders?  How can you be sure you're... Continue Reading
In regression analysis, overfitting a model is a real problem. An overfit model can cause the regression coefficients, p-values, and R-squared to be misleading. In this post, I explain what an overfit model is and how to detect and avoid this problem. An overfit model is one that is too complicated for your data set. When this happens, the regression model becomes tailored to fit the quirks and... Continue Reading
Statisticians say the darndest things. At least, that's how it can seem if you're not well-versed in statistics.  When I began studying statistics, I approached it as a language. I quickly noticed that compared to other disciplines, statistics has some unique problems with terminology, problems that don't affect most scientific and academic specialties.  For example, dairy science has a highly... Continue Reading
If you've read the first two parts of this tale, you know it started when I published a post that involved transforming data for capability analysis. When an astute reader asked why Minitab didn't seem to transform the data outside of the capability analysis, it revealed an oversight that invalidated the original analysis.  I removed the errant post. But to my surprise, the reader who helped me... Continue Reading
Last time, I told you how I had double-checked the analysis in a post that involved running the Johnson transformation on a set of data before doing normal capability analysis on it. A reader asked why the transformation didn't work on the data when you applied it outside of the capability analysis.  I hadn't tried transforming the data that way, but if the transformation worked when performed as... Continue Reading
Every now and then I’ll test my Internet speed at home using sites such as http://speedtest.comcast.net. My need to perform these tests could stem from the cool-looking interfaces these sites employ, which display the results using analog speedometers and RPM meters. It could also stem from the validation that I need in "getting what I am paying for," although I realize that there are... Continue Reading
Last month the ESPN series Outside the Lines reported on major league pitchers suffering serious injuries from being struck in the head by line drives, and efforts MLB is making towards having protective gear developed for pitchers. You can view the report here if you'd like: A couple of things jump out at me from the clip: The overwhelming majority of pitchers are not interested in wearing... Continue Reading
Previously, I’ve written about how to interpret regression coefficients and their individual P values. I’ve also written about how to interpret R-squared to assess the strength of the relationship between your model and the response variable. Recently I've been asked, how does the F-test of the overall significance and its P value fit in with these other statistics? That’s the topic of this post! In... Continue Reading
In my previous post, I wrote about the hypothesis testing ban in the Journal of Basic and Applied Social Psychology. I showed how P values and confidence intervals provide important information that descriptive statistics alone don’t provide. In this post, I'll cover the editors’ concerns about hypothesis testing and how to avoid the problems they describe. The editors describe hypothesis testing... Continue Reading
Banned! In February 2015, editor David Trafimow and associate editor Michael Marks of the Journal of Basic and Applied Social Psychology declared that the null hypothesis statistical testing procedure is invalid. They promptly banned P values, confidence intervals, and hypothesis testing from the journal. The journal now requires descriptive statistics and effect sizes. They also encourage large... Continue Reading
As a Minitab trainer, one of the most common questions I get from training participants is "what should I do when my data isn’t normal?" A large number of statistical tests are based on the assumption of normality, so not having data that is normally distributed typically instills a lot of fear. Many practitioners suggest that if your data are not normal, you should do a nonparametric version of... Continue Reading