Learning

Stories and real-world examples that show you how to apply statistics and statistical software to solve problems.

Histograms are one of the most common graphs used to display numeric data. Anyone who takes a statistics course is likely to learn about the histogram, and for good reason: histograms are easy to understand and can instantly tell you a lot about your data. Here are three of the most important things you can learn by looking at a histogram.  Shape—Mirror, Mirror, On the Wall… If the left side of a... Continue Reading
Did you ever wonder why statistical analyses and concepts often have such weird, cryptic names? One conspiracy theory points to the workings of a secret committee called the ICSSNN. The International Committee for Sadistic Statistical Nomenclature and Numerophobia was formed solely to befuddle and subjugate the masses. Its mission: To select the most awkward, obscure, and confusing name possible... Continue Reading

'Twas the season for toys recently, and Christmas day found me playing around with a classic, the Etch-a-Sketch. As I noodled with the knobs, I had a sudden flash of recognition: my drawing reminded me of the Empirical CDF Plot in Minitab Statistical Software. Did you just ask, "What's a CDF plot? And what's so empirical about it?" Both very good questions. Let's start with the first, and we'll... Continue Reading
Data mining can be helpful in the exploratory phase of an analysis. If you're in the early stages and you're just figuring out which predictors are potentially correlated with your response variable, data mining can help you identify candidates. However, there are problems associated with using data mining to select variables. In my previous post, we used data mining to settle on the following... Continue Reading
True or false: When comparing a parameter for two sets of measurements, you should always use a hypothesis test to determine whether the difference is statistically significant. The answer? (drumroll...) True! ...and False! To understand this paradoxical answer, you need to keep in mind the difference between samples, populations, and descriptive and inferential statistics.  Descriptive Statistics and... Continue Reading
Today, September 16, is World Ozone Day. You don't hear much about the ozone layer any more. In fact, if you’re under 30, you might think this is just another trivial, obscure observance, along the lines of International Dot Day (yesterday) or National Apple Dumpling Day (tomorrow). But there’s a good reason that, almost 30 years ago, the United Nations designated today as a day to raise... Continue Reading
You’ve performed multiple linear regression and have settled on a model which contains several predictor variables that are statistically significant. At this point, it’s common to ask, “Which variable is most important?” This question is more complicated than it first appears. For one thing, how you define “most important” often depends on your subject area and goals. For another, how you collect... Continue Reading
In regression, "sums of squares" are used to represent variation. In this post, we’ll use some sample data to walk through these calculations. The sample data used in this post is available within Minitab by choosing Help > Sample Data, or File > Open Worksheet > Look in Minitab Sample Data folder (depending on your version of Minitab).  The dataset is called ResearcherSalary.MTW, and contains data... Continue Reading
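As a rough illustration of the sums-of-squares calculations this post refers to, here is a minimal Python sketch for a simple linear regression. The function name and the five data points are made up for illustration; they are not the ResearcherSalary.MTW data, and this is not Minitab's implementation.

```python
def sums_of_squares(x, y):
    """Return (SS total, SS regression, SS error) for a least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least-squares slope and intercept for y = intercept + slope * x.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]

    # Total variation, variation explained by the line, and leftover variation.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ss_total, ss_regression, ss_error


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ss_total, ss_regression, ss_error = sums_of_squares(x, y)
print(ss_total, ss_regression, ss_error)
```

For a least-squares fit with an intercept, the regression and error sums of squares add up to the total sum of squares, which is a handy check on the arithmetic.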
I blogged a few months back about three different Minitab tools you can use to examine your data over time. Did you know that you can also use a simple run chart to display how your process data changes over time? Of course those “changes” could be evidence of special-cause variation, which a run chart can help you see. What’s special-cause variation, and how’s it different from common-cause... Continue Reading
While some posts in our Minitab blog focus on understanding t-tests and t-distributions, this post will focus more simply on how to hand-calculate the t-value for a one-sample t-test (and how to replicate the p-value that Minitab gives us).  The formulas used in this post are available within Minitab Statistical Software by choosing the following menu path: Help > Methods and Formulas > Basic... Continue Reading
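The hand calculation of a one-sample t-value can be sketched in a few lines of Python using the standard textbook formula: the sample mean minus the hypothesized mean, divided by the standard error of the mean. The function name, sample values, and hypothesized mean below are assumptions for illustration, not figures from the post.

```python
import math
import statistics


def one_sample_t(sample, hypothesized_mean):
    """Return (t-value, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_dev = statistics.stdev(sample)      # sample standard deviation (n - 1)
    std_error = std_dev / math.sqrt(n)      # standard error of the mean
    t_value = (mean - hypothesized_mean) / std_error
    return t_value, n - 1


# Hypothetical measurements and target value, for illustration only.
sample = [5.1, 4.9, 5.6, 5.8, 6.0]
t_value, df = one_sample_t(sample, hypothesized_mean=5.0)
print(round(t_value, 3), df)
```

Turning the t-value into a p-value requires the t-distribution with n - 1 degrees of freedom, which the Python standard library does not provide, so that step is left out of this sketch.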
An outlier is an observation in a data set that lies a substantial distance from other observations. These unusual observations can have a disproportionate effect on statistical analysis, such as the mean, which can lead to misleading results. Outliers can provide useful information about your data or process, so it's important to investigate them. Of course, you have to find them first.  Finding... Continue Reading
Businesses are getting more and more data from existing and potential customers: whenever we click on a web site, for example, it can be recorded in the vendor's database. And whenever we use electronic ID cards to access public transportation or other services, our movements across the city may be analyzed. In the very near future, connected objects such as cars and electrical appliances will... Continue Reading
For hundreds of years, people have been improving their situation by pulling themselves up by their bootstraps. Well, now you can improve your statistical knowledge by pulling yourself up by your bootstraps. Minitab Express has 7 different bootstrapping analyses that can help you better understand the sampling distribution of your data.  A sampling distribution describes the likelihood of... Continue Reading
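To make the idea of a bootstrapped sampling distribution concrete, here is a minimal sketch of the general technique for a sample mean: resample the data with replacement many times and record the mean of each resample. The function name, data, seed, and number of resamples are all assumptions for illustration; this is not a description of how Minitab Express implements its analyses.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=1000, seed=1):
    """Return the means of n_resamples resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    size = len(sample)
    return [
        statistics.mean(rng.choices(sample, k=size))
        for _ in range(n_resamples)
    ]


# Hypothetical data, for illustration only.
sample = [12.1, 9.8, 11.4, 10.7, 13.2, 10.1, 9.5, 12.8]
means = sorted(bootstrap_means(sample))

# A simple percentile interval: the middle 95% of the bootstrap means.
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means)) - 1]
print(lower, upper)
```

The spread of the resampled means approximates how much the sample mean would vary from sample to sample, which is what a sampling distribution describes.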
Analysis of variance (ANOVA) can determine whether the means of three or more groups are different. ANOVA uses F-tests to statistically test the equality of means. In this post, I’ll show you how ANOVA and F-tests work using a one-way ANOVA example. But wait a minute...have you ever stopped to wonder why you’d use an analysis of variance to determine whether means are different? I'll also show how... Continue Reading
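One way to see why an analysis of variance can test means is to compute the F-statistic by hand: it is the ratio of the variation between group means to the variation within groups. The sketch below uses the standard one-way ANOVA formulas; the function name and the three small groups are made up for illustration and are not the example from the post.

```python
def one_way_anova_f(groups):
    """Return the F-statistic for a one-way ANOVA on a list of groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group variation: how far each group mean is from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group variation: how far each value is from its own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical groups, for illustration only.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(groups))
```

A large F-value means the group means are spread out more than the within-group noise would suggest. Judging significance still requires comparing F against the F-distribution with the two degrees of freedom, which this sketch does not do.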
In statistics, t-tests are a type of hypothesis test that allows you to compare means. They are called t-tests because each t-test boils your sample data down to one number, the t-value. If you understand how t-tests calculate t-values, you’re well on your way to understanding how these tests work. In this series of posts, I'm focusing on concepts rather than equations to show how t-tests work.... Continue Reading
T-tests are handy hypothesis tests in statistics when you want to compare means. You can compare a sample mean to a hypothesized or target value using a one-sample t-test. You can compare the means of two groups with a two-sample t-test. If you have two groups with paired observations (e.g., before and after measurements), use the paired t-test. How do t-tests work? How do t-values fit in? In this... Continue Reading
About a year ago, a reader asked if I could try to explain degrees of freedom in statistics. Since then,  I’ve been circling around that request very cautiously, like it’s some kind of wild beast that I’m not sure I can safely wrestle to the ground. Degrees of freedom aren’t easy to explain. They come up in many different contexts in statistics—some advanced and complicated. In mathematics, they're... Continue Reading
Allow me to make a confession up front: I won't hesitate to beat my kids at a game. My kids are young enough that in pretty much any game that is predominantly determined by skill and not luck, I can beat them, and beat them easily. This isn't some macho thing where it makes me feel good, and I suppose it is only partially based in wanting them to handle both winning and losing well. It's just how I... Continue Reading
P values have been around for nearly a century and they’ve been the subject of criticism since their origins. In recent years, the debate over P values has risen to a fever pitch. In particular, there are serious fears that P values are misused to such an extent that it has actually damaged science. In March 2016, spurred on by the growing concerns, the American Statistical Association (ASA) did... Continue Reading
There's nothing like a boxplot, aka box-and-whisker diagram, to get a quick snapshot of the distribution of your data. With a single glance, you can readily intuit its general shape, central tendency, and variability. To easily compare the distribution of data between groups, display boxplots for the groups side by side. Visually compare the central value and spread of the distribution for each... Continue Reading