
Data Analysis

Blog posts and articles with tips for analyzing data for quality improvement methodologies, including Six Sigma and Lean.

In my previous post, I showed you that the coefficients are different when choosing (-1,0,1) vs (1,0) coding schemes for General Linear Model (or Regression).  We used the two different equations to calculate the same fitted values. Here I will focus on showing what the different coefficients represent.  Let's use the data and models from the last blog post: We can display the means for each level... Continue Reading
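As a rough illustration of the coding-scheme point in the teaser above, here is a minimal Python sketch using NumPy. The data, the level names, and the choice of "C" as the reference level are all hypothetical and are not taken from the original post or from Minitab's output. It fits the same one-factor model under (-1,0,1) coding and (1,0) coding, and shows that the coefficients differ while the fitted values match.

```python
import numpy as np

# Hypothetical toy data: one categorical factor with three levels and a response.
levels = np.array(["A", "A", "B", "B", "C", "C"])
y = np.array([10.0, 12.0, 15.0, 17.0, 20.0, 22.0])


def design_matrix(levels, scheme):
    """Intercept plus indicator columns for A and B; "C" is the assumed reference level."""
    X = np.ones((len(levels), 3))
    for j, name in enumerate(["A", "B"], start=1):
        X[:, j] = (levels == name).astype(float)
    if scheme == "effect":
        # (-1, 0, 1) coding: rows for the reference level get -1 in every indicator column.
        X[levels == "C", 1:] = -1.0
    return X


def fit(levels, y, scheme):
    """Ordinary least squares fit; returns (coefficients, fitted values)."""
    X = design_matrix(levels, scheme)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X @ coef


coef_effect, fitted_effect = fit(levels, y, "effect")  # (-1, 0, 1) coding
coef_dummy, fitted_dummy = fit(levels, y, "dummy")     # (1, 0) coding

print("(-1,0,1) coefficients:", np.round(coef_effect, 3))
print("(1,0) coefficients:   ", np.round(coef_dummy, 3))
print("Same fitted values:   ", np.allclose(fitted_effect, fitted_dummy))
```

In this toy setup, the (-1,0,1) intercept is the mean of the three level means and each coefficient is a level's deviation from it, while the (1,0) intercept is the reference level's mean and each coefficient is a difference from that reference.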
Since Minitab 17 Statistical Software launched in February 2014, we've gotten great feedback from many people who have been using the General Linear Model and Regression tools. But in speaking with people as part of Minitab's Technical Support team, I've found many are noticing that there are two coding schemes available with each. We frequently get calls from people asking how the coding scheme you... Continue Reading
Earlier, I wrote about the different types of data statisticians typically encounter. In this post, we're going to look at why, when given a choice in the matter, we prefer to analyze continuous data rather than categorical/attribute or discrete data.  As a reminder, when we assign something to a group or give it a name, we have created attribute or categorical data.  If we count something, like... Continue Reading
Over the past few years, the average length of an MLB game has been steadily increasing. We can create a quick time series plot in Minitab Statistical Software to display this: As games have been lasting longer, there's been a feeling shared by many that this was a negative. Games seemed to drag on, with a lot of unnecessary stoppages and breaks. To combat this trend, and to try to speed up games to... Continue Reading
In my previous post, I wrote about the hypothesis testing ban in the Journal of Basic and Applied Social Psychology. I showed how P values and confidence intervals provide important information that descriptive statistics alone don’t provide. In this post, I'll cover the editors’ concerns about hypothesis testing and how to avoid the problems they describe. The editors describe hypothesis testing... Continue Reading
If you’ve checked out What’s New in Minitab 17, you’ve had the chance to see that Conditional Formatting leads the list. If you’ve been reading the Minitab blog, you’ve had the chance to see demonstrations with Marvel’s Avengers and the Human Development Index. But you might not have had a chance to see that you can highlight large standardized residuals from a regression model and that the... Continue Reading
In previous posts, I discussed the results of a recycling project done by Six Sigma students at Rose-Hulman Institute of Technology last spring. (If you’re playing catch up, you can read Part I and Part II.) The students did an awesome job reducing the amount of recycling that was thrown into the normal trash cans across all of the institution’s academic buildings. At the end of the spring... Continue Reading
Before cutting an expensive piece of granite for a countertop, a good carpenter will first confirm he has measured correctly. Acting on faulty measurements could be costly. While no measurement system is perfect, we rely on such systems to quantify data that help us control quality and monitor changes in critical processes. So, how do you know whether the changes you see are valid and not just the... Continue Reading
Banned! In February 2015, editor David Trafimow and associate editor Michael Marks of the Journal of Basic and Applied Social Psychology declared that the null hypothesis statistical testing procedure is invalid. They promptly banned P values, confidence intervals, and hypothesis testing from the journal. The journal now requires descriptive statistics and effect sizes. They also encourage large... Continue Reading
As a Minitab trainer, one of the most common questions I get from training participants is "what should I do when my data isn’t normal?" A large number of statistical tests are based on the assumption of normality, so not having data that is normally distributed typically instills a lot of fear. Many practitioners suggest that if your data are not normal, you should do a nonparametric version of... Continue Reading
Many of the things you need to monitor can be measured in a concrete, objective way, such as an item's weight or length. But, many important characteristics are more subjective, such as the collaborative culture of the workplace, or an individual's political outlook. A survey is an excellent way to measure these kinds of characteristics. To better understand a characteristic, a researcher asks... Continue Reading
The 2016 presidential race is becoming more real. We’ve had several announcements with Ted Cruz, Rand Paul, Hillary Clinton, and Marco Rubio officially entering the race to be President. While the prospective Democratic candidates are down to one, or at most a few, the Republican field is extra-large this election cycle. The first order of business for a GOP candidate is to survive the nomination... Continue Reading
In 1898, Russian economist Ladislaus Bortkiewicz published his first statistics book entitled Das Gesetz der kleinen Zahlen, in which he included an example that eventually became famous for illustrating the Poisson distribution. Bortkiewicz researched the annual deaths by horse kicks in the Prussian Army from 1875-1894. Data was recorded from 14 different army corps, with one being the Guard... Continue Reading
The Cp and Cpk are well-known capability indices commonly used to ensure that a process spread is as small as possible compared to the tolerance interval (Cp), or that it stays well within specifications (Cpk). Yet another type of capability index exists: the Cpm, which is much less known and used less frequently. The main difference between the Cpm and the other capability indices is that the... Continue Reading
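For readers who want to see the three indices side by side, the following Python sketch computes them from their common textbook definitions. It is only an illustration under stated assumptions: the function name, the sample data, and the specification limits are hypothetical, and the overall sample standard deviation is used throughout, which may differ from how a given statistical package estimates the process spread.

```python
import math
import statistics


def capability_indices(data, lsl, usl, target):
    """Return (Cp, Cpk, Cpm) using common textbook formulas.

    Assumption: the overall sample standard deviation estimates process spread.
    """
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    # Cp: tolerance width relative to the process spread (6 standard deviations).
    cp = (usl - lsl) / (6 * sd)
    # Cpk: distance from the mean to the nearest specification limit, in units of 3 sd.
    cpk = min(usl - mean, mean - lsl) / (3 * sd)
    # Cpm: like Cp, but the spread is inflated by the squared distance from the target.
    cpm = (usl - lsl) / (6 * math.sqrt(sd**2 + (mean - target) ** 2))
    return cp, cpk, cpm


# Hypothetical measurements and specification limits, for illustration only.
measurements = [9.0, 10.0, 11.0]
on_target = capability_indices(measurements, lsl=4.0, usl=16.0, target=10.0)
off_target = capability_indices(measurements, lsl=4.0, usl=16.0, target=11.0)
print("Target at process mean:", on_target)
print("Target off the mean:   ", off_target)
```

When the process mean sits exactly on the target, Cpm equals Cp in this formulation; moving the target away from the mean lowers Cpm while Cp and Cpk stay unchanged.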
The two previous posts in this series focused on manipulating data using Minitab’s calculator and the Data menu. In this third and final post, we continue to explore helpful features for working with text data and will focus on some new features in Minitab 17.2’s Editor menu. Using the Editor Menu  The Editor menu is unique in that the options displayed depend on what is currently active... Continue Reading
My previous post focused on manipulating text data using Minitab’s calculator. In this post we continue to explore some of the useful tools for working with text data, and here we’ll focus on Minitab 17.2’s Data menu. This is the second in a 3-part series, and in the final post we’ll look at the new features in Minitab 17.2’s Editor menu. Using the Data Menu When I think of the Data menu, I think... Continue Reading
With Minitab, it’s easy to create graphs and manage numeric, date/time and text data.  Now Minitab 17.2’s enhanced data manipulation features make it even easier to work with text data. This is the first of three posts in which I'm going to focus on various tools in Minitab that are useful when working with text data, including the Calculator, the Data menu, and the Editor menu. Using the Calculator Y... Continue Reading
In this series of posts, I show how hypothesis tests and confidence intervals work by focusing on concepts and graphs rather than equations and numbers.   Previously, I used graphs to show what statistical significance really means. In this post, I’ll explain both confidence intervals and confidence levels, and how they’re closely related to P values and significance levels. How to Correctly... Continue Reading
To choose the right statistical analysis, you need to know the distribution of your data. Suppose you want to assess the capability of your process. If you conduct an analysis that assumes the data follow a normal distribution when, in fact, the data are nonnormal, your results will be inaccurate. To avoid this costly error, you must determine the distribution of your data. So, how do you determine... Continue Reading
Imagine that you are watching a race and that you are located close to the finish line. When the first and fastest runners complete the race, the differences in times between them will probably be quite small. Now wait until the last runners arrive and consider their finishing times. For these slowest runners, the differences in completion times will be extremely large. This is due to the fact that... Continue Reading