Data Analysis Software

Blog posts and articles with tips for using statistical software to analyze data for quality improvement.

In regression analysis, overfitting a model is a real problem. An overfit model can cause the regression coefficients, p-values, and R-squared to be misleading. In this post, I explain what an overfit model is and how to detect and avoid this problem. An overfit model is one that is too complicated for your data set. When this happens, the regression model becomes tailored to fit the quirks and... Continue Reading
To make objective decisions about the processes that are critical to your organization, you often need to examine categorical data. You may know how to use a t-test or ANOVA when you’re comparing measurement data (like weight, length, revenue, and so on), but do you know how to compare attribute or counts data? It's easy to do with statistical software like Minitab. One person may look at this bar... Continue Reading
When we take pictures with a digital camera or smartphone, what the device really does is capture information in the form of binary code. At the most basic level, our precious photos are really just a bunch of 1s and 0s, but if we were to look at them that way, they'd be pretty unexciting. In its raw state, all that information the camera records is worthless. The 1s and 0s need to be converted... Continue Reading
If you want to use data to predict the impact of different variables, whether it's for business or some personal interest, you need to create a model based on the best information you have at your disposal. In this post and subsequent posts throughout the football season, I'm going to share how I've been developing and applying a model for predicting the outcomes of 4th down decisions in Big... Continue Reading
By Colin Courchesne, guest blogger, representing his Governor's School research team.   High-level research opportunities for high school students are rare; however, that was just what the New Jersey Governor’s School of Engineering and Technology provided.  Bringing together the best and brightest rising seniors from across the state, the Governor’s School, or GSET for short, tasks teams of... Continue Reading
Statisticians say the darndest things. At least, that's how it can seem if you're not well-versed in statistics.  When I began studying statistics, I approached it as a language. I quickly noticed that compared to other disciplines, statistics has some unique problems with terminology, problems that don't affect most scientific and academic specialties.  For example, dairy science has a highly... Continue Reading
Last time, I told you how I had double-checked the analysis in a post that involved running the Johnson transformation on a set of data before doing normal capability analysis on it. A reader asked why the transformation didn't work on the data when you applied it outside of the capability analysis.  I hadn't tried transforming the data that way, but if the transformation worked when performed as... Continue Reading
I don't like the taste of crow. That's a shame, because I'm about to eat a huge helping of it.  I'm going to tell you how I messed up an analysis. But in the process, I learned some new lessons and was reminded of some older ones I should remember to apply more carefully.  This Failure Starts in a Victory: My mistake originated in the 2015 Triple Crown victory of American Pharoah. I'm no... Continue Reading
Every now and then I’ll test my Internet speed at home using such sites as http://speedtest.comcast.net  or http://www.att.com/speedtest/.  My need to perform these tests could stem from the cool-looking interfaces they employ on their site, as they display the results using analog speedometers and RPM meters. They could also stem from the validation that I need in "getting what I am paying for,"... Continue Reading
By Matthew Barsalou, guest blogger.   Many statistical tests assume the data being tested came from a normal distribution. Violating the assumption of normality can result in incorrect conclusions. For example, a Z test may indicate a new process is more efficient than an older process when this is not true. This could result in a capital investment for equipment that actually results in higher... Continue Reading
Design of Experiments is an extremely powerful statistical method, and we added a DOE tool to the Assistant in Minitab 17 to make it more accessible to more people. Since it's summer here, I'm applying the Assistant's DOE tool to outdoor cooking. Earlier, I showed you how to set up a designed experiment that will let you optimize how you grill steaks.  If you're not already using it and you want to... Continue Reading
Design of Experiments (DOE) has a reputation for difficulty, and to an extent, this statistical method deserves that reputation. While it's easy to grasp the basic idea—acquire the maximum amount of information from the fewest number of experimental runs—practical application of this tool can quickly become very confusing.  Even if you're a long-time user of designed experiments, it's still easy to... Continue Reading
Before I joined Minitab, I worked for many years in Penn State's College of Agricultural Sciences as a writer and editor. I frequently wrote about food science and particularly food safety, as I regularly needed to report on the research being conducted by Penn State's food safety experts, and also edited course materials and bulletins for professionals and consumers about ensuring they had safe... Continue Reading
When someone gives you data to analyze, you can gauge how your life is going by what you've received. Get a Minitab file, or even comma-separated values, and everything feels fine. Get a PDF file, and you start to think maybe you’re cursed because of your no-good-dirty-rotten-pig-stealing-great-great-grandfather and wish that you were someone else. For those of you who might be in such dire... Continue Reading
I recently fielded an interesting question about the probability and survival plots in Minitab Statistical Software's Reliability/Survival menus: Is there a one-to-one match between the confidence interval points on a probability plot and the confidence interval points on a survival plot at a specific percentile? Now, this may seem like an easy question, given that the probabilities on a survival plot... Continue Reading
By Matthew Barsalou, guest blogger.   Minitab Statistical Software can assist us in our analysis of data, but we must make judgments when selecting the data for an analysis. A good operational definition can be invaluable for ensuring the data we collect can be effectively analyzed using software. Dr. W. Edwards Deming explains in Out of the Crisis (1989), “An operational definition of safe, round,... Continue Reading
In my previous post, I showed you that the coefficients are different when choosing (-1,0,1) vs (1,0) coding schemes for General Linear Model (or Regression).  We used the two different equations to calculate the same fitted values. Here I will focus on showing what the different coefficients represent.  Let's use the data and models from the last blog post: We can display the means for each level... Continue Reading
Since Minitab 17 Statistical Software launched in February 2014, we've gotten great feedback from many people who have been using the General Linear Model and Regression tools. But in speaking with people as part of Minitab's Technical Support team, I've found many are noticing that there are two coding schemes available with each. We frequently get calls from people asking how the coding scheme you... Continue Reading
Earlier, I wrote about the different types of data statisticians typically encounter. In this post, we're going to look at why, when given a choice in the matter, we prefer to analyze continuous data rather than categorical/attribute or discrete data.  As a reminder, when we assign something to a group or give it a name, we have created attribute or categorical data.  If we count something, like... Continue Reading
The first summer blockbuster of 2015 was released two weeks ago—The Avengers: Age of Ultron. The first Avengers film featured a pretty well known cast of superheroes (if, of course, you’re a superhero fan). However, in the 40-year run of the Avengers comic book, that team has evolved to keep the material fresh and to allow some characters to go their solo ways. I want to use Minitab's statistical... Continue Reading