Assisting Professionals with Data Analysis


My name is Andy Cheshire, and I currently work in Minitab's Technical Support department. Every day, I assist professionals who have questions about how to use Minitab to accomplish their analyses. Sometimes the call can be as simple as how to find a specific menu in Minitab. Other times, it can be more statistical in nature, as I try to explain how one Gage tool differs from another. The range of topics varies greatly: one call could involve setting up a Capability Analysis, and the next could be tweaking a Design of Experiments.

We try to make Minitab as user-friendly as possible for...

Transformers! Normal Data in Disguise?

Many statistical analyses require an assumption of normality. In cases when your data are not normal, sometimes you can apply a function to make your data approximately normal so that you can complete your analysis.

If you've seen any of the Transformers movies, you know that these extraordinary robots can, with some Hollywood magic, turn themselves into apparently normal items like cars and appliances.

You may not get quite the same special-effects thrill, but when you have an extraordinary (i.e., non-normal) data set, Minitab Statistical Software can pull a Transformers-like metamorphosis on...
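As a rough illustration of what "applying a function" can mean, here is a minimal Python sketch of the Box-Cox power transformation, one common choice for positive, skewed data. The function name `box_cox` and the sample values are hypothetical, and the sketch is not meant to show how Minitab implements its transformations.

```python
import math

def box_cox(values, lam):
    """Apply the Box-Cox power transformation to positive values.

    lam = 0 is defined as the natural log; any other lam uses
    (x**lam - 1) / lam.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

# Hypothetical right-skewed measurements
skewed = [1.0, 2.0, 4.0, 8.0, 16.0]
logged = box_cox(skewed, 0)      # log transform
rooted = box_cox(skewed, 0.5)    # square-root-like transform
```

In practice, the value of `lam` is chosen so that the transformed data look as close to normal as possible.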

More Tips and Tricks for Date/Time Data

Last time, I shared some useful tools for handling date and time data. But Minitab has many other useful tools for manipulating date/time data that you might not be aware of. Let’s take a look at a few more helpful tips and tricks.

Extracting Information from a Date/Time Column

If you look under the Data menu, you’ll notice Extract from Date/Time > To Numeric or To Text. This function allows you to take one or more of the components that make up your date/time values and transform them into a new data format.

The table below illustrates the conversion that takes place when you select Quarter and Year...
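To make the idea of pulling components out of a date concrete, here is a small Python sketch that builds a quarter-and-year text label from a date. The function name `quarter_and_year`, the "Q1 2012" label format, and the sample dates are all assumptions for illustration; Minitab's own output format may differ.

```python
from datetime import date

def quarter_and_year(d: date) -> str:
    """Return a text label such as 'Q1 2012' for the given date."""
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, and so on
    return f"Q{quarter} {d.year}"

# Hypothetical date column
sample_dates = [date(2012, 1, 15), date(2012, 6, 30), date(2012, 11, 2)]
labels = [quarter_and_year(d) for d in sample_dates]
```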

Tips and Tricks for Date/Time Data

Whether you're a Lean Six Sigma black belt, a researcher, or a statistics student, at some point you will need to work with data that involve dates, times, or both. Do you know where all of your date/time tools are? Let’s take a quick trip through Minitab Statistical Software and see what it has to offer for date/time data:

Column Indicators of Date and Time Data

You will know that Minitab recognizes a column as date/time data if it has a “-D” to the right of the column number, as in “C1-D”.  If Minitab recognizes the column as text data, you’ll see “C1-T.” If it’s numeric data, nothing...

Poisson Rates and the Undead!

Coming off an epic season finale of AMC’s The Walking Dead, I thought that zombies might be able to lend a hand in explaining the 2-Sample Poisson Rate test.

What Do Zombies Have to Do with the 2-Sample Poisson Rate Test?

Let’s say that you and a group of refugees are facing the zombie menace, and an area in which you are camping is soon to be overrun by the undead. In order to know which county to traverse next, you call in helicopter scout teams to observe land in two nearby counties.

  • One helicopter observes County A and looks at the first 40 acres of land, recording the number of zombies in...

Just Another Statistic at San Diego Comic Con

I was pretty much all set for my trip to San Diego Comic Con this July. For those unaware, “SDCC” is traditionally a four-day event held in San Diego every year that showcases professionals, exhibitors and special guests who have ties with comics, science fiction, film, and other popular arts.

I had the flight, rental car, and hotel all taken care of. Everything was going according to plan... all I needed were my Comic Con tickets, which went on sale this past weekend. Unfortunately, I was one of the unlucky many who were unable to get a ticket, as all of the tickets sold out within 90 minutes of...

Unbalanced Designs and Gage R&R (Expanded)

Last week, a customer called with an issue related to running a Gage R&R nested design in Minitab Statistical Software.  Everything initially looked okay, as he had the three columns necessary to perform a successful study: one for parts, one for operators, and another for the measurements.  However, when he tried to analyze his data using Stat > Quality Tools > Gage Study > Gage R&R (Nested), he would get this error message:

Gage R&R Study - Nested ANOVA

* ERROR * Design is not balanced; execution aborted.

One of the assumptions in performing a Nested Gage analysis is that the data is balanced,...
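A quick way to spot this kind of problem before running the analysis is to count the measurements yourself. The Python sketch below assumes that "balanced" means every operator measures the same number of parts and every operator/part combination has the same number of measurements; the function name `is_balanced_nested` and the sample rows are hypothetical.

```python
from collections import Counter

def is_balanced_nested(rows):
    """rows: iterable of (operator, part) pairs, one pair per measurement."""
    replicates = Counter(rows)  # measurements per (operator, part)
    if not replicates:
        return False
    # Number of distinct parts seen for each operator
    parts_per_operator = Counter(op for op, _part in replicates)
    same_replicates = len(set(replicates.values())) == 1
    same_part_count = len(set(parts_per_operator.values())) == 1
    return same_replicates and same_part_count

# Hypothetical study: 2 operators, 2 parts each, 2 measurements per part
balanced = [("A", 1), ("A", 1), ("A", 2), ("A", 2),
            ("B", 3), ("B", 3), ("B", 4), ("B", 4)]
# Same study with one measurement missing for operator B, part 4
unbalanced = balanced[:-1]
```

Calling `is_balanced_nested(unbalanced)` returns `False`, which points to a missing or extra row worth tracking down.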

Punxsutawney Phil and His 2-Sample T-test

Groundhog Day is apparently a pretty big deal in Punxsutawney, Pennsylvania. According to one CBS article, organizers expected over 15,000 people to see the United States’ most popular groundhog last week.

For those unfamiliar with the legend, here's the idea: If Punxsutawney Phil sees his shadow after coming out of his hole, he will retreat and we will be forced to endure six more weeks of winter. If he doesn’t see his shadow, then we will be graced with an early spring. Sounds far-fetched, right? Let’s look at a 2-sample t-test to see if Phil has what it takes to be Punxsutawney’s chief...

Choosing What the Bars in Your Chart Represent

I recently received questions about creating bar charts within Minitab and thought I'd share some information about this common task on the blog. When creating a bar chart in Minitab, you begin by going to Graph > Bar Chart. This brings up a dialog box: 
  



The next step can prove quite vexing. In the dialog box, it can be tempting to quickly select the picture that best represents what you want your graph to look like. However, Minitab needs to know what type of data you are trying to graph. Above the pictures is a dropdown that will tell Minitab what exactly the data represent. It’s important...

More Cluster Analysis Tips

I want to offer some clarification on my earlier blog about Cluster Analysis.

There may have been some confusion as to what the four dendrograms were trying to display in my prior post. The first dendrogram in the four-graph layout represented the final partition if the user chose only one cluster. If the user chose the final partition to be four clusters, the end result would be the last graph in the layout. The four graphs were trying to illustrate which clusters (distinguished by color) Minitab sequentially chooses as you increase the value of the final partition from 1 to 4.

The alternative...

Cluster Analysis Tips

Cluster Analysis aims to establish a set of clusters such that cases within a cluster are more similar to each other than they are to cases in other clusters.

In other words, we're using data to arrange objects into groups. Arranging objects into groups is a natural skill we all use and share.

In life, we use an item’s characteristics or traits to determine what group to throw it in.  In Cluster Analysis, the metrics “similarity” and “distance” are used to perform the very same action when arranging items into groups.

Minitab uses a hierarchical clustering method. It starts with single member...
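To give a feel for how a hierarchical method builds groups, here is a minimal Python sketch of agglomerative clustering on one-dimensional data. It assumes single linkage (the distance between two clusters is the distance between their closest members) and absolute difference as the distance measure; both are choices made for illustration, and the function name `single_linkage` and the data are hypothetical.

```python
def single_linkage(points, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    while len(clusters) > k:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters

# Hypothetical measurements with three visually obvious groups
groups = single_linkage([1.0, 1.2, 5.0, 5.1, 9.0], k=3)
```

Here `k` plays the role of the number of clusters in the final partition.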

A Rational Look at Subgrouping in Control Charts

It looks like I may have gotten my wish: NBA games should start Christmas day, which will be a huge present to a lot of fans. So while I ponder whether to proceed with a second post about statistics related to the lockout, let's shed some light on a different subject. 

From time to time, Minitab users will call in needing assistance with control charts.  One common problem people encounter when filling out the control chart dialogs is subgrouping. 

For example, here is a dialog box for an Xbar-R Chart (Stat > Control Charts > Variables Charts for Subgroups):
  

 
The Subgroup sizes box for the...

NBA Lockout: A Look at Some of the Statistics

BAH! The NBA makes me mad.  Seriously, why can’t they resolve this current lockout? I want to have something to watch after the NFL season is over. 

Let’s talk about what is holding up the players and the organizations from agreeing to a decision on how money is distributed.
 
The NBA Salary Cap is the limit on how much NBA teams are allowed to pay their players.  It was first instituted in 1946, but it lasted only that one season. The first modern cap started in the '84-85 season and has been in place ever since. Here is a Time Series Plot (Stat > Time Series > Time Series Plot in...

Customizing the Interface: What Halloween Costume Does Minitab Want to Wear This Year?

As Halloween festivities near, you can customize your Minitab interface to match the season. I know, it’s corny -- but here's how:

1.    Edit your Session Window Fonts!

        a)    Go to Tools > Options. Then go to Session Window, and expand the + sign to see more options.
        b)    Change your Title Font and Comment Font to ‘Chiller’. You may increase the size of the font if you wish.

Example of Session results from an I-MR Chart and the beginning title for an Individual Value Plot:



Oooooooooo…Will these data points SURVIVE after you've analyzed your data for special causes????...

Video Games and Children: A Statistically Significant Change from 2009 to 2011?

An article by the NPD Group, a leading market research company, states that today, 91 percent of kids in the United States aged 2-17 play video games. This is an increase of 9 percentage points when compared to their 2009 survey results.  Is this a statistically significant increase?

I did some research and was able to find the methodology used for NPD’s results.  2011’s survey utilized 4,136 individuals ages 2-17, whereas 2009’s results cited "over 5,000 members."  Hmm. Well, I guess I have to assume that it’s a number close to 5,000, so we’ll round down to 5,000.  Because the articles...
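For readers who want to check the arithmetic by hand, the Python sketch below runs a pooled two-proportion z-test on the figures quoted above: 91 percent of 4,136 respondents in 2011 versus 82 percent (9 points lower) of an assumed 5,000 respondents in 2009. The pooled normal approximation and the function name `two_proportion_z` are assumptions for illustration, not necessarily the method or settings used in the original analysis.

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """Pooled z statistic and one-sided p-value for the alternative p1 > p2."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability
    return z, p_value

# 2011 survey vs. 2009 survey (2009 sample size is the assumed 5,000)
z, p_value = two_proportion_z(0.91, 4136, 0.82, 5000)
```

With samples this large, the z statistic comes out above 12, so the p-value is far below any conventional significance level.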

Questions about Capability Statistics - Part 2

Another thing you may notice when performing a Capability Analysis in Minitab is the option to choose whether you want to see capability statistics (Cp, Pp…) or Z.Bench.  From Minitab Help:
  
“A disadvantage of the Cpk index is that it only represents one side of the process curve, and tells you nothing about the other extreme. For example, the two graphs below display processes with identical Cpk values. However, one violates both specification limits, and the other only violates the lower specification limit...”
  

  
The Z in Z.Bench refers to the standard normal distribution with mean 0...

Questions about Capability Statistics - Part 1

When you're conducting a Capability Analysis, do you know which statistic to look at?  We get a fair number of questions about this, so let's explore the question. 

In the graph below, you’ll see that there are two categories for Capability Statistics, Potential and Overall:

 

The Potential (Within) Capability statistics (Cp, CPL…) are based on the estimate of the within standard deviation. Well, what is within standard deviation? It simply represents the variation within your subgroups.  On the flip side, overall standard deviation takes into account variation from the entire data set...
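The difference between the two estimates is easy to see with a small example. The Python sketch below computes a pooled within-subgroup standard deviation and an overall standard deviation for hypothetical data whose subgroup means drift upward. It uses a plain pooled estimator with no further adjustments, which is only one of several possible ways to estimate within variation, so the numbers may not match any particular software output.

```python
import math
import statistics

def pooled_within_sd(subgroups):
    """Pooled standard deviation: variation around each subgroup's own mean."""
    sum_sq = sum((len(g) - 1) * statistics.variance(g) for g in subgroups)
    dof = sum(len(g) - 1 for g in subgroups)
    return math.sqrt(sum_sq / dof)

def overall_sd(subgroups):
    """Sample standard deviation of all observations taken together."""
    return statistics.stdev([x for g in subgroups for x in g])

# Hypothetical subgroups: tight within each subgroup, but the mean shifts
subgroups = [[10.0, 10.2, 9.8], [11.0, 11.2, 10.8], [12.0, 12.2, 11.8]]
within = pooled_within_sd(subgroups)
overall = overall_sd(subgroups)
```

Because the subgroup means shift, `overall` is much larger than `within`; when the process is stable between subgroups, the two estimates tend to be close.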

How to Interpret Gage R&R Output - Part 2

Another common question with Gage Crossed is which table to look at when assessing your measurement system.  By default, Minitab gives a %Contribution table and a %Study Variation table. Which one should you use when assessing where most of the variation is coming from? Well, you could use either of them.

The %Contribution table can be convenient because all sources of variability add up nicely to 100%. Example:

The %Study Variation table doesn’t have the advantage of having all sources add up nicely to 100%, but it has other positive attributes. Because standard deviation is expressed in the...
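The reason one table sums to 100% and the other does not comes down to variances being additive while standard deviations are not. The Python sketch below shows this with hypothetical variance components; the component values are made up for illustration.

```python
import math

# Hypothetical variance components from a gage study
variance_components = {
    "Repeatability": 0.04,
    "Reproducibility": 0.05,
    "Part-to-Part": 0.91,
}
total_variance = sum(variance_components.values())

# %Contribution: each variance as a share of the total variance
pct_contribution = {
    source: 100 * var / total_variance
    for source, var in variance_components.items()
}

# %Study Variation: each standard deviation as a share of the total standard deviation
pct_study_var = {
    source: 100 * math.sqrt(var) / math.sqrt(total_variance)
    for source, var in variance_components.items()
}

sum_contribution = sum(pct_contribution.values())
sum_study_var = sum(pct_study_var.values())
```

Here `sum_contribution` is 100 (up to rounding), while `sum_study_var` is noticeably larger than 100.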

Understanding "Number of Distinct Categories" in Your Gage R&R Output

Recently I've been thinking about common questions that customers ask when running a Gage R&R analysis in Minitab.

For example, when you run a Gage R&R, the last result that shows up in the session window is a value for the ‘Number of Distinct Categories’.  This one metric is something that customers seem to overlook when they call to discuss their Gage studies.
    
 
  
This value represents the number of distinct groups of parts that your measurement tool can distinguish within the data. The higher this number, the better chance the tool has of discerning one part from another.
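As a rough sketch of where such a number can come from, a commonly cited formula takes the ratio of the part-to-part standard deviation to the total gage standard deviation, multiplies it by the square root of 2 (about 1.41), and truncates the result to a whole number. The Python below assumes that formula, along with a floor of 1; the function name `distinct_categories` and the input values are hypothetical.

```python
import math

def distinct_categories(sd_parts, sd_gage):
    """Truncated sqrt(2) * (part SD / gage SD), never reported below 1."""
    raw = math.sqrt(2) * sd_parts / sd_gage
    return max(1, int(raw))  # int() truncates toward zero for positive values

# Hypothetical standard deviations from a gage study
ndc_good = distinct_categories(sd_parts=3.0, sd_gage=1.0)
ndc_poor = distinct_categories(sd_parts=0.5, sd_gage=1.0)
```

The first case gives 4 and the second gives 1, showing how the value shrinks as gage variation grows relative to part variation.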

So how do you know if your...

Statistical Tools for Predicting Group Membership

Riddle: What two tools in Minitab can be used to perform the same analysis on your data? Well, there are probably a few pairs that can be mentioned, but I am going to focus on Discriminant Analysis and Binary Logistic Regression.

These tools can be used to predict group membership.  If we look at exh_mvar.mtw, located in Minitab’s sample data folder, we have the perfect data set to use. Here is a snapshot of the first 30 or so observations:

 



Fifty fish from each place of origin (Alaska, Canada) were caught and growth ring diameters of scales were measured for the time when they lived in...

A brief analysis of superhero movies made in the past few decades.

Well, it is yet another superhero-movie-filled summer for Hollywood. Marvel Comics, owner of popular superheroes such as Spider-Man and Iron Man, is producing movies for each character that will appear in the upcoming 2012 vehicle entitled ‘The Avengers.’ DC Comics, most famous for its characters Batman and Superman, is trying to get more of its relatively unknown characters onto the big screen to match the filmography of its aforementioned competitor.  With the recent flop of DC’s Green Lantern and modest box office earnings of Marvel’s Thor, a few critics have mentioned the possibility...