
Data Analysis

Blog posts and articles with tips for analyzing data for quality improvement methodologies, including Six Sigma and Lean.

You often hear the data being blamed when an analysis is not delivering the answers you wanted or expected. I was recently reminded that the data chosen or collected for a specific analysis is determined by the analyst, so there is no such thing as bad data—only bad analysis.  This made me think about the steps an analyst can take to minimise the risk of producing analysis that fails to answer... Continue Reading
An outlier is an observation in a data set that lies a substantial distance from other observations. These unusual observations can have a disproportionate effect on statistics such as the mean, which can lead to misleading results. Outliers can provide useful information about your data or process, so it's important to investigate them. Of course, you have to find them first. Finding... Continue Reading
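To make the point about the mean concrete, here is a minimal sketch in Python. The data values are invented for illustration, and the 1.5 × IQR fence is one common rule of thumb for flagging outliers, not necessarily the method the full post describes.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the k * IQR fences.

    Assumption: the 1.5 * IQR rule is used here as one common convention;
    other outlier tests exist and may suit your data better.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one unusual observation (95).
data = [10, 12, 11, 13, 12, 11, 95]

flagged = iqr_outliers(data)
cleaned = [v for v in data if v not in flagged]

print("Flagged:", flagged)
print("Mean with outlier:", round(statistics.mean(data), 2))
print("Mean without outlier:", round(statistics.mean(cleaned), 2))
print("Median with outlier:", statistics.median(data))
```

The single unusual value pulls the mean well away from the bulk of the data, while the median barely moves. Flagging a point this way is only a prompt to investigate it, not a reason to delete it.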
It’s not easy to get data ready for analysis. Sometimes, data that include all the details we want aren’t clean enough for analysis. Even stranger, sometimes the exact opposite can be true: Data that are convenient to collect often don’t include the details that we want when we analyze them. Let’s say that you’re looking at the documentation for the National Health and Nutrition Examination Survey... Continue Reading
Businesses are getting more and more data from existing and potential customers: whenever we click on a web site, for example, it can be recorded in the vendor's database. And whenever we use electronic ID cards to access public transportation or other services, our movements across the city may be analyzed. In the very near future, connected objects such as cars and electrical appliances will... Continue Reading
Remember the classic science fiction film The Matrix? The dark sunglasses, the leather, computer monitors constantly raining streams of integers (inexplicably in base 10 rather than binary or hexadecimal)? And that mind-blowing plot twist when Neo takes the red pill from Morpheus' outstretched hand? Well, to me, there's one thing even more mind-blowing than the plot of The Matrix: the Matrix Plot.... Continue Reading
Time series data is proving to be very useful these days in a number of different industries. However, fitting a specific model is not always a straightforward process. It requires a good look at the series in question, and possibly trying several different models before identifying the best one. So how do we get there? In this post, I'll take a look at how we can examine our data and get a feel... Continue Reading
There may not be a situation more perilous than being a character on Game of Thrones. Warden of the North, Hand of the King, and apparent protagonist of the entire series? Off with your head before the end of the first season! Last male heir of a royal bloodline? Here, have a pot of molten gold poured on your head! Invited to a wedding? Well, you probably know what happens at weddings in the show. ... Continue Reading
Suppose you’ve collected data on cycle time, revenue, the dimension of a manufactured part, or some other metric that’s important to you, and you want to see what other variables may be related to it. Now what? When I graduated from college with my first statistics degree, my diploma was bona fide proof that I'd endured hours and hours of classroom lectures on various statistical topics, including l... Continue Reading
This is an era of massive data. A huge amount of data is being generated from the web and from customer relations records, not to mention from sensors used in the manufacturing industry (semiconductor, pharmaceutical, petrochemical, and many other industries). Univariate Control Charts: In the manufacturing industry, critical product characteristics get routinely collected to ensure... Continue Reading
Do you recall my “putting the cart before the horse” analogy in part 1 of this blog series? The comparison is simple. We all, at times, put the cart before the horse in relatively innocuous ways, such as eating dessert before dinner, or deciding what to wear before being invited to the party. But performing some tasks in the wrong order, such as running a statistical... Continue Reading
While many Six Sigma practitioners and other quality improvement professionals like to use the Fishbone diagram in Quality Companion for brainstorming because of its ease of use and integration with other Quality Companion tools, some Minitab users need a Fishbone diagram only occasionally. For the more casual user of the Fishbone diagram, Minitab has the right tool to get the job done. Minitab... Continue Reading
Analysis of variance (ANOVA) can determine whether the means of three or more groups are different. ANOVA uses F-tests to statistically test the equality of means. In this post, I’ll show you how ANOVA and F-tests work using a one-way ANOVA example. But wait a minute...have you ever stopped to wonder why you’d use an analysis of variance to determine whether means are different? I'll also show how... Continue Reading
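As a rough illustration of why an analysis of variance can say something about means, the sketch below computes a one-way ANOVA F-statistic by hand in Python. The three groups are made-up numbers, and the function name is hypothetical; it is not taken from the post or from Minitab.

```python
def one_way_anova_f(groups):
    """Compute the one-way ANOVA F-statistic for a list of groups.

    F = (variation between group means) / (variation within groups),
    each divided by its degrees of freedom.
    """
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-group sum of squares: how far observations are from their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical data: three groups of three observations each.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = one_way_anova_f(groups)
print("F =", round(f_value, 3))
```

A large F means the group means vary much more than the scatter within the groups would suggest. Turning F into a p-value requires the F-distribution with (k - 1, N - k) degrees of freedom, which this sketch leaves out.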
Among the most underutilized statistical tools in Minitab, and I think in general, are multivariate tools. Minitab offers a number of different multivariate tools, including principal component analysis, factor analysis, clustering, and more. In this post, my goal is to give you a better understanding of the multivariate tool called discriminant analysis, and how it can be used. Discriminant... Continue Reading
You can use contour plots, 3D scatterplots, and 3D surface plots in Minitab to view three variables in a single plot. These graphs are ideal if you want to see how temperature and humidity affect the drying time of paint, or how horsepower and tire pressure affect a vehicle's fuel efficiency, for example. Ultimately, these three graphs are good choices for helping you to visualize your data and exa... Continue Reading
In statistics, t-tests are a type of hypothesis test that allows you to compare means. They are called t-tests because each t-test boils your sample data down to one number, the t-value. If you understand how t-tests calculate t-values, you’re well on your way to understanding how these tests work. In this series of posts, I'm focusing on concepts rather than equations to show how t-tests work.... Continue Reading
Along with the explosion of interest in visualizing data over the past few years has been an excessive focus on how attractive the graph is at the expense of how useful it is. Don't get me wrong...I believe that a colorful, modern graph comes across better than a black-and-white, pixelated one. Unfortunately, however, all the talk seems to be about the attractiveness and not the value of the... Continue Reading
As a recent graduate from Arizona State University with a degree in Business Statistics, I had the opportunity to work with students from different areas of study and help analyze data from various projects for them. One particular group asked for help analyzing online survey data they had gathered from other students, and they wanted to see if their new student program was beneficial. I would... Continue Reading
Did you know about the Minitab Network group on LinkedIn? It’s the one managed by Eston Martz, who also edits the Minitab blog. I like to see what the members are talking about, which recently got me into some discussions about Raman spectroscopy data. Not having much experience with Raman spectroscopy data, I thought I’d learn more about it and found the RRUFF™ Project. The idea is that if you... Continue Reading
Getting your data from Excel into Minitab Statistical Software for analysis is easy, especially if you keep the following tips in mind. Copy and Paste: To paste into Minitab, you can either right-click in the worksheet and choose Paste Cells or you can use Ctrl+V. Minitab allows for 1 row of column headers, so if you have a single row of column info (or no column header info), then you can quickly... Continue Reading
T-tests are handy hypothesis tests in statistics when you want to compare means. You can compare a sample mean to a hypothesized or target value using a one-sample t-test. You can compare the means of two groups with a two-sample t-test. If you have two groups with paired observations (e.g., before and after measurements), use the paired t-test. How do t-tests work? How do t-values fit in? In this... Continue Reading
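The three t-tests named above each reduce the sample data to a single t-value. The Python sketch below shows one way those t-values might be computed. The function names and data are hypothetical, and the two-sample version assumes Welch's unpooled form, since the teaser does not say whether equal variances are assumed.

```python
import math
import statistics

def one_sample_t(sample, target):
    """t = (sample mean - target) / standard error of the mean."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - target) / se

def paired_t(before, after):
    """A paired t-test is a one-sample t-test on the paired differences."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0)

def two_sample_t(group1, group2):
    """Welch's t-value (assumption: variances are not pooled)."""
    se = math.sqrt(
        statistics.variance(group1) / len(group1)
        + statistics.variance(group2) / len(group2)
    )
    return (statistics.mean(group1) - statistics.mean(group2)) / se

# Hypothetical data for illustration only.
print("One-sample:", round(one_sample_t([5, 6, 7], target=5), 3))
print("Paired:    ", round(paired_t([1, 2, 3], [2, 4, 6]), 3))
print("Two-sample:", round(two_sample_t([1, 2, 3], [2, 3, 4]), 3))
```

In each case the t-value is a difference divided by a standard error, so it grows when the difference is large relative to the noise in the data. Converting it to a p-value needs the t-distribution with the appropriate degrees of freedom, which is omitted here.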