Are You Putting the Data Cart Before the Horse? Best Practices for Prepping Data for Analysis, Part 1
Most of us have heard a backwards way of completing a task, or doing something in the conventionally wrong order, described as “putting the cart before the horse.” That’s because a horse pulling a cart is much more efficient than a horse pushing a cart.
This saying may be especially true in the world of statistics. Focusing on a statistical tool or analysis before checking out the condition of your data is one way you may be putting the cart before the horse. You may then find yourself trying to force your data to fit an analysis, particularly when the data have not been set up properly. It’s far more efficient to first make sure your data are reliable and then allow your questions of interest to guide you to the right analysis.
Spending a little quality time with your data up front can save you from wasting a lot of time on an analysis that either can’t work—or can’t be trusted.
As a quality practitioner, you’re likely to be involved in many activities—establishing quality requirements for external suppliers, monitoring product quality, reviewing product specifications and ensuring they are met, improving process efficiency, and much more.
All of these tasks will involve data collection and statistical analysis with software such as Minitab. For example, suppose you need to perform a Gage R&R study to verify your measurement systems are valid, or you need to understand how machine failures impact downtime.
Rather than jumping right into the analysis, you will be at an advantage if you take time to look at your data. Ask yourself questions such as:
- What problem am I trying to solve?
- Are my data set up in a way that will be useful for answering my question?
- Did I make any mistakes while recording my data?
Utilizing process knowledge can also help you answer questions about your data and identify data entry errors. A focus on preparing and exploring your data prior to an analysis will not only save you time in the long run, but will help you obtain reliable results.
So then, where to begin with best practices for prepping data for an analysis? Let’s look no further than your data.
Clean your data before you analyze it
Let’s assume you already know what problem you’re trying to solve with your data. For instance, you are the area supervisor of a manufacturing facility, and you’ve been experiencing lower productivity than usual on the machines in your area and want to understand why. You have collected data on these machines, recording the amount of time a machine was out of operation, the reason for the machine being down, the shift number when the machine went down, and the speed of the machine when it went down.
The first step toward answering your question is to ensure your data are clean. Cleaning your data before you begin an analysis can save time by preventing rework, such as reformatting data or correcting data entry errors, after you’ve already begun the analysis. Data cleaning is also essential to ensure your analyses and results—and the decisions you make—are reliable.
With the latest update to Minitab, an improved data import helps you identify and correct case mismatches, fix improperly formatted columns, represent missing data accurately and in a manner that is recognized by the software, remove blank rows and extra spaces, and more. When importing your data, you see a preview of your data as a reminder to ensure it’s in the best possible state before it finds its way into Minitab. This preview helps you spot mistakes made during data collection, and the import automatically corrects mistakes you don’t notice or that are difficult to find in large data sets.
Minitab offers a data import dialog that helps you quickly clean and format your data before importing into the software, ensuring your data are trustworthy and allowing you to get to your analysis sooner.
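To make these cleanup steps concrete, here is a minimal sketch in plain Python. It is not how Minitab's import works internally; it only illustrates the same kinds of fixes (trimming extra spaces, dropping blank rows, marking missing values consistently, and resolving case mismatches). The column names, sample values, and list of missing-value tokens are all invented for illustration.

```python
import csv
import io

# Hypothetical raw export; column names and values are made up for this example.
RAW = """Reason,Shift,Downtime
Jam ,1,12.5
jam,2,N/A
,,
 Power,3,8
"""

# Assumed spellings that should all count as "missing".
MISSING_TOKENS = {"", "na", "n/a", "null", "-"}


def clean_rows(text):
    """Return rows with trimmed cells, None for missing values, and no blank rows."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        # Strip extra spaces around every cell.
        row = {key: (value or "").strip() for key, value in row.items()}
        # Drop rows where every cell is empty.
        if not any(row.values()):
            continue
        # Represent missing values consistently as None.
        row = {
            key: (None if value.lower() in MISSING_TOKENS else value)
            for key, value in row.items()
        }
        # Resolve case mismatches in the text column ("jam" vs. "Jam").
        if row["Reason"] is not None:
            row["Reason"] = row["Reason"].capitalize()
        cleaned.append(row)
    return cleaned


rows = clean_rows(RAW)
for row in rows:
    print(row)
```

After cleaning, the two spellings of the jam reason collapse into one category, the blank row is gone, and the missing downtime is flagged as missing instead of being read as text.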
If you’d rather copy and paste your data from Excel, Minitab will ensure you paste your data in the right place. For instance, if your data have column names and you accidentally paste them into the first data row of the worksheet instead of the column-name row, the names are treated as data and every column is formatted as text, even when the values below the names are numeric! With Minitab, you will receive an alert that your data are in the wrong place, and Minitab will automatically move them where they belong. This alert ensures your data are formatted properly, so you don’t run into the problem during an analysis or spend time manually correcting every improperly formatted column.
Pasting your Excel data in the first row of a Minitab worksheet will trigger this warning, which safeguards against improperly formatted columns.
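Why does a single misplaced header row turn a numeric column into text? The sketch below shows the general idea in plain Python, assuming a simple rule where a column counts as numeric only if every cell parses as a number. The function names and sample values are hypothetical, and this is not Minitab's actual type-detection logic.

```python
def infer_column_type(values):
    """Return 'numeric' if every value parses as a number, otherwise 'text'."""
    try:
        for value in values:
            float(value)
    except ValueError:
        return "text"
    return "numeric"


def column_types(rows):
    """Transpose rows into columns and infer a type for each column."""
    return [infer_column_type(column) for column in zip(*rows)]


# Hypothetical pasted block: the first row holds the column names.
pasted = [
    ["Shift", "Downtime"],
    ["1", "12.5"],
    ["2", "7.0"],
]

# Header treated as data: one non-numeric cell makes each column text.
with_header_as_data = column_types(pasted)

# Header kept out of the data rows: the same columns are numeric.
with_header_separated = column_types(pasted[1:])

print(with_header_as_data)
print(with_header_separated)
```

The values are identical in both cases; only the placement of the header row changes how the columns are typed.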
This is only the beginning! Minitab makes it quick and painless to begin exploring and visualizing your data, so you reach the analysis with more insight and less effort. If you’d like to learn additional best practices for prepping your data for any analysis, stay tuned for my next post, where I’ll offer tips for exploring and drawing insights from your data!