What to Do When Your Data's a Mess, part 3

Everyone who analyzes data regularly has the experience of getting a worksheet that just isn't ready to use. Previously I wrote about tools you can use to clean up and eliminate clutter in your data, and to reorganize it.

In this post, I'm going to highlight tools that help you get the most out of messy data by altering its characteristics.

Know Your Options

Many problems with data don't become obvious until you begin to analyze it. A shortcut or abbreviation that seemed to make sense while the data was being collected, for instance, might turn out to be a time-waster in the end. What if abbreviated values in the data set only make sense to the person who collected them? Or a column of numeric data accidentally gets coded as text? You can solve those problems quickly with statistical software packages.

Change the Type of Data You Have

Here's an instance where a data entry error resulted in a column of numbers being incorrectly classified as text data. This will severely limit the types of analysis that can be performed using the data.

misclassified data

To fix this, select Data > Change Data Type and use the dialog box to choose the column you want to change.

change data type menu

One click later, and the errant text data has been converted to the desired numeric format:

numeric data
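
If you ever need to do the same kind of conversion outside Minitab, the idea can be sketched in a few lines of Python. This is only an illustration of the concept, not how Minitab does it internally. The function name to_numeric and the sample values are made up for the example, and it assumes that any cell that can't be read as a number should become a missing value.

```python
def to_numeric(column):
    """Convert a list of text cells to floats.

    Cells that cannot be parsed as numbers become None (treated here
    as a missing value; that choice is an assumption of this sketch).
    """
    converted = []
    for cell in column:
        try:
            converted.append(float(cell.strip()))
        except ValueError:
            converted.append(None)
    return converted


# Made-up sample column in which numbers were stored as text.
text_column = ["12.5", " 7", "3.25", "n/a"]
numeric_column = to_numeric(text_column)
print(numeric_column)
```

Once the values are real numbers, calculations such as sums and means work on them directly, which is the point of changing the data type in the first place.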

Make Data More Meaningful by Coding It

When this company collected data on the performance of its different functions across all its locations, it used numbers to represent both locations and units. 

uncoded data

That may have been a convenient way to record the data, but unless you've memorized what each set of numbers stands for, interpreting the results of your analysis will be a confusing chore. You can make the results easy to understand and communicate by coding the data.

In this case, we select Data > Code > Numeric to Text...

code data menu

And we complete the dialog box as follows, telling the software to replace the numbers with more meaningful information, like the town each facility is located in.  

Code data dialog box

Now you have data columns that can be understood by anyone. When you create graphs and figures, they will be clearly labeled.  

Coded data
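
The coding step above boils down to a lookup table: each number is swapped for the label it stands for. Here is a minimal Python sketch of that idea. The town names, the code numbers, and the function name code_values are all hypothetical, and the sketch assumes that a value with no entry in the table should simply be left as it is.

```python
def code_values(column, lookup):
    """Replace each value with its label from the lookup table.

    Values with no entry in the table are left unchanged
    (an assumption made for this sketch).
    """
    return [lookup.get(value, value) for value in column]


# Hypothetical coding scheme: location number -> town name.
location_codes = {1: "Springfield", 2: "Riverton", 3: "Lakeside"}

locations = [1, 3, 2, 1, 4]  # 4 has no label in the table
coded_locations = code_values(locations, location_codes)
print(coded_locations)
```

Leaving unmatched values alone makes gaps in the coding scheme easy to spot, since a stray number stands out in a column of names.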

Got the Time? 

Dates and times can be very important in looking at performance data and other indicators that might have a cyclical or time-sensitive effect.  But the way the date is recorded in your data sheet might not be exactly what you need. 

For example, if you wanted to see if the day of the week had an influence on the activities in certain divisions of your company, a list of dates in the MM/DD/YYYY format won't be very helpful.   

date column

You can use Data > Date/Time > Extract to Text... to identify the day of the week for each date.


Now you have a column that lists the day of the week, and you can easily use it in your analysis. 

day column
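
For readers who want to see what this kind of extraction involves, here is a rough Python equivalent using the standard datetime module. It assumes the dates are stored as text in MM/DD/YYYY format, as in the example above. The function name day_of_week and the sample dates are made up for illustration.

```python
from datetime import datetime

# Spelled out explicitly so the result does not depend on locale settings.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_of_week(date_text):
    """Return the weekday name for a date written as MM/DD/YYYY."""
    parsed = datetime.strptime(date_text, "%m/%d/%Y")
    return DAY_NAMES[parsed.weekday()]  # weekday(): Monday is 0


# Made-up sample dates.
dates = ["01/01/2024", "07/04/2024", "12/25/2023"]
days = [day_of_week(d) for d in dates]
print(days)
```

A date that doesn't match the expected format raises a ValueError here, which is usually what you want: it flags a data-entry problem instead of quietly producing a wrong day.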

Manipulating for Meaning

These tools are commonly seen as a way to correct data-entry errors, but as we've seen, you can use them to make your data sets more meaningful and easier to work with.

There are many other tools available in Minitab's Data menu, including an array of options for arranging, combining, dividing, fine-tuning, rounding, and otherwise massaging your data to make it easier to use. Next time you've got a column of data that isn't quite what you need, try using the Data menu to get it into shape.



