Ghostbusting Spaces in Minitab
It’s been almost 5 years since I used a quotation from Ghostbusters to introduce one of my early blog posts. But as we’re getting a few bits of entertainment news about the next installment in the Ghostbusters franchise, I thought it might be a good time to talk about busting ghosts in Minitab.
In the Minitab sense, ghosts are spaces that are in your data that you can’t see. The busting action is slightly different for text data and numeric data.
Remove Spaces from Text Data
Let’s look at text data first. Let’s say that you have data that look like this:
It looks like the same value repeated over and over again. It looks like there’s nothing there but the word “Ghostbusters.”
This is the moment where you get your PKE meter out. Or rather, this is where you tally your data to see what's going on. That moment’s going to be like this:
- Choose Stat > Tables > Tally Individual Variables.
- In Variables, enter Title. Click OK.
The count of the single word should be the same as the number of rows in the data. It’s not. There are ghosts there that you can’t see.
Tally for Discrete Variables: Title
Tally, standing in for your PKE meter, shows a stairstep pattern because of the extra spaces at the end of the word. Extra spaces that you can't see in the data window unless you highlight a cell.
One way to fix the problem would be to go through the data window clicking on the individual cells to find the ones with extra spaces to delete. Fortunately, Minitab gives you a better way to make all of these values the same. You can use the calculator function TRIM:
- Choose Calc > Calculator.
- In Store Result in Variable, enter Title2.
- In Expression, enter Trim(Title). Click OK.
If you tally the new values, you get the result you expect when all of the values are the same:
Tally for Discrete Variables: Title2
All those ghosts are in the trap, not in your data.
Remove Spaces from Numeric Data
If you wanted, you could trim a series of numbers to eliminate spaces, but a bigger problem remains unsolved. For example, if you paste some numbers formatted as text from Excel, you might get a column like this:
These numbers look like numbers. It doesn’t look like there are extra spaces that you can’t see, but the spaces are there. We can see them if we tally them:
Tally for Discrete Variables: Numbers
But the column is also formatted as text. That’s what the “–T” next to the C9 means in the worksheet. If we want to analyze the numbers, find the mean or create a scatterplot, then we need the numbers to be formatted as numbers. If we use the TRIM function, the numbers will still be text. For this, we’re going to have to use something a little bit stronger:
No, not crossing the streams. (Egon said that crossing the streams was bad.) Instead, we change the column format.
- Put the cursor in the column of numbers.
- Right-click and select Format Column.
- In Choose type, select Automatic numeric.
- Click OK.
Tally for Discrete Variables: Numbers
In one step, you’ve converted the format of the column and eliminated the extra spaces. The ghosts are all gone.
If you have to work with data, especially data that someone types, then you’re going to have to deal with messy data. The TRIM function and the ability to change column formats are two tools that Minitab gives you to get the ghosts out of your data.
Have other problems? Minitab has lots of other tools to simplify the cleaning that you’ll need to do for your data. If you’re ready for a few more tricks, check out 3 Ways to Clean Up Data So You Can Promote Public Dialog and 3 Features to Make You Glad You're You When You Have to Clean Data in Minitab.
Ghostbusters images: TM & © 2014 Columbia Pictures Industries Inc.