dcsimg
 

Correct Case Mismatches in Minitab, Fast

In this day and age, it’s not uncommon that data entry errors occur in data sets that are so large that looking for and correcting the errors by hand is impractical. Fortunately, Minitab includes tools that make it easy to get your data into shape, so that you can proceed to getting the answers you need.

tropical forest

Let’s say, for example, that you were going to look at the Global Wood Density Database. It’s an exciting piece of work if you’re into wood density. Chave et al. called it “the largest compilation of wood density data to date, encompassing 8412 taxa, 1683 genera, 191 families” (2009). Kindly, however, it’s provided at datadryad.org as an Excel file.

As it turns out, there’s a minor error in the Region column (at least as of this writing). You’d probably hardly notice it, but there’s a case mismatch. A total of 4,182 rows are given the region South America (tropical) while 9 rows of the dataset are given the region South America (Tropical). This is the kind of thing that can cause problems in your analysis. If you suspect such an error exists, or just want to verify that it doesn't, it would be a real chore to pore through 4,191 rows of data in search of mismatches.

Fortunately, you could find them by doing a quick tally in Minitab.

Find It

  1. Choose Stat > Tables > Tally Individual Variables.
  2. In Variables, enter Region. Click OK.

In the output table, you can spot the case mismatch at the bottom.

9 instances of South America (Tropical) use an upper-case t, while the other 4182 use a lower-case t.

Fix it

Fixing case mismatches is extremely easy in Minitab. Try this:

  1. Choose Data > Recode > To Text.
  2. In Recode values in the following columns, enter Region.
  3. In Method, select Recode individual values.
  4. In the table that appears, scroll down to find the case mismatch. Then, in the Recoded value column, change South America (Tropical) so that it uses a lower-case t.
  5. In Storage location for the recoded columns, select In the original columns. Click OK.

The summary shows you the 9 instances that were changed.

The summary shows the 9 rows with the upper-case t were corrected.

Fix It Before It’s Even a Problem

If you’re opening an Excel file, Minitab can fix case mismatches before you even know that they’re a problem. If you have the Global Wood Density Database saved and you open it, in Minitab, you’re presented with options for opening an Excel file. Try this:

  1. Choose File > Open and select the Excel file from your file system.
  2. Click the tab titled Data, the name of the sheet with the data in the original Excel file.
  3. Select Data has column names.
  4. Click Options.
  5. In Text columns, select Correct case mismatches. Click OK twice.

If you tally the Regions column now, the correction to the column is already done.

All 4191 instances of South America (tropical) use a lower-case t.

Wrap-up

To get the answers you need from your data, the data themselves have to be clean enough to analyze. Minitab provides a number of tools you can use to get your data ready faster, so that you can get on to the insights. Ready for more? Check out 3 Tips for Importing Excel Data into Minitab.

References

Chave J, Coomes DA, Jansen S, Lewis SL, Swenson NG, Zanne AE (2009). Towards a worldwide wood economics spectrum. Ecology Letters 12(4): 351-366. http://dx.doi.org/10.1111/j.1461-0248.2009.01285.x

Zanne AE, Lopez-Gonzalez G, Coomes DA, Ilic J, Jansen S, Lewis SL, Miller RB, Swenson NG, Wiemann MC, Chave J (2009). Data from: Towards a worldwide wood economics spectrum. Dryad Digital Repository. http://dx.doi.org/10.5061/dryad.234

7 Deadly Statistical Sins Even the Experts Make

Do you know how to avoid them?

Get the facts >

Comments

blog comments powered by Disqus