You take the good, you take the bad,
you take them both and there you have
The Facts of Life, the Facts of Life.
"The Facts of Life," by Alan Thicke, Gloria Loring and Al Burton, sung by Gloria Loring.
When you’re doing process analysis, it’s important to verify the quality of the data. The facts of life are that data is rarely ready for analysis just because it’s in Minitab. To have confidence in your statistical analyses, you’ll have to be confident in the quality of the data that you have.
In particular, a graph is a good way to check your data. If you find anything unusual, investigate those data points before you move on. In some cases, the analysis of the unusual data can be the most informative step in process analysis.
Let’s graph some data:
- If you haven’t already, use my previous entry to get the number of characters per line in "I Wandered Lonely as a Cloud" by Wordsworth.
- Choose Graph > Histogram.
- In the gallery of possible histograms, choose Simple.
- In Graph Variables, enter the column that contains the number of characters for each line of the poem. Click OK.
- If you’ve left the data the way I did, you’ll see a histogram like this:
Two features of this data are unusual:
- 3 lines with 0 characters
- 2 lines with over 55 characters
A glance at the worksheet shows that the zeroes are line breaks between stanzas, but what about the long lines? Did Wordsworth have more to say in those lines?
As it turns out, the longest lines aren’t where Wordsworth waxes extra-poetical. They’re lines where Bartleby has extra spaces so it can align the line numbers. Once you’ve identified what’s going on, you can make intelligent decisions about what to do so that the data reflect what you really care about.
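If you want to run the same kind of check outside Minitab, here is a minimal Python sketch. It counts characters per line, draws a rough text histogram, and lists the lines that look unusual so you can read them. The sample lines, the function names, and the cutoffs (fewer than 1 or more than 55 characters) are my assumptions for illustration. The padded line only imitates a page that aligns line numbers with spaces; it is not copied from Bartleby.

```python
from collections import Counter

# Illustrative sample only: one line is padded with spaces before a line
# number, and one empty string stands in for a stanza break.
SAMPLE = [
    "When all at once I saw a crowd,",
    "A host, of golden daffodils;".ljust(60) + "5",
    "Beside the lake, beneath the trees,",
    "Fluttering and dancing in the breeze.",
    "",
    "Continuous as the stars that shine",
]


def char_counts(lines):
    """Return the number of characters in each line."""
    return [len(line) for line in lines]


def text_histogram(counts, bin_width=10):
    """Return one text row per non-empty bin, with one '*' per line in the bin."""
    bins = Counter((c // bin_width) * bin_width for c in counts)
    return [
        f"{start:>3}-{start + bin_width - 1:<3} {'*' * bins[start]}"
        for start in sorted(bins)
    ]


def unusual_lines(lines, low=1, high=55):
    """Return (line_number, count, text) for lines shorter than low or longer than high."""
    return [
        (number, len(line), line)
        for number, line in enumerate(lines, start=1)
        if len(line) < low or len(line) > high
    ]


if __name__ == "__main__":
    for row in text_histogram(char_counts(SAMPLE)):
        print(row)
    for number, count, text in unusual_lines(SAMPLE):
        # repr() makes padding and empty strings visible.
        print(number, count, repr(text))
```

Printing the flagged lines with `repr()` is the code equivalent of glancing at the worksheet: you see the empty string or the run of spaces directly instead of guessing from the count.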
In almost all process analysis, you have to verify the quality of your data. Graphs like the histogram are a great way to make sure that the data represent what you really want to know. If you have some more time, check out some tips for editing your graphs in Minitab. What kinds of graphs are your favorites?
And, thanks to sites like Bartleby, literature is an easy way to collect some data that will help you build your confidence in your ability to verify data quality.
If we’re really interested in the number of characters per line Wordsworth used, we can get rid of the line breaks, extra spaces, and line numbers.
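As a rough sketch of that cleanup in Python, the function below drops empty lines, removes a trailing line number, and trims extra spaces before counting. The function name and the rule for spotting a line number (digits at the end of a line, set off by two or more spaces) are assumptions of mine, so check them against your own copy of the text before trusting the counts.

```python
import re


def clean_lines(lines):
    """Return lines with stanza breaks, padding, and trailing line numbers removed."""
    cleaned = []
    for line in lines:
        # Assumption: a line number is a run of digits at the end of the line,
        # separated from the verse by at least two whitespace characters.
        line = re.sub(r"\s{2,}\d+\s*$", "", line)
        # Trim any remaining leading or trailing spaces.
        line = line.strip()
        # Skip lines that are empty after cleaning (the stanza breaks).
        if line:
            cleaned.append(line)
    return cleaned


if __name__ == "__main__":
    # Illustrative input only, imitating padded line numbers and a blank line.
    raw = [
        "A host, of golden daffodils;" + " " * 32 + "5",
        "",
        "  Beside the lake, beneath the trees,  ",
    ]
    for line in clean_lines(raw):
        print(len(line), line)
```

Requiring two or more spaces before the digits is a deliberate choice: it leaves alone any line of verse that happens to end in a number after a single space.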