An Unauthorized Biography of the Stem-And-Leaf Plot, Part I: A Stem by Any Other Name
Greetings, fair reader. In the past, I've written several posts with practical tips related to Minitab graphs, such as:
- How to discuss the sensitive issue of P charts and Laney P' charts with your doctor
- How to use a G chart to monitor parenting success
- How to use a scatterplot to start your own doomsday cult
In this post, I thought I'd take a step back and explore the historical side of Minitab's graphs. Specifically, we'll explore the history behind the dodo bird of the graph kingdom: the Stem-and-Leaf plot.
Recently, I was chatting with Bob (not her real name) in Minitab's excellent Technical Support department. Bob recently got a call from a customer who wanted to know how the Stem-and-Leaf plot got its name. Imagine my shock and horror as Bob explained that many budding statisticians don't know this story! (For the record, I personally do not subscribe to the notion that budding is the only successful reproductive strategy for statisticians like myself. There's also cloning.)
For the curious but uninitiated, here's an example of a Stem-and-Leaf plot:
Each digit in the right column represents a single value from the sample. The left column serves as a base of equally spaced intervals (or "bins," as I will later call them). Together, the value in the left column and the digits in the right column give you the data values in each bin. For example, the largest value in the sample of budding rates is 15.3 (bottom row).
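For readers who prefer their plots home-grown, here is a minimal Python sketch of the idea described above: split each value into a stem (the left column) and a single-digit leaf (the right column). The function names `stem_and_leaf` and `format_plot`, the `leaf_unit` parameter, and the sample values are all my own assumptions for illustration (only the 15.3 comes from the example above), and the sketch assumes a non-empty sample of non-negative values. It is not how Minitab builds its plot.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=0.1):
    """Return (stem, leaves) rows, smallest stem first.

    Assumes a non-empty sample of non-negative values; each leaf is
    one digit worth `leaf_unit`.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        scaled = round(v / leaf_unit)    # e.g. 15.3 -> 153
        stem, leaf = divmod(scaled, 10)  # 153 -> stem 15, leaf 3
        rows[stem].append(leaf)
    lo, hi = min(rows), max(rows)
    # Include empty stems so the bins stay equally spaced.
    return [(s, rows.get(s, [])) for s in range(lo, hi + 1)]


def format_plot(rows):
    """Render rows as text, one 'stem | leaves' line per bin."""
    return "\n".join(
        f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in rows
    )


# Hypothetical budding rates, made up for this sketch.
sample = [15.3, 12.1, 12.7, 14.0, 13.5]
print(format_plot(stem_and_leaf(sample)))
```

Empty stems are kept on purpose: dropping them would hide gaps in the data and make the intervals look more evenly filled than they are.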
At this juncture, experienced readers might ask, "Wait a minute, what about the counts?" I, for one, take no stock in antiquated monarchistic hierarchies, so I tend to leave out the counts. However, you are correct: Stem-and-Leaf plots often include a column of counts on the far left. For example, here is a Stem-and-Leaf plot as it appears in Minitab Statistical Software:
The left-most column shows the cumulative count of observations from the top to the middle and from the bottom to the middle. In the example, the counts indicate that the top 5 rows include a total of 14 observations, and that the bottom 5 rows also contain 14 observations.
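The counting scheme just described can be sketched in a few lines of Python. The function name `depth_counts` and the row sizes below are hypothetical, chosen only for illustration. I am also assuming the convention, as I understand it, that a row containing the median shows its own count in parentheses instead of a cumulative one, so treat that detail as an assumption rather than gospel.

```python
def depth_counts(row_sizes):
    """Return the count-column label for each row of a stem-and-leaf plot.

    Rows above the median are counted cumulatively from the top, rows
    below it cumulatively from the bottom. A row that contains the
    median gets its own size in parentheses (assumed convention).
    """
    n = sum(row_sizes)
    labels = []
    from_top = 0
    for size in row_sizes:
        from_bottom = n - from_top  # this row plus everything below it
        from_top += size            # this row plus everything above it
        if from_top > n / 2 and from_bottom > n / 2:
            labels.append(f"({size})")  # the median lives in this row
        else:
            labels.append(str(min(from_top, from_bottom)))
    return labels


# Hypothetical rows holding 2, 3, 4, 3, and 2 leaves.
print(depth_counts([2, 3, 4, 3, 2]))
```

When the sample size is even and the median falls between two rows, no row gets parentheses, and the two middle rows each show half the sample.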
The Seeds of a Plot
The Stem-and-Leaf plot is sometimes called a "character graph," owing (no doubt) to the colorful and interesting characters who invented it: Dr. Woodrow "Woody" Stem and Dr. August "Russell" Leaf. (I think they won the Nobel Prize for their work. I don't know for sure, and I'd have to look it up, so let's go with that.)
Woody and Russell were aspiring students under the careful tutelage of renowned statistician Dr. Histeaux Graham. The good professor challenged our heroes, as he did all of his students, to come up with a new and better way to examine the distribution of values in a sample. (Mind you, this was before computers, calculators, or even highly leveraged derivatives trading.)
Legend has it that our heroes Woody and Russell "got into it" one night at a pub, after a particularly intense lecture by Dr. Graham on the twin scourges of platykurtosis and leptokurtosis. (Mind you, this was before penicillin.)
Woody was adamant that, in order to meet Dr. Graham's challenge, they must divide the sample into equal intervals (or "bins," as he would later call them).
Russell insisted that Woody's idea was a load of BS (Basic Statistics). "How can you understand a sample," Russell demanded, "by looking at a bunch of evenly spaced intervals, or 'bins' as you will no doubt resort to calling them?!?"
Sadly, it was at this juncture that our learned gentlemen succumbed to more pugilistic impulses: fisticuffs broke out. (Mind you, this was before boxing gloves, moisturizing cream, or even those convenient Isotoner one-size-fits-most gloves.)
I will spare you the details, but suffice it to say that the damage was significant. Woody looked like he had been worked over with the business end of a boxplot, and Russell's wallis was completely kruskaled. It looked like Dr. Graham's challenge might go unanswered. And it might have, but for an unlikely twist of fate.