Collecting Random Data Isn't Monkey Business
“How do you write your blogs?” someone asked me the other day.
“It’s really simple,” I replied. “I just apply the infinite monkey theorem.”
According to the infinite monkey theorem, if enough monkeys type randomly on a keyboard for a long enough time (infinity), they will be almost certain to produce any given text: a play by Shakespeare, the U.S. Constitution, or Minitab Help.
A key premise is the concept of randomness. The monkeys must be equally likely to strike any key of the keyboard. That makes the eventual typing of Hamlet, or any other outcome, theoretically possible with enough trials.
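To get a feel for how long "long enough" is, here is a rough sketch in Python. It assumes a 26-key keyboard (letters only, a simplification of mine) and independent, equally likely keystrokes. The function name probability_of_typing is made up for this illustration.

```python
from fractions import Fraction

def probability_of_typing(target: str, keyboard_size: int) -> Fraction:
    """Chance that len(target) independent, equally likely keystrokes spell target.

    Assumes every key has probability 1/keyboard_size on every stroke.
    """
    return Fraction(1, keyboard_size) ** len(target)

# Hypothetical example: one six-keystroke attempt at the word "hamlet"
# on a 26-letter keyboard.
p = probability_of_typing("hamlet", 26)
print(p)  # 1/308915776
```

Even a six-letter word is a long shot on any single attempt, and the odds shrink by another factor of 26 with every added character. The probability is never zero, though, which is why unlimited trials make the outcome almost certain.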
In fact, a software developer put this theorem to the test last summer by simulating the random typing of “infinite monkeys.” His virtual troops of cybersimians generated over 7 trillion random character groups, and in so doing reportedly reproduced the complete works of Shakespeare.
Randomization in Practice: Man vs Monkey
Of course, unlike theoretical monkeys, real monkeys don’t type randomly. To prove it, researchers at Plymouth University put a desktop computer into an enclosure with six crested macaques. The monkeys bashed the computer with a rock, pooped and peed on it, and typed the letter "S" over and over again.
Not exactly the finest example of statistical randomness.
Left solely to our natural instincts, we humans fare only slightly better at randomizing. A few years ago, I was advising a graduate researcher who was analyzing egg quality at an agricultural center. “Were the eggs for the study randomly selected?” I asked.
“Yes,” the researcher replied. He had picked “just any egg” from a cart containing stacks of trays filled with eggs, without really thinking about it.
Uh-oh. Here’s where our intuitive, everyday concept of the word “random” leads us astray. The everyday meaning of “random” is closer to “haphazardly,” or “just any ol’ way.” It lacks the rigorous statistical definition of “equally likely.”
If you grab an egg from trays stacked on a cart, is every egg equally likely to be chosen? Probably not. The proximity of the eggs, the location of the tray on the cart, and the eggs' color, shape, or condition might all (consciously or unconsciously) influence which ones you pick.
So why are statisticians so nitpicky about randomness? If a sample is prone to selection bias, your analysis results apply only to your sample, not to the larger population that the sample comes from. You won’t be able to extrapolate much from your results.
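A small simulation can make the point concrete. Everything in this sketch is invented for illustration: the cart of 20 trays with 30 eggs each, the quality scores, and the assumption that quality drops with tray depth. It compares a haphazard grab from the top three trays with a simple random sample of the whole cart.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical cart: 20 trays of 30 eggs. The quality score is assumed
# (purely for illustration) to fall by one point per tray, plus noise.
cart = [(tray, 80 - tray + rng.gauss(0, 5))
        for tray in range(20) for _ in range(30)]
population_mean = statistics.mean(score for _, score in cart)

# Haphazard selection: "just any egg", but only from the easy-to-reach top trays.
within_reach = [score for tray, score in cart if tray < 3]
haphazard_sample = rng.sample(within_reach, 30)

# Simple random sample: every egg on the cart is equally likely to be chosen.
random_sample = [score for _, score in rng.sample(cart, 30)]

print("population mean:", round(population_mean, 1))
print("haphazard mean: ", round(statistics.mean(haphazard_sample), 1))
print("random mean:    ", round(statistics.mean(random_sample), 1))
```

Under these made-up conditions, the haphazard sample overstates quality because the eggs within reach are not typical of the cart. The random sample can still miss by chance, but it has no built-in tilt.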
Randomization in Practice: Minitab vs Man
Random data in the strict statistical sense, then, is not gathered randomly in the everyday sense. It requires a fair bit of forethought and planning.
As a randomizer, Minitab Statistical Software beats monkeys and humans hands down.
Suppose you want to take a simple random sample of 10 items from a group of 100,000 items. In a Minitab worksheet, enter the values 1 to 100,000 in column C1 to represent each item. (Tip: Unless you want to feel like a monkey typing for infinity, use Calc > Make Patterned Data > Simple Set of Numbers to do this.)
Next, choose Calc > Random Data > Sample From Columns and indicate the number of rows (items) you want to sample—in this case, 10.
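If you don't have Minitab handy, the same idea can be sketched in a few lines of Python. This is not what Minitab does internally, just a comparable simple random sample without replacement using the standard library. The function name simple_random_sample and the optional seed are my own additions.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw sample_size distinct item numbers from 1..population_size.

    random.Random.sample picks without replacement, so each item is
    equally likely to be chosen and no item can appear twice.
    """
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)

# 10 items out of 100,000, matching the worksheet example.
sample = simple_random_sample(100_000, 10)
print(sample)
```

Each run prints a different set of 10 item numbers. Pass a seed (for example, seed=1) if you need to reproduce the same sample later.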
Here's the random sample I got—yours will differ, of course.
Now, maybe you think you could have picked 10 random values on your own. Trust me, you can't. If you don't believe me, take a minute to watch this video by some clever kids.
Author's Note: Some mathematicians dispute the infinite monkey theorem, arguing that, for all practical purposes, the probability of successfully recreating any given work by random typing is really nil, based on the law of large numbers.
But I disagree. Yesterday, I put a monkey and keyboard together. On its very first try, the monkey randomly created a clearer, more intelligible version of the U.S. federal tax code. On its second try, it got Finnegans Wake. And on its third try, it got this blllooooooggg ssssssssssssssssssssss$&%&^%