The Ascent to Everest: Exploratory Statistics
Earlier this month, thousands of trekkers were stranded by bad weather near Mount Everest and had to be evacuated. The news made me wonder: Just how many people are chillin' on Chomolungma (“Holy Mother,” the Tibetan name for Everest) these days?
So I decided to do some exploring of the statistical variety, using Minitab as my trusty Sherpa.
Exploratory analyses are a great way to reveal unexpected characteristics of your process. They just require an open mind, an ability to ask questions, and easy-to-use statistical software.
For example, a time series plot displays observations sequentially so you can quickly see process dynamics, such as how a process changes over specific time periods or under different conditions.
Using data on the number of ascents to the summit from 1953 to 2010, I created a time series plot to visualize the trends (Graph > Time Series Plot > Simple in Minitab).
This plot looks almost like the slope of a mountain, doesn’t it? The increasing trend is pretty dramatic. In a few decades, you and I may be the only people in the world who haven’t climbed Everest. (Hey, there’s a way to get in the record books in the future—just stay in your chair.)
Exploratory analysis: I wonder as I wander...
Climbing Everest is neither for the faint of heart nor the faint of pocketbook. Prices for a guided ascent generally range from about $30,000 to $75,000.
So could this sharply rising trend in ascents be associated with a rise in disposable income for high-income earners? Or with the economic rise of China? What about the growing popularity of the Dalai Lama and Tibetan culture? Or variables of self-fulfillment and happiness measured in sociological studies during this time?
In an exploratory phase, you should satisfy your mind’s wanderlust. Keep yourself open to any potential patterns or associations you might want to explore.
My eye is also drawn to that sharp dip in 2008, which seems to belie the general trend. Is it just random variation? Or the result of some special cause?
A little sleuthing online reveals a possible explanation: In 2008, the northeast route to the peak, which is cheaper, was closed by the Chinese government for the entire climbing season, except for athletes carrying the Olympic torch for the 2008 Summer Olympics. That route was closed to foreigners once again in 2009 near the 50th anniversary of the Dalai Lama's exile.
The Clustered Bar Chart: It's not about the destination, it's about scaring yourself silly.
If you’re an adrenaline junkie looking for death thrills, is Everest really the Himalayan peak for you?
A clustered bar chart is a great exploratory analysis tool to compare data across groups. Using data on the number of climbers and the number of deaths for each Himalayan peak, I created a clustered bar chart in Minitab.
Tip: I flipped the horizontal and vertical scales to more easily compare the number of fatalities and the fatality rate for each Himalayan peak. That’s a handy option if you ever want to more easily compare across groups using responses with two different scales. (Choose Graph > Bar Chart – Cluster. Click Scale and check Transpose value and category scale.)
Based on this chart, which peak would you climb, if you had to choose one? (Let me know. I'll be at base camp Mount Nittany, carbo loading on gummi bears.)
Notice how a count and a rate can provide contrasting results even when they’re based on the same data (see another example related to tracking process defects).
Using the bar chart of deaths on the left, you might conclude, “Everest is the most dangerous peak—it’s claimed over 3 times as many lives as any other Himalayan peak.” Using the bar chart of fatality rates on the right, you might say, “Everest is really one of the safest peaks—your chance of dying on it is lower than on the other frequently climbed Himalayan mountains.”
In this case, which do you think better represents the “true danger”?
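To see how a count and a rate can point in opposite directions, here is a minimal Python sketch. The two peaks and their numbers are made up purely for illustration; they are not the Himalayan data behind the chart above.

```python
# HYPOTHETICAL numbers for two made-up peaks (not the data from the chart).
peaks = {
    "Peak A": {"climbers": 5000, "deaths": 200},  # popular, many attempts
    "Peak B": {"climbers": 150, "deaths": 45},    # rarely attempted
}


def summarize(peaks):
    """Return (name, death count, fatality rate) for each peak."""
    rows = []
    for name, d in peaks.items():
        rate = d["deaths"] / d["climbers"]  # deaths per climber
        rows.append((name, d["deaths"], rate))
    return rows


rows = summarize(peaks)

# The same data, ranked two ways.
most_deaths = max(rows, key=lambda r: r[1])[0]
highest_rate = max(rows, key=lambda r: r[2])[0]

for name, deaths, rate in rows:
    print(f"{name}: {deaths} deaths, fatality rate {rate:.1%}")
print("Most deaths:", most_deaths)
print("Highest fatality rate:", highest_rate)
```

With these invented figures, the busier peak has far more deaths but the quieter peak has the higher rate, which is the same kind of contrast the two bar charts show.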
Two Proportions Test: Putting differences to the test
The fatality rate on Everest is about 6%. The fatality rate on Annapurna is near 40%. Suppose we assume that the individuals who’ve attempted to scale each peak are representative of mountain climbers in general. Is this difference statistically significant?
You don't need crampons, ropes, or an ice axe to find out: open Minitab and use Stat > Basic Statistics > Two Proportions.
Based on the 95% CI for difference, we can be 95% confident that the actual risk of dying on Annapurna is higher than the risk of dying on Everest, by about 24 to 40 percentage points. The p-value is less than 0.05, so the difference is statistically significant.
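If you'd like to see roughly what happens under the hood, here is a sketch of a textbook two-proportion comparison in Python, using only the standard library. It uses a normal approximation, with an unpooled standard error for the confidence interval and a pooled one for the test. That is an assumption on my part and may not match the exact method or options Minitab applies. The counts are hypothetical, chosen only to give rates of 40% and 6%, so the interval it prints will not reproduce the one reported above.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import NormalDist


@dataclass
class TwoProportionResult:
    diff: float      # p1 - p2
    ci_low: float    # lower confidence limit for the difference
    ci_high: float   # upper confidence limit for the difference
    z: float         # test statistic for "no difference"
    p_value: float   # two-sided p-value


def two_proportion_test(x1, n1, x2, n2, confidence=0.95):
    """Normal-approximation comparison of two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2

    # Confidence interval for the difference: unpooled standard error.
    se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci_low = diff - z_crit * se_unpooled
    ci_high = diff + z_crit * se_unpooled

    # Test of "no difference": pooled standard error.
    pooled = (x1 + x2) / (n1 + n2)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    return TwoProportionResult(diff, ci_low, ci_high, z, p_value)


# HYPOTHETICAL counts: 60 deaths in 150 climbers (40%) vs. 300 in 5000 (6%).
result = two_proportion_test(x1=60, n1=150, x2=300, n2=5000)
print(f"Difference in fatality rates: {result.diff:.3f}")
print(f"95% CI for difference: ({result.ci_low:.3f}, {result.ci_high:.3f})")
print(f"z = {result.z:.2f}, p-value = {result.p_value:.4g}")
```

The normal approximation is reasonable when both groups have plenty of events and non-events. With small counts, an exact method such as Fisher's exact test is the safer choice.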
So if you’re a mountain climber who wants the ultimate death thrill, skip Everest and head right on up to Annapurna.
However, please be aware that we cannot guarantee the performance of Minitab Statistical Software at altitudes over 8,000 meters above sea level (the death zone, where there is not enough oxygen to sustain human life). But then again, it might not matter.
Coming attractions: We’ll use a Pareto chart to examine the primary causes of fatalities on Himalayan climbs over the years. Then put on your lederhosen, as we saunter up the Path of Steepest Ascent to reach the peak of process performance. And not to worry: we’ve never lost a user yet on that climb.