# The Ascent to Everest: Exploratory Statistics

Earlier this month, thousands of trekkers were stranded by bad weather near Mount Everest and had to be evacuated. The news made me wonder: Just how many people are chillin' on Chomolungma (“Holy Mother,” the Tibetan name for Everest) these days?

So I decided to do some exploring of the statistical variety, using Minitab as my trusty Sherpa.

Exploratory analyses are a great way to reveal unexpected characteristics of your process. They just require an open mind, an ability to ask questions, and easy-to-use statistical software.

For example, a time series plot displays observations sequentially so you can quickly see process dynamics, such as how a process changes over specific time periods or under different conditions.

Using data on the number of ascents to the summit from 1953 to 2010, I created a time series plot to visualize the trends (Graph > Time Series Plot > Simple in Minitab).

This plot looks almost like the slope of a mountain, doesn’t it? The increasing trend is pretty dramatic. In a few decades, you and I may be the only people in the world who haven’t climbed Everest. (Hey, there’s a way to get in the record books in the future—just stay in your chair.)

## Exploratory analysis: I wonder as I wander...

Climbing Everest is neither for the faint of heart nor the faint of pocketbook. Prices for a guided ascent generally range from about \$30,000 to \$75,000.

So could this sharply rising trend in ascents be associated with a rise in disposable income for high-income earners? Or with the economic rise of China? What about the growing popularity of the Dalai Lama and Tibetan culture? Or variables of self-fulfillment and happiness measured in sociological studies during this time?

In an exploratory phase, you should satisfy your mind’s wanderlust. Keep yourself open to any potential patterns or associations you might want to explore.

My eye is also drawn to that sharp dip in 2008, which seems to belie the general trend. Is it just random variation? Or the result of some special cause?

A little sleuthing online reveals a possible explanation: In 2008, the northeast route to the peak, which is cheaper, was closed by the Chinese government for the entire climbing season, except for athletes carrying the Olympic torch for the 2008 Summer Olympics. That route was closed to foreigners once again in 2009 near the 50th anniversary of the Dalai Lama's exile.
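If you'd rather not rely on eyeballing the plot, a dip like this can also be flagged by looking at year-over-year changes. The following is only a minimal sketch: the function name `largest_drop` is my own, and the counts are made-up placeholders, not the real ascent data.

```python
def largest_drop(counts_by_year):
    """Return (year, change) for the biggest decrease from the previous observation."""
    years = sorted(counts_by_year)
    # Change from each observation to the next one in time order.
    changes = {
        later: counts_by_year[later] - counts_by_year[earlier]
        for earlier, later in zip(years, years[1:])
    }
    year = min(changes, key=changes.get)
    return year, changes[year]

# Hypothetical counts for illustration only (NOT actual Everest ascents).
example = {2001: 300, 2002: 450, 2003: 600, 2004: 400, 2005: 450}
print(largest_drop(example))
```

A flagged year is only a prompt to go sleuthing; it doesn't tell you whether the cause is random variation or a special cause.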

## The Clustered Bar Chart: It's not about the destination, it's about scaring yourself silly.

If you’re an adrenaline junkie looking for death thrills, is Everest really the Himalayan peak for you?

A clustered bar chart is a great exploratory analysis tool to compare data across groups. Using data on the number of climbers and the number of deaths for each Himalayan peak, I created a clustered bar chart in Minitab.

Tip: I flipped the horizontal and vertical scales to more easily compare the number of fatalities and the fatality rate for each Himalayan peak. That’s a handy option if you ever want to more easily compare across groups using responses with two different scales. (Choose Graph > Bar Chart – Cluster. Click Scale and check Transpose value and category scale.)

Based on this chart, which peak would you climb, if you had to choose one? (Let me know. I'll be at base camp on Mount Nittany, carbo-loading on gummi bears.)

Notice how a count and a rate can provide contrasting results even when they’re based on the same data (see another example related to tracking process defects).

Using the bar chart of deaths on the left, you might conclude, “Everest is the most dangerous peak—it’s claimed over 3 times as many lives as any other Himalayan peak.” Using the bar chart of fatality rates on the right, you might say, “Everest is really one of the safest peaks—your chance of dying on it is lower than on the other frequently climbed Himalayan mountains.”

In this case, which do you think better represents the “true danger”?
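The count-versus-rate contrast is easy to reproduce in a few lines. This sketch uses three unnamed peaks with invented numbers (they are assumptions for illustration, not the Himalayan data behind the chart) to show how the "most dangerous" peak changes depending on which measure you rank by.

```python
# Hypothetical (deaths, climbers) figures for three made-up peaks.
peaks = {
    "Peak A": (200, 4000),
    "Peak B": (60, 150),
    "Peak C": (30, 300),
}

def fatality_rate(deaths, climbers):
    """Deaths as a fraction of climbers."""
    return deaths / climbers

# Rank by raw count of deaths, then by deaths per climber.
most_deaths = max(peaks, key=lambda name: peaks[name][0])
highest_rate = max(peaks, key=lambda name: fatality_rate(*peaks[name]))

print("Most deaths:", most_deaths)
print("Highest fatality rate:", highest_rate)
```

Here the heavily climbed peak has the most deaths, while a rarely climbed peak has the highest rate, which is the same pattern the two bar charts show.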

## Two Proportions Test: Putting differences to the test

The fatality rate on Everest is about 6%. The fatality rate on Annapurna is near 40%. Suppose we assume that the individuals who’ve attempted to scale each peak are representative of mountain climbers in general. Is this difference statistically significant?

You don't need crampons, ropes, or an ice axe to find out: open Minitab and use Stat > Basic Statistics > Two Proportions.

Based on the 95% CI for the difference, we can be 95% confident that the actual risk of dying on Annapurna is higher than the risk of dying on Everest, by about 24 to 40 percentage points. The p-value is less than 0.05, so the difference is statistically significant.
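For readers without Minitab handy, here is a rough sketch of a two-proportion comparison using the normal approximation. It is not necessarily the exact calculation Minitab performs, and the counts below are placeholders I picked only to match the approximate 40% and 6% rates; the post doesn't list the actual numbers of climbers and deaths.

```python
import math

def two_proportion_test(x1, n1, x2, n2, z_crit=1.959963984540054):
    """Normal-approximation comparison of two independent proportions.

    Returns (difference, (ci_low, ci_high), z, two_sided_p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    # Unpooled standard error for the confidence interval of the difference.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    # Pooled standard error for testing H0: the two proportions are equal.
    pooled = (x1 + x2) / (n1 + n2)
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return diff, ci, z, p_value

# Hypothetical counts (assumed for illustration): 60 of 150 vs. 180 of 3000.
diff, ci, z, p_value = two_proportion_test(60, 150, 180, 3000)
print(f"difference = {diff:.2f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), p = {p_value:.3g}")
```

The sketch uses an unpooled standard error for the interval and a pooled one for the test, which is a common textbook choice. With real counts the interval will differ from the one above, so treat the printed numbers as a demonstration of the mechanics only.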

So if you’re a mountain climber who wants the ultimate death thrill, skip Everest and head right on up to Annapurna.

However, please be aware that we cannot guarantee the performance of Minitab Statistical Software at altitudes over 8,000 meters above sea level (the death zone, where there is not enough oxygen to sustain human life). But then again, it might not matter.

Coming attractions: We’ll use a Pareto chart to examine the primary causes of fatalities on Himalayan climbs over the years. Then put on your lederhosen, as we saunter up the Path of Steepest Ascent to reach the peak of process performance. And not to worry--we’ve never lost a user yet on that climb.

Name: Omar Mora • Tuesday, November 22, 2011

Dear Patrick,
I'm not a climber (not of mountains), but I do find the sequence of your analysis really clear and useful.
I am Costa Rican, and consequently Central American. As far as I understand, there are only two Central Americans (a man and a woman) on the list of climbers who have reached the top of Everest. At the moment, a fellow Costa Rican, Warner Rojas, is attempting the climb. He estimated the cost per kilometer is \$5 (interesting data to add to your study, from my humble point of view). This is Warner's website: http://www.warnerrojas.com/
Anyway, back to statistics: great analysis. I'll wait for the Pareto.

Name: abhi • Tuesday, November 22, 2011

Excellent post!!! I like the way you make statistics so easy to understand by connecting it with the real world scenario.

Name: Sanjeev Garkhail • Wednesday, November 23, 2011

It would be dangerous (though not as dangerous as climbing Everest) to draw such conclusions. For example, probably only people who are fitter would try to go for Everest, and the rest would all leave it at Annapurna. For such an analysis, a process flow diagram would be effective ;-)

Name: Patrick Runkel • Tuesday, November 29, 2011

Thanks for your kind comment, abhi. I'm glad you enjoyed the post and found the statistics easy to follow!

Name: Patrick Runkel • Tuesday, November 29, 2011

Great point, Sanjeev. You're absolutely right that it's dangerous to draw conclusions from a sample that may not accurately represent a population. I try to avoid living dangerously, as you can tell from my post ( ; However, I couldn't resist using this real data because I found it so interesting. To address the issue you raise, I made this caveat before performing the hypothesis test:

"Suppose we assume that the individuals who’ve attempted to scale each peak are representative of mountain climbers in general."

Of course, it's a gigantic assumption, much bigger than Mount Everest and Annapurna combined. Is it true? We don't know, but your point that only fitter, more professional climbers would attempt Annapurna makes a lot of sense. And, if we assume that is true, the difference in risk between Everest and Annapurna is actually underestimated by the observed fatality rates. So the overall conclusion would be the same, but even more dramatic.

Thanks again for your careful reading. (BTW a process flow diagram for climbing Everest would be very interesting!)

Name: Heidi • Sunday, December 4, 2011

Sheer brilliance, Patrick!

Name: gato joseph • Thursday, November 29, 2012

very wonderful