Graph Quest: How to Show that Life on Venus Is Safer than Life on Mars

True confession: Nothing fires quickly from the top of my head. At least nothing very lucid or useful.

To come up with a good idea, I have to dredge thoughts slowly from the thick sludge and sediment in my brain. 

It's not always easy—there are deeply encrusted layers in my cerebral cortex that go all the way back to the Paleozoic era.

So coming up with a useful data display—one that uncovers hidden patterns or elucidates interesting relationships—often takes a bit of doing for me.

It's rare that I can nail it on the first shot.

Finding the Graph That's Worth a Thousand Words

After examining mortality data for U.S. males from a period life table, I thought it might be interesting to compare mortality risk between U.S. men and women by age.

First, I entered the data in a Minitab worksheet:

[Image: raw data]

That raw data is sooo deathly dull, is it not?

My first thought? Display a time series plot to better compare and visualize the differences in mortality risk between males and females at each age. (In Minitab Statistical Software, choose Graph > Time Series Plot, and choose Multiple. In Series, enter the Male and Female columns. Click Time/Scale. Select Stamp, and enter Age as the stamp column.)

[Image: time series]

That graph is about as exciting as waiting in line at the post office. Doesn’t illuminate much. Women seem to have a slightly lower risk of mortality at higher ages (yawn).

Had I stopped here, I might have concluded that this data doesn’t reveal anything. But I never trust my first impressions, or my first graphs.

My next thought was to calculate the pairwise differences in mortality risk at each age. (Right-click the column and choose Formulas > Assign Formulas to Column, then define the formula: Difference = Males - Females).
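For readers working outside Minitab, the same column formula can be sketched in a few lines of Python. This is only a rough equivalent, not what Minitab does internally, and the numbers below are made-up placeholders rather than the life-table values used in this post. The names `pairwise_difference`, `male`, and `female` are hypothetical.

```python
def pairwise_difference(male, female):
    """Return male risk minus female risk, row by row (one row per age)."""
    if len(male) != len(female):
        raise ValueError("both columns must have the same number of rows")
    return [m - f for m, f in zip(male, female)]


# Placeholder values for illustration only, NOT the real life-table data.
ages = [20, 21, 22]
male = [0.0012, 0.0013, 0.0014]
female = [0.0004, 0.0004, 0.0005]

difference = pairwise_difference(male, female)
for age, d in zip(ages, difference):
    print(f"age {age}: difference = {d:.4f}")
```

A positive difference means the male risk is higher at that age; a negative one means the female risk is higher.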


Would plotting those pairwise differences improve the visual comparison of risk?

[Image: time series plot of difference]

That's a bit better. You can more clearly see at what age the difference in mortality risk between men and women increases and peaks.

But it's difficult to draw intuitive conclusions from this graph.

After all, what does it mean to say that the risk of mortality is greater for men than women at age 98 by approximately 0.05? That's such a small difference. Does it mean anything?

I forced my tired, stubborn, old mule-of-a-brain to take one more step up the hill.

To more intuitively compare the mortality risk, I decided to divide the mortality risk of men by the mortality risk of women at each age. That gives the relative risk of death at each age for men compared to women.  
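Here is a small Python sketch of that division, along with a quick illustration of why a ratio can be easier to interpret than a raw difference. The function name `relative_risk` and all of the numbers are hypothetical placeholders, not values from the life table: both made-up pairs differ by the same 0.001, yet the ratios tell very different stories.

```python
import math


def relative_risk(male, female):
    """Return male risk divided by female risk, row by row (one row per age).

    A female risk of zero has no defined ratio, so that row becomes NaN.
    """
    if len(male) != len(female):
        raise ValueError("both columns must have the same number of rows")
    return [m / f if f else math.nan for m, f in zip(male, female)]


# Placeholder values for illustration only, NOT the real life-table data.
# Both pairs have the same absolute difference of 0.001.
low_baseline = relative_risk([0.0015], [0.0005])[0]   # roughly 3
high_baseline = relative_risk([0.3010], [0.3000])[0]  # barely above 1

print(f"low baseline:  ratio = {low_baseline:.2f}")
print(f"high baseline: ratio = {high_baseline:.2f}")
```

A ratio of 1 means equal risk, a ratio above 1 means higher risk for men, and a ratio below 1 means higher risk for women.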

[Image: relative risk worksheet]

Here are the results displayed on a Minitab scatterplot (with a reference line added at 1 to show where the risk of mortality is the same for men vs. women).

[Image: relative risk]

This graph, derived from the same data as the other two graphs, reveals much more:

  • American males are more likely to die than American females at every age—except at ages 10 and 11.
  • In their early 20s, men are more than 3 times as likely to die as women.
  • After peaking at age 22, the relative risk steadily decreases, until rising again between ages 45 and 55.
  • The risk of mortality for men and women is approximately the same (the relative risk reaches 1) after age 115.
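If you'd rather not read these patterns off the graph by eye, a few lines of Python can pull out the peak and the ages where the ratio falls to 1 or below. This is only a sketch: `summarize` is a hypothetical helper, and the four rows are placeholders chosen to resemble the shape of the plot, not the actual computed ratios.

```python
def summarize(ages, ratios):
    """Return the age with the highest ratio and the ages where the ratio is <= 1."""
    peak_age = max(zip(ages, ratios), key=lambda pair: pair[1])[0]
    not_higher_for_men = [age for age, r in zip(ages, ratios) if r <= 1.0]
    return peak_age, not_higher_for_men


# Placeholder values for illustration only, NOT the real computed ratios.
ages = [10, 11, 22, 50]
ratios = [0.9, 1.0, 3.1, 1.7]

peak_age, not_higher_for_men = summarize(ages, ratios)
print("peak relative risk at age:", peak_age)
print("ages where men are not at higher risk:", not_higher_for_men)
```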

Each pattern raises questions that could lead to further follow-up study. That's what a good exploratory graph does.

So don’t let an initially ho-hum graph make you conclude there aren’t some interesting patterns lurking in your data. 

Sometimes you have to explore and experiment to discover them.



