How to Use Brushing to Investigate Outliers on a Graph

Cody Steele | 10 July, 2014

Topics: Statistics in the News

There’s a lot going on in the world, so you might not have noticed that the Organisation for Economic Co-operation and Development (OECD) released their new set of health statistics for member nations. On the OECD website, you can now download the free data series for 2014. (Be aware that “for 2014” means that the organization has a pretty good idea about what happened in 2012.)

Of course, there’s nothing more fun than sharpening your Minitab skills with real data. Each time the OECD releases their data, we hear about how much money each nation spends per person on health care compared to how long people live in that nation. (See, for example, the coverage from 2013, 2012, and 2011.) We also tend to be treated to graphics similar to the scatterplot below, typically with clever variations for the symbols:

Life expectancy generally increases with per capita health expenditures.

In this display, the point on the far right appears to be an outlier, spending more per capita than other nations but not fitting the general trend of increasing life expectancy as spending increases. When you see an apparent outlier, you want to investigate.

Investigating Outliers with Brushing

Brushing is a feature in Minitab that makes it easy to investigate outliers in graphs. For example, from the scatterplot in Minitab, you can fit simple regression models with and without the suspected outlier to see how great the influence is on the model. You can copy and paste the data at the end of the post if you want to follow along.

Here’s how to do a visual sensitivity analysis of data on a scatterplot with brushing. Start by making the graph:

  1. Choose Graph > Scatterplot.
  2. From the gallery of scatterplots, select Simple. Click OK.
  3. In Y variables, enter ‘Life Expectancy’.
  4. In X variables, enter ‘Per Capita Health Expenditures’. Click OK.

Turn on brushing mode and set an ID variable to get specific information about the outlier. With the scatterplot window active, follow these steps:

  1. Choose Editor > Brush.
  2. Choose Editor > Set ID Variables.
  3. In Variables, enter Country 'Life Expectancy' 'Per Capita Health Expenditures'. Click OK.
  4. On the graph, click and drag to cover the unusual point.

In the brushing window, you can see that the unusual point is the United States, row 34 in the data set. You can also see the specific values of life expectancy and per capita health expenditures.

In the United States, average life expectancy is 78.7 years and per capita health expenditures are $8,745.
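If you want to check the brushed point outside Minitab, the same lookup can be sketched in a few lines of Python. This is only an illustration, not how Minitab implements brushing: the function name `brush` and the rectangle bounds are assumptions chosen to cover the far-right point, and the rows are copied in order from the data table at the end of this post.

```python
# (country, per capita health expenditures in USD, life expectancy in years),
# copied in order from the data table at the end of the post.
DATA = [
    ("Australia", 3997, 82.1), ("Austria", 4896, 81.0),
    ("Belgium", 4419, 80.5), ("Canada", 4602, 81.5),
    ("Chile", 1577, 78.9), ("Czech Republic", 2077, 78.2),
    ("Denmark", 4698, 80.1), ("Estonia", 1447, 76.5),
    ("Finland", 3559, 80.7), ("France", 4288, 82.1),
    ("Germany", 4811, 81.0), ("Greece", 2409, 80.7),
    ("Hungary", 1803, 75.2), ("Iceland", 3536, 83.0),
    ("Ireland", 3890, 81.0), ("Israel", 2304, 81.8),
    ("Italy", 3209, 82.3), ("Japan", 3649, 83.2),
    ("Korea", 2291, 81.3), ("Luxembourg", 4578, 81.5),
    ("Mexico", 1048, 74.4), ("Netherlands", 5099, 81.2),
    ("New Zealand", 3172, 81.5), ("Norway", 6140, 81.5),
    ("Poland", 1540, 76.9), ("Portugal", 2457, 80.5),
    ("Slovak Republic", 2105, 76.2), ("Slovenia", 2667, 80.2),
    ("Spain", 2998, 82.5), ("Sweden", 4106, 81.8),
    ("Switzerland", 6080, 82.8), ("Turkey", 984, 74.6),
    ("United Kingdom", 3289, 81.0), ("United States", 8745, 78.7),
]


def brush(data, x_min, x_max, y_min, y_max):
    """Return (row number, country, x, y) for every point inside the rectangle.

    Row numbers start at 1 to match worksheet rows. The name and signature
    are hypothetical; this is not part of any Minitab API.
    """
    return [
        (row, country, x, y)
        for row, (country, x, y) in enumerate(data, start=1)
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]


# Assumed brush rectangle, drawn around the far-right point on the scatterplot.
brushed = brush(DATA, x_min=8000, x_max=9000, y_min=78.0, y_max=80.0)
print(brushed)  # [(34, 'United States', 8745, 78.7)]
```

The row number, country, and both values match what the brushing window reports.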

To do the regression with and without the outlier, use brushing to create an indicator variable:

  1. With the brushing window still showing, choose Editor > Create Indicator Variable.
  2. In Column, enter United States.
  3. Select Update now. Click OK.
  4. Choose Editor > Select.
  5. Choose Editor > Select Item > Symbols.
  6. Choose Editor > Edit Symbols.
  7. Select the Groups tab.
  8. In Categorical variables for grouping, enter ‘United States’. Click OK.

The United States is indicated by the red square.
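For readers who think in code, an indicator variable is just a column of flags. The sketch below assumes the column holds 1 for brushed rows and 0 for all other rows; the variable names are illustrative and not taken from Minitab.

```python
N_ROWS = 34          # number of countries in the worksheet
brushed_rows = {34}  # row numbers currently brushed (the United States)

# 1 marks a brushed row, 0 marks every other row (assumed coding).
united_states = [1 if row in brushed_rows else 0 for row in range(1, N_ROWS + 1)]

print(sum(united_states))  # 1
print(united_states[-1])   # 1
```

Grouping the symbols by this column is what separates the single brushed point from the other 33.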

With groups on the graph, you can do separate regression fits for the groups.

  1. Choose Editor > Add > Regression Fit.
  2. Select Quadratic.
  3. Check Apply same groups of current displays to regression fit. Click OK.
  4. Choose Editor > Add > Regression Fit.
  5. Select Quadratic. Click OK.

The red curve, with the United States, slopes downwards more slowly than the blue curve.

On the graph, the red curve shows that when you include the United States in the data, the fitted decrease in life expectancy is slower than when you exclude the United States. Bonus tip: Hover over each curve in Minitab to see the regression equation used to create the curve.
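To reproduce the comparison outside Minitab, here is a minimal Python sketch that fits a quadratic by ordinary least squares with and without the United States. It is an illustration under stated assumptions, not Minitab's implementation: the helper names `fit_quadratic` and `describe` are made up for this example, and expenditures are rescaled to thousands of dollars to keep the arithmetic well conditioned, so the coefficients will not be on the same scale as equations fitted to the raw dollar values.

```python
# (country, per capita health expenditures in USD, life expectancy in years),
# copied in order from the data table at the end of the post.
DATA = [
    ("Australia", 3997, 82.1), ("Austria", 4896, 81.0),
    ("Belgium", 4419, 80.5), ("Canada", 4602, 81.5),
    ("Chile", 1577, 78.9), ("Czech Republic", 2077, 78.2),
    ("Denmark", 4698, 80.1), ("Estonia", 1447, 76.5),
    ("Finland", 3559, 80.7), ("France", 4288, 82.1),
    ("Germany", 4811, 81.0), ("Greece", 2409, 80.7),
    ("Hungary", 1803, 75.2), ("Iceland", 3536, 83.0),
    ("Ireland", 3890, 81.0), ("Israel", 2304, 81.8),
    ("Italy", 3209, 82.3), ("Japan", 3649, 83.2),
    ("Korea", 2291, 81.3), ("Luxembourg", 4578, 81.5),
    ("Mexico", 1048, 74.4), ("Netherlands", 5099, 81.2),
    ("New Zealand", 3172, 81.5), ("Norway", 6140, 81.5),
    ("Poland", 1540, 76.9), ("Portugal", 2457, 80.5),
    ("Slovak Republic", 2105, 76.2), ("Slovenia", 2667, 80.2),
    ("Spain", 2998, 82.5), ("Sweden", 4106, 81.8),
    ("Switzerland", 6080, 82.8), ("Turkey", 984, 74.6),
    ("United Kingdom", 3289, 81.0), ("United States", 8745, 78.7),
]


def fit_quadratic(xs, ys):
    """Least-squares coefficients (b0, b1, b2) for y = b0 + b1*x + b2*x**2."""
    # Normal equations: power sums of x on the left, x-weighted sums of y on the right.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting on the 3x4 augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    b = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        b[i] = (a[i][3] - sum(a[i][j] * b[j] for j in range(i + 1, 3))) / a[i][i]
    return tuple(b)


def describe(label, rows):
    """Fit the quadratic to the given rows and print the fitted equation."""
    xs = [x / 1000 for _, x, _ in rows]  # thousands of USD (assumed rescaling)
    ys = [y for _, _, y in rows]
    b0, b1, b2 = fit_quadratic(xs, ys)
    print(f"{label}: life expectancy = {b0:.3f} + {b1:.3f}*x + {b2:.4f}*x^2")
    return b0, b1, b2


with_us = describe("All countries", DATA)
without_us = describe(
    "Excluding United States",
    [row for row in DATA if row[0] != "United States"],
)
```

Comparing the two printed equations gives the same kind of sensitivity check as comparing the two curves on the graph: the more the coefficients change when one row is dropped, the more influence that row has on the model.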

Using Brushing in Other Types of Graphs

Brushing is a great tool in Minitab for investigating specific points on a graph, but it doesn’t just work on scatterplots. If you’re ready for more, you can see the complete list of graphs you can brush and how to make a graph that excludes brushed points. Prefer an example? Check out how Patrick Runkel uses brushing to study the relationships on a bubble plot!

Here’s the data for the per capita health care expenditures in United States dollars and the life expectancy for the total population at birth.

Per Capita Health Expenditures   Life Expectancy   Country
3997                             82.1              Australia
4896                             81.0              Austria
4419                             80.5              Belgium
4602                             81.5              Canada
1577                             78.9              Chile
2077                             78.2              Czech Republic
4698                             80.1              Denmark
1447                             76.5              Estonia
3559                             80.7              Finland
4288                             82.1              France
4811                             81.0              Germany
2409                             80.7              Greece
1803                             75.2              Hungary
3536                             83.0              Iceland
3890                             81.0              Ireland
2304                             81.8              Israel
3209                             82.3              Italy
3649                             83.2              Japan
2291                             81.3              Korea
4578                             81.5              Luxembourg
1048                             74.4              Mexico
5099                             81.2              Netherlands
3172                             81.5              New Zealand
6140                             81.5              Norway
1540                             76.9              Poland
2457                             80.5              Portugal
2105                             76.2              Slovak Republic
2667                             80.2              Slovenia
2998                             82.5              Spain
4106                             81.8              Sweden
6080                             82.8              Switzerland
984                              74.6              Turkey
3289                             81.0              United Kingdom
8745                             78.7              United States