If the title of this post made you think you’d be reading about Abraham Lincoln and Tyra Banks, you’re only half right.
A few weeks ago, statistician and journalist Nate Silver published an interesting post on how U.S. presidents are ranked by historians. Silver showed that the percentage of electoral votes that a U.S. president receives in his second-term election serves as a rough predictor of his average ranking of greatness.
Here’s the model he came up with, which I’ve duplicated in Minitab using a scatterplot with regression and groups (Graph > Scatterplot):
Silver divided the data into two groups to emphasize the marked difference in historical rankings between presidents who received less than 50% and those who received more than 50% of the electoral vote in their second-term election. But as you can see from the slope and position of both lines, the linear models for the two groups are almost identical.
How will President Obama be ranked? The model predicts that he’ll be historically ranked about 18th among the 43 persons who’ve served as U.S. presidents thus far.
Silver cautioned that this model provides only “rough” estimates. But he didn’t provide details. That made me curious (or skeptical)—how rough is it?
To find out, I analyzed the model (without groups) using Minitab’s General Regression:
-------------------------------------------------------------------------------
Silver's Model: General Regression
Summary of Model
S = 8.48729   R-Sq = 38.59%   R-Sq(adj) = 36.32%   R-Sq(pred) = 30.25%
Analysis of Variance
Source                          DF   Seq SS   Adj SS   Adj MS      F      P
Regression                       1  1222.32  1222.32  1222.32  16.97  0.000
  2nd Term % Electoral College   1  1222.32  1222.32  1222.32  16.97  0.000
Error                           27  1944.92  1944.92    72.03
  Lack-of-Fit                   25  1882.92  1882.92    75.32   2.43  0.333
  Pure Error                     2    62.00    62.00    31.00
Total                           28  3167.24
Regression Equation
Historians rank = 31.0708 - 0.221189 2nd Term % Electoral College
-------------------------------------------------------
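To make the fitted equation concrete, here is a minimal Python sketch that plugs a percentage into it. The coefficients are copied from the output above; the function name and the three example percentages are my own hypothetical choices, not values from Silver's data.

```python
# Coefficients copied from the "Regression Equation" line of the Minitab output.
INTERCEPT = 31.0708
SLOPE = -0.221189  # change in predicted rank per percentage point of electoral vote


def predict_rank(pct_electoral_vote: float) -> float:
    """Predicted average historians' rank (lower is better) for a given
    second-term share of the electoral vote, in percent (0-100)."""
    if not 0 <= pct_electoral_vote <= 100:
        raise ValueError("percentage must be between 0 and 100")
    return INTERCEPT + SLOPE * pct_electoral_vote


# Hypothetical inputs, chosen only to show the direction of the relationship.
for pct in (40, 60, 90):
    print(f"{pct}% of the electoral vote -> predicted rank {predict_rank(pct):.1f}")
```

Because the slope is negative, a bigger share of the electoral vote pushes the predicted rank toward 1, which is the "greater" end of the scale.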
The p-value (0.000) indicates that the % of the electoral vote a president receives for his second term is indeed a statistically significant predictor of his historical ranking at an alpha level of 0.05.
To get an idea of how “rough” the model is, look at the R-squared values. The R-sq value of 38.59% indicates that this simple model explains just under 40% of the variability in presidents’ average historical rankings. For predicting the average rankings of future presidents, the model is a bit rougher: it explains only about 30% of the variability in new observations (R-sq(pred) = 30.25%).
In either case, that leaves quite a lot of variation unaccounted for. Using the information readily available on U.S. presidents online, is it possible to come up with a better predictive model?
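If you want to see where the first two R-squared values come from, they can be recomputed from the Error and Total rows of the ANOVA table. This is just a sketch using the standard formulas; the function names are mine, and the numbers are copied from the output for Silver's model.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table
# for Silver's model (Error and Total rows).
SS_ERROR, DF_ERROR = 1944.92, 27
SS_TOTAL, DF_TOTAL = 3167.24, 28


def r_squared(ss_error: float, ss_total: float) -> float:
    """Share of the total variation that the model accounts for."""
    return 1 - ss_error / ss_total


def adjusted_r_squared(ss_error: float, df_error: int,
                       ss_total: float, df_total: int) -> float:
    """R-squared penalized for the number of terms in the model."""
    return 1 - (ss_error / df_error) / (ss_total / df_total)


r_sq = r_squared(SS_ERROR, SS_TOTAL)
r_sq_adj = adjusted_r_squared(SS_ERROR, DF_ERROR, SS_TOTAL, DF_TOTAL)
print(f"R-Sq      = {r_sq:.2%}")
print(f"R-Sq(adj) = {r_sq_adj:.2%}")
```

R-Sq(pred) can't be rebuilt the same way, because it depends on the PRESS statistic (the sum of squared leave-one-out prediction errors), which isn't shown in this output.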
Do Great Presidents Make History? Or Vice-versa?
If life offers us anything certain at all (besides death and taxes), it’s the unbridled opportunity for tentative speculation.
Would Lincoln be considered such a great president if he hadn’t governed during the violent and tumultuous times of the U.S. Civil War? Or FDR, if he hadn’t led our country through the dark days of World War II?
In other words, are historians more likely to rank a president higher if he governed during a war?
What about other factors? Would JFK be so admired if his presidency hadn’t ended so abruptly and tragically in assassination? Could even something as superficial as a president’s physical stature affect his historical ranking and public popularity? What about his age? Or lifespan?
(One side note here. In statistics, you’re often cautioned that correlation does not imply causation. While that’s true, don’t interpret that oft-cited warning to mean that correlation and causation aren’t related. Of course they are. In fact, when you’re thinking about possible predictors for your model, you’re going to naturally think of possible causative factors. Because if there is a causative relationship between a predictor and a response variable, there should be a significant association. Correlation itself doesn’t prove the causation though—for that you need to rely on other types of analyses.)
Choosing a Top Model
Take a look at two models to predict the historical ranking of a U.S. president, both analyzed using Minitab’s General Regression.
Model 1 uses age and length of retirement as predictors, as well as a categorical predictor to indicate whether the U.S. was at war during the president’s tenure in office:
--------------------------------------------------------------------------------------
Model 1: Age (inauguration), Age (death), Length of Retirement, War
Summary of Model
S = 8.81231   R-Sq = 59.36%   R-Sq(adj) = 54.43%   R-Sq(pred) = 44.81%
Analysis of Variance
Source                  DF   Seq SS   Adj SS   Adj MS      F      P
Regression               4  3743.14  3743.14   935.79  12.05  0.000
  Age at inauguration    1    24.09  1140.85  1140.85  14.69  0.001
  Age at death           1    29.23  1204.39  1204.39  15.51  0.000
  Length of Retirement   1  2710.77  1077.89  1077.89  13.88  0.001
  War                    1   979.05   979.05   979.05  12.61  0.001
Error                   33  2562.67  2562.67    77.66
Total                   37  6305.82
Regression Equation
War
No    Historians rank = 30.7955 + 2.29991 Age at inauguration - 2.20691 Age at death + 0.00623944 Length of Retirement
Yes   Historians rank = 17.8469 + 2.29991 Age at inauguration - 2.20691 Age at death + 0.00623944 Length of Retirement
-------------------------------------------------------
Model 2 also includes War as a categorical predictor. But its continuous predictor simply indicates the number of years the president served in office. A second categorical predictor indicates whether the president was subject to an assassination attempt.
----------------------------------------------------------------------------------
Model 2: Years in Office, Assassination Attempt, War
Summary of Model
S = 8.67388   R-Sq = 56.66%   R-Sq(adj) = 53.24%   R-Sq(pred) = 46.82%
Analysis of Variance
Source                   DF   Seq SS   Adj SS   Adj MS      F      P
Regression                3  3737.43  3737.43  1245.81  16.56  0.000
  Years in Office         1  2458.05  1068.64  1068.64  14.20  0.001
  Assassination Attempt   1   739.43   696.96   696.96   9.26  0.004
  War                     1   539.96   539.96   539.96   7.18  0.011
Error                    38  2858.97  2858.97    75.24
  Lack-of-Fit            16   992.14   992.14    62.01   0.73  0.74
  Pure Error             22  1866.83  1866.83    84.86
Total                    41  6596.40
Regression Equation
Assassination Attempt   War
No                      No    Historians rank = 38.9471 - 2.21938 Years in Office
No                      Yes   Historians rank = 30.3752 - 2.21938 Years in Office
Yes                     No    Historians rank = 29.6362 - 2.21938 Years in Office
Yes                     Yes   Historians rank = 21.0644 - 2.21938 Years in Office
--------------------------------------------------------
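The four equations for Model 2 share one slope and differ only in the intercept, so they fold neatly into a single lookup. Here's a rough Python sketch of that idea. The coefficients are copied from the output above; the function name and the example president are hypothetical.

```python
# Common slope for Years in Office, copied from the Model 2 equations.
SLOPE_YEARS = -2.21938

# Intercepts keyed by (assassination_attempt, war), copied from the
# four regression equations in the Model 2 output.
INTERCEPTS = {
    (False, False): 38.9471,
    (False, True): 30.3752,
    (True, False): 29.6362,
    (True, True): 21.0644,
}


def predict_rank_model2(years_in_office: float,
                        assassination_attempt: bool,
                        war: bool) -> float:
    """Predicted average historians' rank (lower is better) under Model 2."""
    if years_in_office < 0:
        raise ValueError("years_in_office must be non-negative")
    return INTERCEPTS[(assassination_attempt, war)] + SLOPE_YEARS * years_in_office


# Hypothetical president: one four-year term, no war, no assassination attempt.
print(f"{predict_rank_model2(4, False, False):.1f}")

# Each categorical predictor simply shifts the intercept by a fixed amount.
war_shift = INTERCEPTS[(False, True)] - INTERCEPTS[(False, False)]
attempt_shift = INTERCEPTS[(True, False)] - INTERCEPTS[(False, False)]
print(f"War shifts the predicted rank by {war_shift:.2f}")
print(f"An assassination attempt shifts it by {attempt_shift:.2f}")
```

Seen this way, four equations aren't really more work than two: the categorical predictors just move the same line up or down.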
Of these two models, which would you favor? Compare the output and look carefully at the predictors in each model.
Both models are statistically significant; in fact, all the predictors in each model have p-values less than an alpha of 0.05. Both models also have about the same R-squared value, and explain close to 60% of the variation in the historical rankings. You might favor Model 1 because its R-sq and R-sq(adj) values are a wee bit higher (although its R-sq(pred) is actually a bit lower than Model 2’s). You also might prefer its ease of use, with only 2 regression equations to predict the response, instead of 4. Those are valid reasons, but there’s something lurking beneath the surface that trumps those reasons: a dreaded “statistical disease” that can afflict regression models, called multicollinearity.
What Makes Models Behave Erratically?
Multicollinearity is a word you’re not likely to hear bandied about in your local sports bar. But despite its big scary name, multicollinearity is a relatively simple concept. It just means that your model contains predictors that are correlated with each other.
Take a look at Model 1 again, with its continuous predictors Age at inauguration, Age at death, and Length of Retirement. Any likely correlations between those predictors? Obvious, isn’t it, once you think about it? The longer you live, the longer your retirement is likely to be. In fact, a president’s age at death is simply his age at inauguration plus his time in office plus the length of his retirement, so these three predictors are tied together almost by arithmetic.
To evaluate possible multicollinearity using our statistical software, display the predictors you suspect of correlation on a scatterplot (Graph > Scatterplot) and run a correlation analysis (Stat > Basic Statistics > Correlation):
Correlations: Length of Retirement, Age at death
Pearson correlation of Length of Retirement and Age at death = 0.834
P-Value = 0.000
----------------------------------------
The graph indicates a clear relationship between the two predictors, as you’d expect. The correlation analysis shows that it’s a fairly strong, statistically significant correlation.
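If you'd like to see what the correlation analysis is doing under the hood, here is a small Python sketch of the Pearson correlation coefficient. The function name is mine, and the two lists are made-up numbers for illustration only; they are not the presidential data behind the 0.834 above.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Made-up values for illustration only; NOT the presidential data.
retirement_years = [0, 4, 10, 15, 25]
age_at_death = [56, 63, 71, 74, 90]
print(f"r = {pearson_r(retirement_years, age_at_death):.3f}")
```

Values near +1 or -1 mean the two predictors carry nearly the same information, which is exactly the situation that causes trouble in a regression model.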
But what if a correlation between your predictors isn’t so intuitively obvious? Or what if your model contains lots of predictors? There’s another way to quickly spot the trouble.
When you run the regression analysis in Minitab, click Options (in Regression) or Results (in General Regression) and choose to display the variance inflation factors (VIFs). This is what you’d get for the two models:
Model 1
Term VIF
Model 2
Term VIF
Now you can see a big difference between the models. High variance inflation factors indicate possible correlations between the predictors. You want VIFs as close to 1 as possible. Anything greater than 10 indicates trouble. As you can see, Model 1 shows strong evidence of multicollinearity.
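For the curious, the VIF for a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the other predictors in the model. The Python sketch below shows the formula. The function name is mine, and the pairwise calculation is only a stand-in: it uses the single 0.834 correlation reported earlier, not the full auxiliary regression that Minitab runs.

```python
def vif(r_squared_j: float) -> float:
    """Variance inflation factor for a predictor, given the R-squared from
    regressing that predictor on all the other predictors in the model."""
    if not 0 <= r_squared_j < 1:
        raise ValueError("R-squared must be in [0, 1)")
    return 1 / (1 - r_squared_j)


# A predictor unrelated to the others has a VIF of 1: no inflation at all.
print(f"R-squared 0.0 -> VIF {vif(0.0):.2f}")

# The common "greater than 10" warning level corresponds to an R-squared of 0.9.
print(f"R-squared 0.9 -> VIF {vif(0.9):.2f}")

# Pairwise stand-in: if Age at death were regressed on Length of Retirement
# alone, R-squared would be the squared correlation reported above.
PAIRWISE_R = 0.834
print(f"Pairwise r {PAIRWISE_R} -> VIF of at least {vif(PAIRWISE_R ** 2):.2f}")
```

That last figure is only a floor. Adding the other predictors to the auxiliary regression can never lower its R², so the VIF in the full model can only be the same or higher.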
So what’s the big deal? Is multicollinearity just another complicated rule that statisticians can use to gleefully tear apart your results?
No. The big deal is that multicollinearity makes models very unstable. Weird things can happen: estimates of the coefficients for each predictor can vary erratically depending on which other predictors you include in the model. What’s more, predictors that appear to be statistically significant may not be significant at all.
For example, look what happens after you remove one of the correlated predictors from Model 1:
Model 1 (without Length of Retirement)
Term                 P     VIF
Age at inauguration  0.31  1.59
Age at death         0.28  1.60
War (No)             0.00  1.01
----------------------------------------
The VIF values are now much lower—because removing one of the correlated factors addressed the problem of multicollinearity. But look what happened to the p-values. Before, both continuous predictors, "Age at inauguration" and "Age at death," were statistically significant, with p-values < 0.05. Now neither predictor is significant—together or by itself. Model 1 falls to pieces…crumbles to dust…a victim of instability caused by multicollinearity.
On the other hand, if you run Model 2 with each predictor by itself, each predictor is always statistically significant. It’s a stable model.
Conclusion: The Makings of a Good Model
As you can see, there's much more to a good model than low p-values. I set out to find a better model to account for average historical rankings of U.S. presidents. Model 1 self-destructed due to multicollinearity. But Model 2 is stable, and it does have some advantages over Silver's original model:
- It accounts for more variation in the response.
- It can be used to estimate historical rankings for all U.S. presidents, regardless of whether they sought a second term.
I also think the predictors I've chosen are a bit more thought-provoking (could a Wag the Dog principle be at work?).
But I admit my model is not as simple and elegant as Silver's, with its one continuous predictor that can be easily and accurately measured before a president's second term is completed. My model is definitely clunkier.
It also has some potential issues related to the measurement of the categorical predictors, such as how to define an assassination attempt. For example, someone once fired shots at the White House from afar while President Clinton was in office, but I didn’t count that as an assassination attempt—partly because I didn’t think it was part of the public consciousness. Similarly, for the "War" variable, I didn't count conflicts with Native American tribes as a U.S. war during a president's term. Clearly, there's lots of room for debate there.
There’s so much data on U.S. presidents available. If only I had world enough and time. There has to be a better model. Maybe you can find it using the data in this Minitab project.