Recently, Patrick Runkel blogged about using regression models to explain how historians ranked the U.S. presidents. Given that I both love regression and that I’ve written about using regression to predict U.S. presidential elections, I wanted to take Patrick up on his challenge to improve upon his model.

My goal isn’t merely to predict the eventual ranking for any President. Instead, I’m much more interested in a fascinating question behind this analysis. Is the public’s contemporary assessment of the president consistent with the historical perspective, or do they differ?

With this in mind, I’ve collected two additional types of data that provide the contemporaneous assessment of the President and the social mood: presidential approval ratings and the Dow Jones Industrial Average.

Along the way, I’ll highlight the problems of overanalyzing small datasets, and how to determine if you are!

## Gallup Presidential Approval Ratings

The Gallup organization has tracked the approval rating of the president since the days of Franklin Roosevelt. As I’ve written about here, Gallup uses consistent wording in order to facilitate comparisons over time. I’ll use the fitted line plot for a preliminary investigation into whether this variable is worthy of consideration. I’ll run it three times to see how the historians’ ranking corresponds to the highest approval, average approval, and lowest approval.

Looking at the three plots, it’s interesting to note that the highest approval rating produces an R-squared of 0.7%! The fitted line is essentially flat. If you want an exemplar for what no relationship looks like, this is it!

The picture is more interesting in the average and low approval ratings plots. The low approval rating plot provides a better fit with an R-squared of 34.7%. Collectively, these plots suggest that it’s more important to know how *low* the approval has gone than how high! Even though there are only 12 data points, the low approval rating is a statistically significant predictor, with a p-value of 0.044.
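
For readers who want to see what a fitted line plot computes, here is a minimal sketch of a one-predictor least-squares fit and its R-squared in plain Python. The function name `fit_line` and the numbers in the lists are placeholders I made up for illustration; they are not the Gallup ratings or the historians' rankings used in the plots.

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: share of the variation in y that the line accounts for.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - ss_res / syy
    return intercept, slope, r_squared


# Hypothetical placeholder values, NOT the real approval ratings or ranks.
low_approval = [22, 28, 35, 37, 41, 49]
historian_rank = [30, 33, 20, 24, 12, 9]

intercept, slope, r_squared = fit_line(low_approval, historian_rank)
print(f"rank = {intercept:.1f} + ({slope:.2f}) * low_approval")
print(f"R-squared = {r_squared:.1%}")
```

A negative slope here means that a higher low-point approval goes with a numerically smaller (better) rank, which is the direction of the relationship described above.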

It seems that history remembers the worst of a President, rather than the best!

Eleven data points follow the general trend. However, the one data point in the bottom-left of the plots is clearly an outlier. That data point is Harry Truman (pictured at top). Truman doesn’t fit the model because he had very low approval ratings while he was president, but the historians give him a fairly good rank of #6.

It’s tempting to remove this data point because the model then yields an R-squared of 67%. However, there is no reason to question that data point and I think it would be a mistake to remove it. It’s not good practice to remove data points simply to produce a better fitting model.

You may be wondering, can we add other variables into this model to improve it? Unfortunately, that’s not possible because of the limited amount of data available. In regression, a good rule of thumb is that you should have at least 10 data points per predictor. We’re right at the limit and can’t legitimately add more predictors.
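
The rule of thumb is simple enough to write down as a one-line check. This is only a sketch of the arithmetic; `max_predictors` is a name I made up, and the threshold of 10 is the rule of thumb quoted above, not a hard statistical law.

```python
def max_predictors(n_points, points_per_predictor=10):
    """Largest number of predictors the rule of thumb allows for n_points."""
    return n_points // points_per_predictor


# The approval-rating data set has 12 presidents.
print(max_predictors(12))  # 1
```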

Instead, let’s look at a new variable that provides more data points!

## Presidents and the Dow Jones Industrial Average

Previously, I assessed a model by Prechter et al. that claims to predict whether an incumbent president will be re-elected using just the Dow Jones. The theory states that the stock market is a proxy variable for social mood, not that the stock market directly affects voting. The stock market is a good measure of social mood because if people feel positive enough to invest more money in the stock market, they are presumably happy with the status quo, which favors the incumbent.

The researchers find a positive, significant relationship between several outcomes for presidential elections that have an incumbent and the percentage change in the Dow Jones over a 3-year period. The researchers also include the traditional big three predictors of Presidential elections: economic growth, inflation, and unemployment. The study concludes that the three-year change in the DJIA is the best predictor. Further, when the DJIA predictor is included in the model, the other “Big Three” predictors become insignificant.

I concluded that their model was statistically valid and used it to accurately predict the outcome of the last election.

## Historian Rankings of U.S. Presidents and the Dow Jones

Because the Dow Jones Industrial Average is such an important predictor for re-election, can it also predict how well historians view past presidents?

I gathered the Dow Jones (DJ) data for the beginning and end of each president’s time in office, and calculated the percentage change. The Dow Jones began in 1896. For elections prior to 1896, I used the Foundation for the Study of Cycles data set, which I also used for my election prediction post. This data set uses market data from earlier indices to create a longer DJIA.
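
The percentage change itself is a one-line calculation. Here is a small sketch; the function name and the index levels are hypothetical placeholders, not actual Dow Jones values.

```python
def percent_change(start_value, end_value):
    """Percentage change from the start to the end of a term in office."""
    return (end_value - start_value) / start_value * 100.0


# Hypothetical index levels, NOT real DJIA data.
print(percent_change(100.0, 125.0))  # 25.0
```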

The initial exploration looks promising when I graph it in the fitted line plot.

You can see the overall negative slope. In the upper left corner the negative DJ changes are associated with worse ranks. In the bottom right, the higher DJ changes are associated with better ranks. The relationship appears to be curvilinear. This curvature makes sense because there is no limit on how much the Dow Jones can improve, but the rankings cannot be better than #1! Consequently, the downward slope has to flatten out as the DJ increases. We’ll incorporate the curvature in our regression models.
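
To make "incorporating the curvature" concrete, here is a sketch of fitting rank = b0 + b1·DJ + b2·DJ² by least squares in plain Python. It solves the normal equations directly, which is fine for a toy example but is not necessarily how a statistics package does it. The helper names and the data are invented for illustration and are not the values behind the plot.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's lists are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_quadratic(x, y):
    """Least-squares coefficients [b0, b1, b2] for y = b0 + b1*x + b2*x**2."""
    design = [[1.0, xi, xi ** 2] for xi in x]
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)]
           for i in range(3)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Hypothetical placeholder data, NOT the real figures.
# DJ change is expressed as a fraction (0.3 means +30%).
dj_change = [-0.5, -0.2, 0.0, 0.3, 0.8, 1.5, 2.5]
rank = [40, 33, 28, 22, 15, 10, 8]

b0, b1, b2 = fit_quadratic(dj_change, rank)
print(f"rank = {b0:.2f} + ({b1:.2f}) * dj + ({b2:.2f}) * dj**2")
```

For a curve that falls and then flattens, you would expect a negative linear coefficient and a positive squared coefficient.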

My approach will be to add the Dow Jones data to both Nate Silver’s and Patrick Runkel’s models to see if it increases the explanatory power of either.

### Nate Silver’s Model

Silver’s original model (below) uses the percentage of the electoral vote a president receives for his second term to predict the historians’ ranking.

As Patrick notes, it’s an elegant model because it requires only one easy-to-collect variable per president. The model yields an R-squared of 38.6%, which is close to that of the approval rating model. Silver’s model only applies to presidents who run for a second term. That gives us 29 data points, which is just enough to include the quadratic form of the Dow Jones data.

In the output, we can see that the Electoral College and Dow Jones predictors are all significant and the R-squared is 56.7%. The adjusted R-squared also increased from Silver’s original model, which suggests that adding the additional predictors is valid. The coefficients are all as expected given the previous analyses.
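
If you want to check the adjusted R-squared claim by hand, the standard formula is 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of data points and k is the number of predictors. The sketch below plugs in the R-squared values quoted in this post. The predictor counts are my assumptions (one for the Electoral College model, three once the linear and squared Dow Jones terms are added), and I am assuming both models use the same 29 presidents, so treat the results as back-of-the-envelope figures rather than the exact values from the software output.

```python
def adjusted_r_squared(r_squared, n_points, n_predictors):
    """Adjusted R-squared: penalizes R-squared for each added predictor."""
    return 1 - (1 - r_squared) * (n_points - 1) / (n_points - n_predictors - 1)


# Assumed: n = 29 for both models; 1 predictor originally, 3 after adding
# the linear and squared Dow Jones terms.
original = adjusted_r_squared(0.386, n_points=29, n_predictors=1)
extended = adjusted_r_squared(0.567, n_points=29, n_predictors=3)

print(f"original model: {original:.1%}")
print(f"with Dow Jones: {extended:.1%}")
```

Unlike plain R-squared, the adjusted version can go down when a predictor adds little, so an increase is modest evidence that the new terms earn their place.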

Winning a higher percentage of the Electoral College and a positive Dow Jones both improve a president’s ranking by historians.

The Electoral College variable reflects the voters’ assessment of an incumbent president. The Dow Jones variable represents the social mood of the time, which has been shown to influence elections. These two variables represent an entirely contemporaneous assessment of both the president and the times, and together they explain just over half the variability of the historians’ ranking.

Given the number of data points, it wouldn’t be wise to add more predictors to this model. So, we’ll move on to Patrick’s model.

### Patrick Runkel’s Model

Patrick’s original model includes these variables: years in office, assassination attempt, and war. Collectively, these variables explain 56.66% of the variance. Let’s add in the Dow Jones data and see what we get.

All of the variables are significant and all three R-squared values have increased. This model accounts for 63.42% of the variance, or nearly two-thirds.

More years in office, a war, an assassination attempt, and a positive Dow Jones all improve a President’s ranking by historians.

With five predictors, the model is pushing these 41 data points to their limit. However, I think the model is good. The two main risks of including too many predictors in a model are:

- Insufficient power to obtain significance due to imprecise estimates.
- Overfitting the model, which is when the model starts to fit the random noise. The R-squared increases but, because you can’t predict the random noise for new data, the predicted R-squared decreases.

Fortunately, all of the predictors are significant, so power isn’t a problem. Further, the predicted R-squared has increased, so we probably aren’t overfitting the model.
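
Predicted R-squared is usually computed from the PRESS statistic: drop one observation, refit the model, predict the dropped point, and repeat for every observation. Here is a sketch of that idea for a single predictor, using a straightforward leave-one-out loop in plain Python. The function names and the data are hypothetical, and the models in this post have several predictors, so this only illustrates the technique and does not reproduce the numbers above.

```python
def fit_simple_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared_and_predicted(x, y):
    """Return (R-squared, predicted R-squared) for lists x and y."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)

    # Ordinary R-squared: residuals from the fit on all the data.
    intercept, slope = fit_simple_line(x, y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

    # PRESS: each point is predicted by a model that never saw it.
    press = 0.0
    for i in range(len(x)):
        b0, b1 = fit_simple_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (b0 + b1 * x[i])) ** 2

    return 1 - ss_res / ss_total, 1 - press / ss_total


# Hypothetical placeholder data, NOT the presidential data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9, 8.2, 8.8]

r2, predicted_r2 = r_squared_and_predicted(x, y)
print(f"R-squared = {r2:.1%}, predicted R-squared = {predicted_r2:.1%}")
```

Predicted R-squared can never exceed the ordinary R-squared. A large gap between the two is the warning sign that the model is fitting noise.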

## The Contemporaneous vs. the Historical Perspective

Is the historical perspective different from the contemporaneous perspective? How much can you divine from the present about the ultimate assessment by historians? These are very interesting questions. Our best model suggests that contemporaneous data account for two-thirds of the variance in the rankings by historians.

What about the other third? We can’t say for sure. It’s possible that if we could include more variables, or better variables, contemporaneous data could account for even more of the variance. It’s also likely that the historical perspective does account for some of it. After all, history is complex, and with hindsight and additional knowledge, the perspective provided by time could revise the contemporary conclusions somewhat.

However, it’s quite clear that it’s easy to account for half the variance with a simple model that contains only two contemporaneous variables, and it's not too difficult to get up to two-thirds! This result reaffirms why I love statistics: You can observe and record the data around you and have a good assessment of reality that withstands the test of time. The historical perspective definitely has its place, but if you go find the right data and use the correct analyses, you can gain good insights *right now*!