Guest Post: It’s Tough to Make Predictions, Especially about the Future (even with Machine Learning)

Minitab Guest Blogger | 9/4/2018

Topics: Banking and Finance, Design of Experiments - DOE, Machine Learning, Predictive Analytics, Data Analysis

Bill Kahn, SVP, Risk Modeling Executive - Bank of America

Bill Kahn runs the statistical modeling group for consumer banking at Bank of America. His team builds hundreds of models using a broad range of statistical and machine learning techniques. These models help ensure financial stability for individuals, enterprises, and communities throughout the country. Over the past few decades, Bill has led statistics groups at several Fortune 500 financial services, consulting, and manufacturing firms. He holds a BA in Physics and an MA in Statistics from Berkeley and a PhD in Statistics from Yale.

Minitab asked Bill to share his insights about machine learning (ML) as “a basis for action” in the business world.

Machine Learning (ML) Algorithms

At their core, all ML algorithms follow the same two-part process. First, some sequence of increasingly complex functions is fit to part of the data (the training data set). Then, each model in the sequence is evaluated on how well it performs on the data that was held out (the holdout set), and the model with the best fit on the holdout set is selected. There are rich variations in these steps, including the sequence of functions explored, how each fit is found, the definition of a good fit, and how the holdout set is randomized. And it turns out that, with a couple of modest cautions, this simple process typically produces good within-sample predictions.
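To make those two steps concrete, here is a minimal sketch in Python (scikit-learn and a synthetic data set are assumed; the polynomial family is just an illustrative choice of "increasingly complex functions"): fit a sequence of models to a training split, then keep the one that scores best on the holdout split.

```python
# A minimal sketch of the two-part process (scikit-learn assumed; the data and
# the polynomial model family are illustrative only).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

# Part 1: fit a sequence of increasingly complex functions to the training data.
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.3, random_state=0)
candidates = {d: make_pipeline(PolynomialFeatures(d), LinearRegression()) for d in range(1, 10)}
for model in candidates.values():
    model.fit(X_train, y_train)

# Part 2: evaluate each model on the held-out data and keep the best fit.
scores = {d: mean_squared_error(y_hold, m.predict(X_hold)) for d, m in candidates.items()}
best_degree = min(scores, key=scores.get)
print(f"Selected polynomial degree {best_degree} with holdout MSE {scores[best_degree]:.3f}")
```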

The Two Modest Cautions

First, we need to use the right loss function to fit and evaluate the models. If the loss function is not properly specified, any machine learning algorithm can produce silly output, such as classifying everyone into the predominant group and consequently failing to make any useful predictions. We need to use our experience to select a loss function with business, scientific, or engineering relevance.
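Here is a toy illustration of that failure mode (scikit-learn assumed; the rare-event fraction and the choice of metrics are hypothetical): judged only on accuracy, a model that predicts the predominant class for everyone looks excellent, yet it is useless for the business question.

```python
# A toy illustration of the "classify everyone into the predominant group"
# failure mode (scikit-learn assumed; the 1%-rare-event setup is made up).
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

rng = np.random.default_rng(1)
y = (rng.uniform(size=10_000) < 0.01).astype(int)   # rare positive class, e.g. fraud
X = rng.normal(size=(10_000, 3))                    # features, irrelevant here on purpose

majority = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = majority.predict(X)

# Judged on raw accuracy, "predict nobody is fraud" looks excellent...
print("accuracy:", accuracy_score(y, pred))                  # about 0.99
# ...but under a loss that reflects the business, it is worthless.
print("recall on the rare class:", recall_score(y, pred))    # 0.0
```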

Second, because every algorithm has hyperparameters (parameters that cannot be established on purely conceptual grounds), we must explore a broad enough range of values for them to ensure we are not using some awful setting that would lead to unacceptably weak predictions.
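A sketch of that exploration might look like the following (scikit-learn's grid search is assumed; the algorithm and the grid values are illustrative). The point is to span broad ranges, often orders of magnitude, rather than a narrow band around a default.

```python
# A minimal sketch of scanning a broad hyperparameter range so that no
# "awful" setting slips through (scikit-learn assumed; the grid is illustrative).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

param_grid = {
    "learning_rate": [0.005, 0.05, 0.5],   # span orders of magnitude, not a narrow band
    "max_depth": [1, 3, 6],
    "n_estimators": [50, 200, 500],
}
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid, scoring="neg_log_loss", cv=5)
search.fit(X, y)
print("best hyperparameters:", search.best_params_)
```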

Predicting the Future

However, while machine learning produces good within-sample predictions, that is not what we need. We need good out-of-sample predictions. This jump, from past experience to future behavior, is a big one and requires additional considerations driven by core statistical principles. These considerations include selecting the right problem, selecting meaningful dependent variables, calling out underlying data bias, understanding hierarchy and dependence among observations, and building the right sequence of models. None of these requirements is unique to ML; all are essential for any statistical analysis to be trustworthy.

And finally, the best that any model can do is extract the information that is actually in the data. To have value, there must be information in the data in the first place. To ensure our data contains valuable information, experimental design matters in ML models just as it matters for any other predictive model. Once we have run a well-structured design, we can build an ML model and score every observation for every possible combination of controllable variables. We learn the best way to set every controllable input (say, price point, marketing channel, temperature, or speed) to produce the best outcome for every input.
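As a hedged sketch of that scoring step (the factors "price" and "channel", the toy experimental data, and the random-forest model are all hypothetical stand-ins): fit a model to the designed-experiment data, enumerate every combination of the controllable inputs, and read off the setting with the best predicted outcome.

```python
# A hedged sketch of scoring every combination of controllable inputs after a
# designed experiment (the factors, data, and model choice are hypothetical).
import itertools
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Pretend this is the data from a well-structured experiment:
# two controllable factors plus an observed outcome (e.g., profit per customer).
rng = np.random.default_rng(2)
price = np.tile([9.99, 12.99, 14.99], 20)
channel = np.repeat([0, 1, 2], 20)                  # encoded marketing channel
outcome = 20 - 0.8 * price + 0.5 * channel + rng.normal(scale=1.0, size=60)
experiment = pd.DataFrame({"price": price, "channel": channel, "outcome": outcome})

model = RandomForestRegressor(random_state=0).fit(
    experiment[["price", "channel"]], experiment["outcome"])

# Score every combination of the controllable variables...
grid = pd.DataFrame(list(itertools.product([9.99, 12.99, 14.99], [0, 1, 2])),
                    columns=["price", "channel"])
grid["predicted_outcome"] = model.predict(grid)

# ...and read off the setting predicted to produce the best outcome.
print(grid.sort_values("predicted_outcome", ascending=False).head(1))
```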

This approach takes maximum advantage of what we know, but it has a significant disadvantage for any system whose true state evolves. The best settings of the control factors themselves drift when the exogenous environment drifts (such as evolving raw material quality, consumer trade-offs, or competitor response). If we always make a single optimal assignment for any particular input, then we will confound what we observe with what we do, which makes building a new and improved model impossible. Newer experimental designs, such as Thompson sampling, solve this problem by continually challenging our current beliefs. These designs enable us to strike an optimal balance between making money now and learning what we need in order to keep making money in the future.
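Here is a minimal Thompson sampling sketch of that explore-versus-exploit balance (a Beta-Bernoulli bandit; the three candidate settings and their "true" conversion rates are made up): each round we sample from our current beliefs about every setting, act on the most promising one, and then update the beliefs with what we observe.

```python
# A minimal Thompson sampling sketch for choosing among control settings
# (Beta-Bernoulli bandit; the arms and their true rates are hypothetical).
import numpy as np

rng = np.random.default_rng(3)
true_rates = [0.04, 0.05, 0.07]   # unknown-to-us conversion rate of each setting
alpha = np.ones(3)                # Beta-posterior successes per setting
beta = np.ones(3)                 # Beta-posterior failures per setting

for _ in range(10_000):
    # Sample a belief about each setting and act on the most promising one.
    sampled = rng.beta(alpha, beta)
    arm = int(np.argmax(sampled))
    reward = rng.uniform() < true_rates[arm]
    # Update beliefs with the observed outcome, continually challenging them.
    alpha[arm] += reward
    beta[arm] += 1 - reward

print("posterior mean rate per setting:", alpha / (alpha + beta))
print("plays per setting:", (alpha + beta - 2).astype(int))
```

Because every setting keeps a nonzero chance of being tried, we never freeze our beliefs, yet most of the traffic flows to whatever currently looks best.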

ML is a powerful addition to the professional’s toolkit. With some basic caution to avoid the silly and the awful, and in combination with the full set of classical statistical skills, ML helps make us ever-better statisticians.