Editor's note: Bill Kahn runs the statistical modeling group for consumer banking at Bank of America. His team uses a broad range of statistical and machine learning techniques in building hundreds of models that help ensure financial stability for individuals and communities around the country. Over the past few decades, Bill has led statistics groups at several Fortune 500 financial services, consulting, and manufacturing firms. He holds a BA in Physics and an MA in Statistics from Berkeley and a PhD in Statistics from Yale. We asked Bill to share his insights about machine learning as a basis for action in the business world.
By Bill Kahn, SVP, Risk Modeling Executive - Bank of America
The promise of AI is that, through a machine learning algorithm, a machine can progressively learn to recognize patterns in data and make accurate predictions without any need for human intervention. For example, a machine learning algorithm used by a credit card company identifies a transaction as likely to be fraudulent based on predictors such as the amount and location of the transaction, or a manufacturer predicts that a machine will soon need repair based on information coming in through various sensors.
But is it really true that humans are becoming obsolete in this new world of artificial intelligence?
First, to assess whether the machine has really “learned,” we need to test it by comparing the algorithm’s predictions to the right answers. Specifically, you would typically choose between competing algorithms and their options by removing, or holding out, some of the sample data when fitting the model, then seeing how well the model predicts the values in the holdout data.
Nearly all machine learning algorithms examine the performance of a sequence of increasingly complex models on some random holdout and select the model in the sequence that performs best. This conceptually straightforward process provides good predictions for the sample data.
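To make holdout validation concrete, here is a minimal sketch in Python with scikit-learn. The data are simulated, and the "sequence of increasingly complex models" is represented by decision trees of growing depth; none of this reflects any real bank data or the author's actual models.

```python
# Minimal sketch of holdout validation (simulated data, illustrative models).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Simulated transaction-like data: 20 predictors, one binary outcome.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Hold out 25% of the sample; the models never see it while being fit.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.25, random_state=0)

# A sequence of increasingly complex models: deeper and deeper trees.
for depth in [1, 2, 4, 8, 16]:
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    score = accuracy_score(y_holdout, model.predict(X_holdout))
    print(f"max_depth={depth:2d}  holdout accuracy={score:.3f}")
```

Whichever depth scores best on the holdout data is the model this particular sequence would select.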
While a machine is well-suited to automate a search through many possible methods, there will always be limits to what a machine can learn on its own. Context matters! The method must be appropriate for the situation and the data must be gathered in a thoughtful way.
The right method needs to be selected to avoid ridiculous output such as, say, classifying everyone into one group. Select a method with business, scientific, or engineering relevance.
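With rare outcomes such as fraud, for example, a model that puts every transaction into the "not fraud" group can look excellent on raw accuracy while being worthless to the business. The sketch below uses simulated, heavily imbalanced data and scikit-learn's DummyClassifier as a stand-in for that degenerate model; the numbers are illustrative assumptions.

```python
# Sketch of the "everyone in one group" pitfall on imbalanced, simulated data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Roughly 1% positive cases, loosely mimicking a rare-event problem.
X, y = make_classification(n_samples=10000, n_features=20, weights=[0.99],
                           random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.25, random_state=0)

# A "model" that simply classifies everyone into the majority group.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
pred = baseline.predict(X_holdout)

# Accuracy looks high, but recall on the rare class shows it finds nothing.
print("holdout accuracy:", round(accuracy_score(y_holdout, pred), 3))
print("recall on rare class:", round(recall_score(y_holdout, pred), 3))
```

A human who knows the business picks the metric, here recall on the rare class, that exposes the problem.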
Also, every algorithm has options associated with it. You must explore a broad enough region of those options to ensure that you are not stuck in some awful corner. In other words, it takes a human to direct the machine to the right location.
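As a rough illustration of exploring a broad region of options, here is a sketch using scikit-learn's grid search over a random forest's tuning options. The grid itself is an assumption; deciding which options to vary, and how widely, is exactly the human judgment being described.

```python
# Sketch of searching a broad grid of tuning options (simulated data).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# The analyst chooses which options to vary and over what range.
param_grid = {
    "n_estimators": [50, 200],
    "max_depth": [3, 10, None],
    "min_samples_leaf": [1, 20],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5)
search.fit(X, y)

print("best options found:", search.best_params_)
print("cross-validated score:", round(search.best_score_, 3))
```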
The big jump, from your current data to an accurate prediction, requires the same considerations that apply to any statistical analysis. In particular ...
Statistical principles matter
Extending from your sample data to accurate prediction requires attention to crucial contextual issues. These include:
- selecting the right problem
- selecting meaningful dependent variables
- calling out underlying data bias
- understanding relationships between variables
- building the right sequence of models.
None of these requirements is unique to machine learning, but all are essential for any statistical analysis to be trustworthy, and all require human intervention.
Experimental design matters
The business wants to learn one thing: ‘What do we do differently given our new knowledge?’ Naïve use of machine learning looks at all possible combinations of predictor variables and picks the combination that provides the best prediction for the holdout data. That approach takes maximum advantage of what we know, but it often results in models that are difficult to interpret, making future incremental learning impossible.
For example, it is useful to predict that certain process conditions will likely lead to a defect, but it is even more important to understand why those conditions lead to defects so that you can prevent those defects from occurring in the future.
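As a sketch of that trade-off, the example below fits an interpretable logistic regression to simulated process data. The variable names (temperature, pressure, line_speed) and their effects are hypothetical, but the standardized coefficients show how such a model points at why defects occur, not just that they will.

```python
# Sketch of an interpretable model on simulated (hypothetical) process data.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000
X = pd.DataFrame({
    "temperature": rng.normal(200, 10, n),
    "pressure": rng.normal(30, 5, n),
    "line_speed": rng.normal(50, 8, n),
})
# Simulated truth: high temperature and high line speed drive defects.
logit = 0.08 * (X["temperature"] - 200) + 0.10 * (X["line_speed"] - 50) - 2.0
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# Standardize so the coefficients are comparable across process conditions.
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression().fit(X_scaled, y)

for name, coef in zip(X.columns, model.coef_[0]):
    print(f"{name:12s} effect on log-odds of a defect: {coef:+.2f}")
```

Reading the coefficients tells an engineer where to intervene, which a black-box prediction alone cannot.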
While the hype around total AI will likely remain, reality is starting to set in. Machines will continue to “learn” and predictions will become more and more accurate, but the context that comes from a practitioner’s understanding of the problem is still vital to sending those machines off in the right direction to solve the right problem.
Bill Kahn will deliver a keynote presentation, Machine Learning: What It Is, How to Not Mess Up, and How to Generate the Value, at the Minitab Insights 2018 conference in September. Sign up to save your seat.