Trimming Decision Trees to Make Paper: Predictive Analytics and Root Cause Analysis in Minitab

As we collect more and more observational data from our processes, we might need new tools to provide meaningful insights. You can add modern-day machine learning techniques alongside traditional statistical tools to analyze, improve and control your processes. Let's take a look at an example that starts with binary logistic regression and ends with Classification and Regression Trees (CART®).

Editor's note: An earlier version of this post showing CART in Salford Predictive Modeler was published in March 2018. We have updated it to show CART in the latest version of Minitab.

Finding the Root Cause of Excessive Variation in a Pulp Bleaching Process

In our example, we know that 2.9% of our product is defective. To look for the root cause of this unacceptable defect rate, you might begin with a Binary Logistic Regression in Minitab, where the response variable is whether or not an observation was defective. Unfortunately, for this data, the strong patterns in the residual plots below indicate that the binary logistic regression model might not be adequate.

[Figure: Deviance residual plots for pulp defects]

The CART Approach

CART is a decision tree algorithm that works by creating a set of yes/no rules on the predictor (X) settings. Each rule splits the data into partitions that are more uniform in the response (Y) variable than the data was before the split. Using the CART feature in Minitab, I see that one of my predictor variables – Discharge pH – is a major contributor to defects.
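To make the idea of a yes/no rule concrete, here is a minimal sketch of how a single split could be chosen. It assumes Gini impurity as the splitting criterion (a common choice for classification trees, not necessarily the setting used for the analysis in this post), and the data is a small hypothetical sample invented for illustration, not the pulp data. The function names `gini` and `best_split` are also just illustrative.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (0 = pure node)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(x: List[float], y: List[int]) -> Tuple[float, float]:
    """Find the rule 'x <= threshold' with the lowest weighted Gini impurity.

    Candidate thresholds are midpoints between consecutive distinct x values.
    Returns (threshold, weighted_impurity).
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best_threshold, best_score = pairs[0][0], float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical toy data (NOT the pulp data from this post):
# a pH-like predictor and a 0/1 defect flag.
ph = [7.2, 7.4, 7.5, 7.6, 7.7, 7.8, 7.9, 8.0, 8.1, 8.2]
defect = [1, 0, 1, 1, 0, 0, 0, 0, 0, 0]

threshold, impurity = best_split(ph, defect)
print(f"Best rule: x <= {threshold:.2f} (weighted Gini = {impurity:.3f})")
```

A full tree repeats this search over every predictor and then again inside each resulting partition. The sketch shows only the first step for one predictor.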

[Figure: Zoomed view of the CART tree]

If discharge pH <= 7.739, then the estimated probability of a defect is relatively high (17.7%). If discharge pH > 7.739, then very few defects occur.
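The rule above can be written as a small function, which also makes it easy to compare the node's defect rate with the overall 2.9% rate. This is only a sketch: the constant and function names are illustrative, and because the post gives no number for the high-pH branch, the function returns `None` there rather than guessing.

```python
from typing import Optional

PH_THRESHOLD = 7.739      # split point reported in the post
P_DEFECT_LOW_PH = 0.177   # estimated defect probability when pH <= threshold
OVERALL_DEFECT_RATE = 0.029  # overall defect rate reported in the post


def estimated_defect_probability(discharge_ph: float) -> Optional[float]:
    """Apply the single CART rule on discharge pH.

    Returns None above the threshold, because the post only says
    "very few defects occur" there and reports no probability.
    """
    if discharge_ph <= PH_THRESHOLD:
        return P_DEFECT_LOW_PH
    return None


# How much higher is the defect rate in the low-pH node than overall?
lift = P_DEFECT_LOW_PH / OVERALL_DEFECT_RATE
print(f"Low-pH node defect rate is about {lift:.1f}x the overall rate")
```

In other words, observations with discharge pH at or below 7.739 are defective roughly six times as often as the process as a whole.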