Predictive Analytics and Determining Patient Length of Stay at Time of Admission

José Padilla 20 May, 2020

Topics: Regression Analysis, Healthcare, Predictive Analytics, Minitab Statistical Software, regression trees, CART

Length of stay, defined as the time between hospital admission and discharge measured in days, is an aspect of care that can be costly for most healthcare systems if not approached properly. Optimizing patient flow, on the other hand, facilitates beneficial treatment, minimal waiting, minimal exposure to risks associated with hospitalization, and efficient use of resources such as hospital beds, medical equipment and available clinical staff.

Bringing Historic Hospital Data and Machine Learning Together to Optimize Patient Flow and Resource Planning

With information from the Centers for Medicare & Medicaid Services showing that one-third of all healthcare expenditures in the United States can be attributed to inpatient care, maintaining as much control as possible over patient length of stay is critical. It gets complicated, though. Patient age, sex, medical history and several other factors all influence, to varying degrees, the number of days between hospital admission and discharge.

Thankfully, predictive analytics tools like those available in Minitab can use large amounts of available data to predict individual outcomes for patients. In the following example we will examine one healthcare facility’s initiative to optimize patient length of stay.




Example: Hospital Uses Predictive Analytics to Anticipate How Long Patients Will Stay As Soon as They Arrive

Let's say a mid-sized hospital in Oregon is setting a goal to plan and use its resources better. Its operational excellence team has a data set containing information on approximately 8,500 patients who visited the hospital in the past two years. It includes 21 predictors, or variables of interest, ranging from general information like age, sex and marital status to medical information like pain level, tumor size and white and red blood cell counts. Here is their worksheet in Minitab:

Minitab worksheet of 21 predictors related to patient length of stay

Notice that the worksheet has 22 columns of data. The first 21 columns represent the predictors or variables they will use to predict patient length of stay, while column 22 represents length of stay.  


Analyzing the Length of Stay Data Using a Regression Tree

A machine learning algorithm “teaches” a computer to recognize patterns using available data. Minitab’s predictive analytics tools include Classification and Regression Trees (CART®). A regression tree is a decision tree algorithm that works by creating a set of yes/no rules that split the data into partitions, choosing the predictor settings that best separate the data into groups with similar response values. By using this tool, the team will be able to:

  1. Identify the most important variables that affect length of stay.
  2. Discover combinations of predictor settings that are most likely to lead to a lower or higher average length of stay.
  3. Visualize their findings.
  4. Create business rules that are easy to understand, use and apply to their process in real-time.
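To make the idea of a yes/no rule concrete, here is a minimal Python sketch of how a single split on one numeric predictor could be chosen. It assumes a least-squares criterion (pick the threshold that minimizes the total squared deviation of the response within the two partitions); the options Minitab actually uses are not described in this post. The function names and the toy data are hypothetical and are not taken from the hospital's worksheet.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(x, y):
    """Return (threshold, total_sse) for the rule 'x <= threshold' whose two
    partitions have the smallest combined squared error in the response."""
    best = (None, sse(y))  # baseline: no split at all
    cuts = sorted(set(x))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(cuts, cuts[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical toy data: patient age and length of stay in days.
age = [25, 32, 41, 58, 63, 70]
stay = [2.0, 2.5, 3.0, 6.0, 6.5, 7.0]

threshold, total = best_split(age, stay)
print(f"Split rule: age <= {threshold} (total squared error {total})")
```

A full regression tree would repeat this search over every predictor and then again inside each resulting partition; this sketch shows only one step of that process.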

To create a Regression Tree, a member of the operational excellence team would click Stat > Predictive Analytics > CART® Regression...

Here is the completed dialog box.

CART Regression dialog box in Minitab

Minitab displays a Tree Diagram in the output pane, as shown below. The diagram is made up of nodes, drawn in two different shapes. Notice that some of the nodes split into other nodes, while others do not split any further. The nodes that do not split are called terminal nodes. Each terminal node in the regression tree represents a specific combination of predictor settings. The number of terminal nodes is the size of the tree. In our example, the tree provided by Minitab has 10 terminal nodes, so the tree size is 10.
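As a rough illustration of what "tree size" means, the sketch below represents a tree as nested Python dictionaries and counts its terminal nodes. The dictionary layout, the split rules and the node means are all hypothetical; this is not Minitab's internal representation, and the small tree here is not the 10-node tree from the example.

```python
def count_terminal_nodes(node):
    """Count the nodes that do not split any further."""
    if "mean" in node:  # a terminal node stores only its mean response
        return 1
    return count_terminal_nodes(node["yes"]) + count_terminal_nodes(node["no"])

# Hypothetical tree: each internal node holds a yes/no rule and two children.
toy_tree = {
    "rule": "age <= 49.5",
    "yes": {"mean": 2.5},
    "no": {
        "rule": "pain_level <= 5",
        "yes": {"mean": 5.0},
        "no": {"mean": 7.5},
    },
}

print(count_terminal_nodes(toy_tree))  # tree size of the toy tree
```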


The output also displays the Relative Variable Importance graph below. This graph ranks the percent relative importance of each predictor variable at explaining variability in patient length of stay. In our example, notice that Age is the most important variable when predicting length of stay. Cancer stage, marital status, smoking history, number of tumors and white blood cell count also predict length of stay.
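One way to read a chart like this is sketched below, under the assumption that relative importance expresses each predictor's raw importance score as a percentage of the top predictor's score, so the most important variable always shows 100%. The raw scores here are invented for illustration and do not come from the hospital's model.

```python
def relative_importance(raw_scores):
    """Scale raw importance scores so the top predictor is 100%,
    returned in descending order of importance."""
    top = max(raw_scores.values())
    ranked = sorted(raw_scores.items(), key=lambda kv: kv[1], reverse=True)
    return {name: 100.0 * score / top for name, score in ranked}

# Hypothetical raw importance scores (assumed, not from the article's output).
raw = {"Marital status": 2.0, "Age": 8.0, "Cancer stage": 6.0}

for name, pct in relative_importance(raw).items():
    print(f"{name}: {pct:.1f}%")
```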

Relative Variable Importance graph from CART Regression on patient length of stay


Using the Model to Predict Patient Length of Stay

It is easy to make predictions with this model using the Predict… option in Minitab. Here we predict for a new case:

Predict dialog box in Minitab


And the results are shown below:

Output from Minitab prediction

Notice that, under Settings, the output lists the values entered for each predictor variable. Just below, under Prediction, Minitab provides the Fit value, which in this case is the predicted average length of stay. With that information, the hospital can predict that:

A 53-year-old married man

    • with Stage II cancer,
    • who has never smoked,
    • who reported a pain level of 4 when he arrived,
    • and who matches the other information above ...

... is predicted to stay in the hospital for 5.43 days.
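Conceptually, a prediction from a regression tree comes from following the yes/no rules until a terminal node is reached and reporting that node's average response as the fit. The sketch below illustrates this with a hypothetical two-rule tree on numeric predictors only; the rules, field names and fitted values are assumptions for illustration and do not reproduce the hospital's model or its 5.43-day prediction.

```python
def predict(node, case):
    """Follow the yes/no rules from the root to a terminal node and
    return that node's fit (the average response in that partition)."""
    while "fit" not in node:
        variable, threshold = node["split"]
        node = node["yes"] if case[variable] <= threshold else node["no"]
    return node["fit"]

# Hypothetical tree: internal nodes hold (variable, threshold) rules.
toy_tree = {
    "split": ("age", 49.5),
    "yes": {"fit": 2.5},
    "no": {
        "split": ("pain_level", 5),
        "yes": {"fit": 5.0},
        "no": {"fit": 7.5},
    },
}

new_patient = {"age": 53, "pain_level": 4}
print(predict(toy_tree, new_patient))  # predicted average length of stay, in days
```

Because every patient who lands in the same terminal node receives the same fit, the rules double as simple business rules that staff can apply at admission.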


The OpEx Team Can Now Predict Patient Length of Stay Better

With help from CART Regression in Minitab, the hospital's operational excellence team has the data they need to accurately predict how long a patient will stay based on information they know when that patient arrives. When they know how long on average patients with different conditions will be staying at the hospital, they can adjust their plans to ensure they have adequate resources when they are needed.

