# A Six Sigma Healthcare Project, part 2: Visualizing the Impact of Individual Factors

My previous post covered the initial phases of a project to attract and retain more patients in a cardiac rehabilitation program, as described in a 2011 *Quality Engineering* article. A Pareto chart of the reasons enrolled patients left the program indicated that the hospital could do little to encourage participants to attend a greater number of sessions, so the team focused on increasing initial enrollment from 32 to 36 patients per month.

Stakeholders offered several solutions. Before implementing any improvement strategy, however, the team decided to look at how other individual factors influenced patient participation in the program. Taking this step can help avoid devoting resources to "fixing" factors that have little impact on the outcome.

In this post, we will look at how the team analyzed those individual factors. We have (simulated) data from 500 patients, including:

- Address and distance between each patient's home and hospital
- Each patient's age and gender
- Whether or not the patient had a car
- Whether or not the patient participated in the program

Download the data set to follow along and try these analyses yourself. If you don't already have Minitab, you can download and use our statistical software free for 30 days.

The team used simple statistics and graphs to get some preliminary insight into how these different factors affected whether or not patients decided to participate in the rehabilitation program.

## Looking at the Influence of Distance on Patient Participation

The team looked first at the influence of distance on participation using a boxplot. Also known as a box-and-whisker diagram, a boxplot shows your data's general shape, central tendency, and variability at a glance. Displaying boxplots side by side lets you compare the central value and spread of the distribution for each group and see whether the data for each group are symmetric about the center.

To create this graph, open the patient data set in Minitab and select **Graph > Boxplot > One Y With Groups**.

In the dialog box, select "Distance" as the graph variable, choose "Participation" as the categorical variable, and click **OK**.

Minitab generates the following graph:

The boxplot indicates that patients who live closer to the hospital are more likely to participate in the program. This is valuable, but it would be useful to know more about the relationship between distance and participation. Because "Participation" is a binary response (a patient either participates or does not), we can't visualize that relationship directly with graphs that require a continuous response.
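If you want to check the numbers behind a boxplot outside Minitab, here is a minimal Python sketch of the five-number summary (minimum, first quartile, median, third quartile, maximum) that each box is drawn from. The distances are made-up illustration values, not the project data, and the function name `five_number_summary` is my own. Quartile conventions differ between packages, so the results may not match Minitab's output exactly.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    data = sorted(values)
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, median, q3 = quantiles(data, n=4)
    return (data[0], q1, median, q3, data[-1])

# Hypothetical distances in km, grouped by participation (not the real data set).
distances = {
    "Yes": [10, 20, 30, 40, 50, 60, 70],
    "No": [40, 55, 70, 85, 100, 115, 130],
}

for group, values in distances.items():
    print(group, five_number_summary(values))
```

A lower median and a box that sits closer to zero for the "Yes" group is the pattern the boxplot comparison is looking for.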

However, to get a bit more insight, the project team divided the patients into groups according to how far away from the hospital they live, then calculated the relative percentage of participation for each group. To do this, select **Data > Recode > To Text...** and complete the dialog box using the following groups. The picture below shows only the first five of the seven groups, so here is the complete list:

- Group 1: 0 to 25 km
- Group 2: 25 to 35 km
- Group 3: 35 to 45 km
- Group 4: 45 to 55 km
- Group 5: 55 to 65 km
- Group 6: 65 to 75 km
- Group 7: 75 to 200 km
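The same grouping can be sketched in a few lines of Python. This is only an illustration of the recoding step, not what Minitab does internally. The name `distance_group` is hypothetical, and because the group boundaries above overlap (25 km appears in both Group 1 and Group 2), I am assuming that a distance exactly on a boundary belongs to the higher group.

```python
from bisect import bisect_right

# Upper edges of Groups 1-6 in km; anything from 75 km up falls in Group 7.
EDGES_KM = [25, 35, 45, 55, 65, 75]

def distance_group(distance_km):
    """Return the group number (1-7) for a distance in km.

    Assumption: a distance exactly on a boundary goes to the higher group
    (25 km -> Group 2). The list of groups does not say which way ties go.
    """
    if not 0 <= distance_km <= 200:
        raise ValueError("distance outside the 0-200 km range of the groups")
    # bisect_right counts how many edges are <= the distance.
    return bisect_right(EDGES_KM, distance_km) + 1

print([distance_group(d) for d in (12.0, 25.0, 44.9, 80.0)])  # [1, 2, 3, 7]
```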

When you recode the data, Minitab creates new columns of coded data and provides a summary in the Session Window:

Minitab automatically names the new column of data "Recoded Distance," which I've renamed as "Distance Group."

To determine the relative frequency of participation within each group, choose **Stat > Tables > Descriptive Statistics...** In the dialog box, select "Distance Group" as the variable for rows and "Participation" as the variable for columns, as shown. Click the "Categorical Variables" button and make sure "Counts" and "Row percents" are selected, then click **OK** twice.

In the session window, Minitab will display a table that shows the total number in each distance group, the number participating, and the relative frequency of participation for each group.
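For readers who want to see what that table is computing, here is a rough Python equivalent of counts plus row percents. The records are invented for illustration and the helper name `participation_by_group` is my own; treat it as a sketch of the calculation rather than a reproduction of Minitab's output.

```python
from collections import Counter

def participation_by_group(records):
    """records: iterable of (group, participated) pairs, participated is a bool.

    Returns {group: (total, participants, percent_participating)}.
    """
    totals = Counter()
    participants = Counter()
    for group, participated in records:
        totals[group] += 1
        if participated:
            participants[group] += 1
    return {
        g: (totals[g], participants[g], 100.0 * participants[g] / totals[g])
        for g in sorted(totals)
    }

# Hypothetical records, not the project data.
records = [(1, True), (1, True), (1, True), (1, False),
           (2, True), (2, False),
           (3, False), (3, False), (3, True), (3, False)]

for group, (total, joined, pct) in participation_by_group(records).items():
    print(f"Group {group}: {joined}/{total} = {pct:.1f}%")
```

The percentage in each row is the share of that group's patients who participated, which is the value plotted against the distance group in the next step.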

If we enter that information into the Minitab worksheet like this:

we can create a scatterplot that reveals more about the relationship between distance and participation. Select **Graph > Scatterplot...**, and choose "With connect line."

Select 'Part %' as the Y variable and 'Distance Grp' as the X variable, and Minitab creates the following graph, which shows the relationship between distance and participation more clearly:

We can see that the percentage of participation is very high among patients who live closest to the hospital, but decreases steadily among groups who live farther than 45 km away.

## Looking at the Influence of Age on Patient Participation

We can use the same methods to get initial insight into how age affects a patient's likelihood of participation in the program. The boxplot below indicates age does have some influence on participation:

By dividing the patient data into groups based on Age, as we did for Distance, we can create a similar rough scatterplot to enhance our understanding of the relationship between these variables. We’ll divide the data as shown here before using **Stat > Tables > Descriptive Statistics…** to determine the relative participation rates:

The scatterplot of the relative frequency of participation for patients in each Age group again yields greater insight into the relationship between this factor and the likelihood of participation. In this case, a much higher percentage of patients in the younger groups take part.

## Looking at the Influence of Mobility and Gender on Patient Participation

Because both "Mobility" and "Participation" are binary variables, we can select **Stat > Tables > Descriptive Statistics...** to give us a tabular view of the data. Select "Mobility" as the row variable and "Participation" as the column variable, and Minitab will provide the following output, which gives the percentages of participation among patients who do not own a car and among those who do.

We can put these data into a bar chart for a quick visual assessment. Minitab offers several ways to accomplish this easily; I opted to place the table data for each variable into the worksheet as shown here:

Now, by selecting **Graph > Bar Chart**, and choosing a simple chart in which "Bars represent values from a table"...

we can create the following bar charts that show the proportion of those with and without cars who participate in the program, and the proportion of men and women who participate:

It appears that gender could have a slight influence on participation, but having a car is clearly an important factor.

An initial look at these factors indicates that access to the hospital is very important in getting people to participate. Offering a bus or shuttle service for people who do not have cars might be a good way to increase participation, but only if the service costs less than the additional revenue generated by the increased participation.
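That cost condition is simple arithmetic, and a small Python sketch makes it concrete. Every number below is a placeholder I invented for illustration; none of them comes from the project, and the function names are hypothetical.

```python
def shuttle_breaks_even(extra_patients_per_month, revenue_per_patient,
                        monthly_service_cost):
    """True if the added revenue covers the monthly cost of the service."""
    added_revenue = extra_patients_per_month * revenue_per_patient
    return added_revenue >= monthly_service_cost

def break_even_patients(revenue_per_patient, monthly_service_cost):
    """Smallest whole number of extra patients per month that covers the cost.

    Both arguments must be positive integers (for example, whole currency units).
    """
    # Ceiling division using integer arithmetic only.
    return -(-monthly_service_cost // revenue_per_patient)

# Placeholder figures, NOT from the article: 800 revenue per patient,
# 3000 per month to run the shuttle.
print(break_even_patients(800, 3000))     # 4
print(shuttle_breaks_even(3, 800, 3000))  # False
print(shuttle_breaks_even(4, 800, 3000))  # True
```

The hard part is not this arithmetic but estimating how many extra patients a shuttle would actually bring in.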

In the next part of this series, we'll use binary logistic regression—which is not as scary as it might sound—to develop a model that will let us predict the probability a patient will join the program based on the influence factors we've looked at. A good estimate of that probability will enable us to calculate the break-even point for such a service.