Exploring Healthcare Data, Part 1

Meredith Griffith | 5/2/2016

Topics: Six Sigma, Healthcare, Data Analysis

Working with healthcare-related data often feels different from working with manufacturing data. After all, the common thread among healthcare quality improvement professionals is the motivation to preserve and improve the lives of patients. Whether you are collecting data on patient falls, patient length of stay, bed unavailability, wait times, hospital-acquired infections, or readmissions, human lives are at stake. So collecting and analyzing data, and trusting your results, feels even more critical in a healthcare setting.

Because delivering quality care efficiently is of utmost importance in the healthcare industry, it is key to understand your process, collect data around that process, and know what analysis to perform. Using data to drive decisions in your organization builds awareness of your process and of opportunities to improve patient care and cut costs, which results in better business and better care.

So, in the interest of using data to draw insights and make decisions that have positive impacts, I’d like to offer several tips for exploring and visualizing your healthcare data in a way that will prepare you for a formal analysis. For instance, graphing your data and examining descriptive statistics such as means and medians can tell you a lot about how your data are distributed and can help you visualize relationships between variables. These preliminary explorations can also reveal unusual observations in your data that should be investigated before you perform a more sophisticated statistical analysis, allowing you to take action quickly when a process, outcome, or adverse event needs attention.

In the first part of this series, I’ll offer two tips on exploring and visualizing data with graphs, brushing, and conditional formatting. In part 2, I’ll offer three more tips focusing on data manipulation and obtaining descriptive statistics.

If you’d like to follow along, you can download and explore the data yourself! If you don’t yet have Minitab Statistical Software, you can download the free 30-day trial.

A Case Study: Ensuring Sound Sanitization Procedures

Let’s look at a case study in which a hospital sought to examine, and ultimately improve, its room-cleaning procedures.

The presence of adenosine triphosphate (ATP) on a surface indicates organic residue, such as bacteria. Hospitals can use ATP detection systems to verify the effectiveness of their sanitization efforts and identify improvement opportunities.

Staff at your hospital used ATP swabs to test 8 surfaces in 10 different hospital rooms across 5 departments, and recorded the results in a worksheet. ATP measurements below 400 units ‘pass’ the swab test, while measurements greater than or equal to 400 units ‘fail’ and require further investigation.
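If you want to script the pass/fail rule alongside Minitab, here is a minimal Python sketch of it. The 400-unit cutoff comes from the case study; the function names `swab_result` and `failure_rate` are hypothetical, and the sample readings are made up for illustration.

```python
# Cutoff from the case study: readings below 400 units pass,
# readings of 400 units or more fail and need follow-up.
PASS_THRESHOLD = 400


def swab_result(atp_units: float) -> str:
    """Return 'pass' or 'fail' for a single ATP swab reading."""
    return "pass" if atp_units < PASS_THRESHOLD else "fail"


def failure_rate(readings: list[float]) -> float:
    """Fraction of readings that fail the swab test (0.0 for no readings)."""
    if not readings:
        return 0.0
    failures = sum(1 for r in readings if swab_result(r) == "fail")
    return failures / len(readings)


if __name__ == "__main__":
    # Hypothetical readings for illustration only, not the case-study data.
    sample = [120, 399, 400, 2500]
    print([swab_result(r) for r in sample])  # ['pass', 'pass', 'fail', 'fail']
    print(failure_rate(sample))              # 0.5
```

Note that a reading of exactly 400 fails, matching the "greater than or equal to 400" rule.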

Here is a screenshot of part of the worksheet:

[Screenshot: part of the ATP worksheet]

Tip #1: Evaluate the shape of your data

You can use a histogram to graph all eight surfaces that were tested in separate panels of the same graph. This helps you observe and compare the distribution of data across each touch point.

If you’ve downloaded the data, you can use the ATP Unstacked.MTW worksheet to create this same histogram by navigating to Graph > Histogram > Simple. In the Graph Variables window, select Door Knob, Light Switch, Bed Rails, Call Button, Phone, Bedside Table, Chair, and IV Pole. Click on the Multiple Graphs subdialog and select In separate panels of the same graph under Show Graph Variables. Click OK through all dialogs.

[Figure: histograms of ATP measurements for the eight surfaces, in separate panels]

These histograms reveal that:

  • For all test areas, the distribution is asymmetrical, and several surfaces show extreme outliers.
  • The data for every surface are right-skewed.
  • The data do not appear to be normally distributed.
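To put numbers on what the histograms show, you can compare each surface’s mean and median and compute a skewness coefficient. The Python sketch below uses the adjusted Fisher-Pearson skewness formula, a standard definition; whether it matches Minitab’s descriptive statistics output exactly is an assumption you should check. The helper names and readings are hypothetical.

```python
import statistics


def skewness(values: list[float]) -> float:
    """Adjusted Fisher-Pearson sample skewness:
    n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3), s = sample std dev.
    Positive values indicate a longer right tail."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    if s == 0:
        return 0.0  # all values identical: no asymmetry
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


def shape_summary(values: list[float]) -> dict:
    """Mean, median, and skewness for one surface's ATP readings."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "skewness": skewness(values),
    }


if __name__ == "__main__":
    # Hypothetical readings for one surface, not the case-study data.
    readings = [100, 150, 200, 250, 5000]
    print(shape_summary(readings))
```

For right-skewed data like this, expect a positive skewness and a mean well above the median.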

Tip #2: Identify and investigate outliers

An individual value plot can be used to graph the ATP measurements collected across all eight surfaces. Identifying the outliers is quite easy with this plot.

And again, you can use the ATP Unstacked.MTW worksheet to create an individual value plot that looks just like mine. Navigate to Graph > Individual Value Plots > Multiple Y’s > Simple, and choose Door Knob, Light Switch, Bed Rails, Call Button, Phone, Bedside Table, Chair, and IV Pole as Graph variables. Click OK.

[Figure: individual value plot of ATP measurements for the eight surfaces]

This individual value plot reveals that:

  • Extreme outliers are present for ATP measurements on Bed Rails, Call Button, Phone, and Bedside Table.
  • These extreme values are influencing the mean ATP measured for each surface.
  • Judging by the histogram and individual value plot, it may be more helpful to analyze differences in medians, because these outliers pull the means upward.
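The Python sketch below flags outliers with Tukey’s 1.5 × IQR fences, a common convention (it is the rule behind boxplot whiskers), and shows how one extreme value moves the mean much more than the median. This rule is swapped in for illustration and is not necessarily how Minitab defines outliers; the function name and readings are hypothetical.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    Quartiles come from statistics.quantiles(method="exclusive"),
    which uses the (n + 1)p position rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical readings for one surface, not the case-study data.
    readings = [100, 150, 200, 250, 300, 350, 5000]
    outliers = iqr_outliers(readings)
    trimmed = [v for v in readings if v not in outliers]
    print(outliers)  # [5000]
    # The single outlier drags the mean far more than the median.
    print("mean:  ", statistics.mean(readings), "->", statistics.mean(trimmed))
    print("median:", statistics.median(readings), "->", statistics.median(trimmed))
```

Raising `k` to 3 is a common way to keep only the most extreme values.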

Once you have identified the outliers, you can investigate them further with Minitab’s brushing tool: right-click anywhere in the individual value plot and select Brush. Setting ID variables also reveals information about other variables associated with these outliers. To do this, right-click in the graph again, select Set ID Variables, enter Room as the Variable, and click OK. Then click and drag the cursor to form a rectangle around the outliers as shown below.

[Figure: brushing the extreme outliers on the individual value plot, with Room as the ID variable]

Brushing can provide actionable insights:

  • Brushing the extreme outliers on the individual value plot and setting ID variables reveals the room numbers associated with high ATP measurements.
  • Quickly identifying rooms where surfaces have high levels of ATP enables faster follow-up and investigation of specific surfaces in specific rooms.
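Brushing amounts to a lookup: select the points that fall in a value range and report their ID variable. Here is a minimal Python sketch of that idea, assuming a hypothetical in-memory layout like ATP Unstacked.MTW (one row per room, a Room column, and one column per surface). The function name, room labels, and readings are invented.

```python
def brush(rows, surfaces, low, high, id_col="Room"):
    """Return (id, surface, value) for each reading in [low, high] on the
    chosen surfaces, like dragging a rectangle over an individual value plot."""
    selected = []
    for row in rows:
        for surface in surfaces:
            value = row.get(surface)  # missing readings are skipped
            if value is not None and low <= value <= high:
                selected.append((row[id_col], surface, value))
    return selected


if __name__ == "__main__":
    # Hypothetical rows mirroring the unstacked layout; not the case-study data.
    rows = [
        {"Room": "R101", "Bed Rails": 150, "Phone": 3200},
        {"Room": "R102", "Bed Rails": 4100, "Phone": 220},
        {"Room": "R103", "Bed Rails": 300, "Phone": 180},
    ]
    # Select everything above 2000 on two surfaces.
    print(brush(rows, ["Bed Rails", "Phone"], 2000, float("inf")))
    # [('R101', 'Phone', 3200), ('R102', 'Bed Rails', 4100)]
```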

Finally, you can use conditional formatting and other cell properties to investigate and make notes about the outliers. To look at outliers across all surfaces tested, select columns C2 through C9, right-click in the worksheet, and select Conditional Formatting > Statistical > Outlier. Alternatively, you can highlight only the extreme outliers: right-click in the worksheet, select Conditional Formatting > Highlight Cell > Greater Than, and enter 2000 (the individual value plot shows that all of the extreme outliers lie above this value).

To make notes about individual outliers, right-click on the cell containing the extreme value, select Cell Properties > Comment, and enter your cell comment.

[Figure: worksheet with conditional formatting and a cell comment on the outliers]

Conditional formats and cell properties offer:

  • Quick insight into surfaces and rooms with high ATP measurements.
  • More efficient investigation of problem areas in order to make process improvements.
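To decide where to follow up first, you can count flagged cells per room and per surface. This Python sketch applies the same ‘greater than 2000’ rule as the conditional format (assuming a strict comparison) to a hypothetical row layout like the unstacked worksheet; the function name, room labels, and readings are all illustrative.

```python
from collections import Counter


def hotspot_counts(rows, surfaces, cutoff=2000, id_col="Room"):
    """Count readings strictly greater than `cutoff`, per room and per surface."""
    by_room, by_surface = Counter(), Counter()
    for row in rows:
        for surface in surfaces:
            value = row.get(surface)
            if value is not None and value > cutoff:
                by_room[row[id_col]] += 1
                by_surface[surface] += 1
    return by_room, by_surface


if __name__ == "__main__":
    # Hypothetical rows; not the case-study data.
    rows = [
        {"Room": "R101", "Bed Rails": 2500, "Phone": 3200, "Chair": 90},
        {"Room": "R102", "Bed Rails": 4100, "Phone": 220, "Chair": 110},
        {"Room": "R103", "Bed Rails": 300, "Phone": 180, "Chair": 2100},
    ]
    by_room, by_surface = hotspot_counts(rows, ["Bed Rails", "Phone", "Chair"])
    print(by_room.most_common(1))     # [('R101', 2)]
    print(by_surface.most_common(1))  # [('Bed Rails', 2)]
```

Rooms and surfaces with the highest counts are natural first targets for re-swabbing and process review.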

Visualizations that Lead to Actionable Insights

By exploring and visualizing your data in these preliminary ways, you can draw useful conclusions before performing a formal analysis. The data are not normally distributed; they are highly skewed by several extreme outliers, which greatly influence the mean ATP measurement recorded for each surface. The histogram, the first graph we created, is evidence that comparing medians instead of means may be a more effective way to determine whether statistically significant differences exist across surfaces. Investigating these outliers both graphically and in the worksheet offers further evidence that analyzing differences in median measurements will be most effective. The graphs also show that bed rails, call buttons, phones, and bedside tables are highly contaminated surfaces. One might surmise that this is because these touch points are close to sick patients, who come into contact with them frequently.

You can use these insights to focus your initial process improvement efforts on the most problematic touch points and hospital rooms. In part 2 of this series, I’ll share some tips for manipulating data, extracting even more information from the data, and displaying descriptive statistics about contamination levels.