At the start of a new year, I like to look for data that’s labeled 2016. While it’s not necessarily new for 2016, one of the first data sets I found was healthcare.gov’s data about qualified health and stand-alone dental plans offered through their site.

Now, there’s lots of fun stuff to poke around in a data set this size—there are over 90,000 records on more than 140 variables. But to start out I used Minitab to do some exploratory graphical analysis.

One statistic you might be interested in is the mean cost of the plans available. Minitab makes this easy because its bar chart automatically computes the means, and other statistics, and plots them. This is a chart of the mean premiums by state for 21-year-old adults. I colored Utah in red because it’s going to do something none of the other states do.
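If you want to see the arithmetic behind a chart like this outside of Minitab, here is a minimal Python sketch of "mean premium by state." The rows and the state labels are invented for illustration and are not taken from the healthcare.gov file, and the helper name `mean_by_group` is my own.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (state, monthly premium for a 21-year-old adult).
# These numbers are invented for illustration only.
rows = [
    ("UT", 180.0),
    ("UT", 220.0),
    ("TX", 250.0),
    ("TX", 270.0),
    ("AK", 400.0),
]


def mean_by_group(records):
    """Return {group: mean of values} for an iterable of (group, value) pairs."""
    grouped = defaultdict(list)
    for group, value in records:
        grouped[group].append(value)
    return {group: mean(values) for group, values in grouped.items()}


state_means = mean_by_group(rows)

# List the states from cheapest to most expensive, like a sorted bar chart.
for state, avg in sorted(state_means.items(), key=lambda item: item[1]):
    print(f"{state}: {avg:.2f}")
```

With the real file, you would build `rows` from the state column and the premium column for the group you care about.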

Here’s a bar chart of the means for a 40-year-old couple with 2 children:

See how Utah moved? For 21-year-olds, Utah was the second-cheapest. For the category of couple+2 children, age 40, Utah’s not radically different in price from many other states, but its rank changed. In fact, of all the states, Utah is the only one that changed position relative to any others.
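One way to check a claim like this without eyeballing two bar charts is to list every pair of states whose relative order differs between the two sets of means. The sketch below is a rough illustration with invented numbers, where the label "X" plays the role of the state that moves. The function name `flipped_pairs` is my own.

```python
from itertools import combinations


def flipped_pairs(first, second):
    """Return pairs of keys whose relative order differs between two dicts of means."""
    flipped = []
    for a, b in combinations(sorted(first), 2):
        before = first[a] < first[b]
        after = second[a] < second[b]
        if before != after:
            flipped.append((a, b))
    return flipped


# Hypothetical state means (invented numbers, not from the data set).
adult_21 = {"A": 200.0, "X": 210.0, "B": 260.0, "C": 400.0}
family_40 = {"A": 720.0, "X": 1000.0, "B": 936.0, "C": 1440.0}

pairs = flipped_pairs(adult_21, family_40)
print(pairs)

# If a single state is responsible, it appears in every flipped pair.
print(all("X" in pair for pair in pairs))
```

If one state shows up in every flipped pair, that state is the only one that moved relative to the others.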

We’re not talking about large differences in the means, but what makes the change seem really odd is this: Utah is the only state where the mean price for a couple+2 children, age 40, is not completely determined by the price for adults at the age of 21.

Here’s a scatterplot of the means of all plan premiums in each state for the two example groups from the data set. Utah is the red dot:

If you remove Utah from the data set (Minitab makes excluding points easy), the R² value is 100%.
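An R² of 100% means the remaining points fall exactly on a straight line. For a regression with one predictor, R² is the square of the Pearson correlation, so the effect of dropping a single point is easy to sketch by hand. The points below are invented for illustration: three lie on one line and "X" does not. The function name `r_squared` is my own.

```python
from math import fsum


def r_squared(xs, ys):
    """R-squared of a simple linear regression of ys on xs (squared Pearson r)."""
    n = len(xs)
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Hypothetical (age-21 mean, family mean) pairs; invented numbers.
points = {
    "A": (200.0, 800.0),
    "B": (260.0, 1040.0),
    "C": (400.0, 1600.0),
    "X": (210.0, 1000.0),  # the one point that is off the line
}


def r_squared_excluding(data, excluded=()):
    """Compute R-squared after dropping the named points."""
    kept = [xy for name, xy in data.items() if name not in excluded]
    xs, ys = zip(*kept)
    return r_squared(xs, ys)


r2_all = r_squared_excluding(points)
r2_without_x = r_squared_excluding(points, excluded={"X"})
print(f"all points: {r2_all:.3f}, without X: {r2_without_x:.3f}")
```

In this toy example the fit with all four points is less than perfect, and it becomes essentially perfect once "X" is excluded.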

Does the difference have to do with the plans? Do the providers in Utah do something different? Is this simply a quirk of how the data are recorded? Does it have to do with Utah’s history of providing a healthcare exchange before the Affordable Care Act? It’s hard to say without looking a little deeper. But Minitab’s easy exploratory graphs make it simple to find the points in a data set that show the need for further investigation.

I’ll do my own follow-up, because my natural curiosity can’t be satisfied otherwise. If you have your own hypothesis, feel free to share it in the comments section.