Why Is this Yorkie So Irritated? Oversimplified Statistical Models

You know what really gets on my nerves? A lot of things.

That slow, slinky way that cats walk by. Grrrr.

The rude, abrupt arrival of delivery persons in their obnoxiously loud trucks. (Why do they always pull up just as I’m settling down for a nap?) Grrrr.

Total strangers who reach down and poke me with fat, clumsy fingers that reek of antibacterial soap. Grrrr.

And this one always gets my dander up: Me and the human are out on a walk when some passerby stops and points at me.

“What a cutie. How old is she?”

"What insolence!" I'll yap back. "I’m a he! And how old are YOU!!?"

Then I’m told to shut up.

“He’s 7.”

“Oh, so that means he’s...7*7 = 49 years old in dog years.” 

An Oversimplified Model of Dog Years

I have a bone to pick with oversimplified estimation models like this. Not every relationship can be neatly described by a straight line, you know.

I know, I know, it's not the humans’ fault. They just haven’t been properly trained. So a lot of them think they can plot a few data points on an X-Y scale and plunk a strand of raw spaghetti over it to predict whatever they darn well please. That’s something a Great Dane would do. Grrrr.

We Yorkies know that relationships can be much more complicated.

Look, I was completely potty trained at the age of 2 months. Human children, not the brightest, are lucky to get it by the time they're 3 years old. At 3 months, I could consistently comply with verbal requests to sit, stay, and lie down. Human children? Good luck with that!

In early life, there are at least 15 dog years to every human year! Maybe more.

And what about later life? Did you hear about the Australian Cattle Dog named Bluey who lived to be almost 30 years old? You mean to tell me he was over 200 years old in dog years? Grrrr.

My point? Even if a 1-to-7 relationship between human years and dog years generally holds true, you have to be doggone careful about extrapolating across all X-Y values. You need to carefully define the range of values for which your linear model holds.
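For humans who need it spelled out, here is a minimal Python sketch of a linear rule that refuses to extrapolate. The function name and the trusted range of 2 to 12 years are made up for illustration; nobody fitted them to real dogs.

```python
DOG_YEARS_PER_HUMAN_YEAR = 7  # the passerby's rule of thumb


def dog_years_linear(age, valid_range=(2.0, 12.0)):
    """Apply the 7x rule only inside the range where it is trusted.

    valid_range is an assumed, illustrative range, not a fitted result.
    """
    low, high = valid_range
    if not low <= age <= high:
        raise ValueError(f"age {age} is outside the trusted range {low}-{high}")
    return DOG_YEARS_PER_HUMAN_YEAR * age


print(dog_years_linear(7))  # inside the range, so the rule is applied

try:
    dog_years_linear(29)  # roughly Bluey's age: far outside the range
except ValueError as err:
    print("Refusing to extrapolate:", err)
```

Raising an error is only one design choice. Returning a warning alongside the estimate would work too, as long as the range is stated somewhere.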

Modeling Data Using Quadratic or Cubic Functions

Another option is to break away from the straitjacket. Consider using a quadratic or cubic function to model data that don't toe the line.
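To make the idea concrete, here is a rough sketch in plain Python that fits a straight line and a quadratic to the same curved data by ordinary least squares, then compares the sum of squared errors. The data are invented purely for illustration (they follow y = 20x - x² exactly), and the helper names are hypothetical. This is not how Minitab does it internally.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ...] for c0 + c1*x + c2*x**2 + ...
    """
    n = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = [
        [sum(x ** (i + j) for x in xs) for j in range(n)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def sum_squared_error(xs, ys, coeffs):
    """Sum of squared residuals for a polynomial with the given coefficients."""
    def predict(x):
        return sum(c * x ** k for k, c in enumerate(coeffs))
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


# Made-up curved data, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [20 * x - x ** 2 for x in xs]

line = fit_polynomial(xs, ys, degree=1)
curve = fit_polynomial(xs, ys, degree=2)

print("linear SSE:   ", sum_squared_error(xs, ys, line))
print("quadratic SSE:", sum_squared_error(xs, ys, curve))
```

With real, noisy data the quadratic will not fit perfectly, and a smaller error alone does not prove the extra term is worth keeping.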

Being a fearless Yorkie, I’m not even afraid to consider using a more advanced nonlinear function, if it can provide a significantly better fit than a linear regression model.

Like a sigmoidally shaped function, a concave function, or a function with local minima and maxima.

I could yap on and on about this, but then I'll be told to shut up. You can find these and other nonlinear functions in Minitab's catalog of functions for nonlinear regression (Stat > Regression > Nonlinear Regression > Use Catalog).
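For the curious, one common sigmoidally shaped function is the logistic curve. The sketch below is only an illustration in Python. The parameter names are my own, and it is not necessarily the exact form that appears in Minitab's catalog.

```python
import math


def logistic(x, upper=1.0, rate=1.0, midpoint=0.0):
    """Logistic curve: rises from 0 toward `upper`, steepest at `midpoint`."""
    return upper / (1.0 + math.exp(-rate * (x - midpoint)))


# The curve is nearly flat at both ends and climbs quickly in the middle.
for x in (-6, -3, 0, 3, 6):
    print(x, round(logistic(x), 3))
```

A straight line cannot level off at both ends like that, which is why a curve of this shape can fit growth-type data better.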

Are You Ignoring Breeds in Your Data?

There's another thing about that Human Years/Dog Years graph that really makes me froth at the mouth. It treats all dogs as a single population. And we all know what happens when you make an inference based on a single population when, in fact, your data are actually from distinct populations.

You get a one-size-fits-none estimate. Grrrrr.

The obvious trend in this bar chart makes my point:

How could humans use the same model to estimate the "dog age" of an Irish Wolfhound (average lifespan ≈ 6 years) and a miniature poodle (average lifespan ≈ 15 years)?!

Do me a favor. Next time, instead of overestimating my age, at least use a model with a grouping variable that accounts for these distinct populations, would you?
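Here is a tiny Python sketch of what a grouping variable buys you. It uses only the two average lifespans quoted above and treats breed as the group. Measuring age as a share of the breed's average lifespan is my own simplifying assumption, and the helper name is hypothetical.

```python
# Average lifespans in years, as quoted in this post.
# Breed acts as the grouping variable.
AVERAGE_LIFESPAN = {
    "Irish Wolfhound": 6,
    "Miniature Poodle": 15,
}


def share_of_lifespan(breed, age):
    """Fraction of the breed's average lifespan already lived."""
    return age / AVERAGE_LIFESPAN[breed]


# Same calendar age, very different stage of life.
for breed in AVERAGE_LIFESPAN:
    share = share_of_lifespan(breed, 3)
    print(f"{breed}: {share:.0%} of average lifespan at age 3")
```

A single slope for all dogs would give both of these 3-year-olds the same "dog age", which is exactly the one-size-fits-none problem.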

This model isn't perfect either. I, for one, certainly don't appreciate being lumped together with dogs that weigh twice as much as I do. 

And I hate to sound catty, but compared to some other breeds, I think I look much younger than my age.


Name: Jasmine • Tuesday, June 10, 2014

The Minitab blog is the most fun place to talk statistics- the most- I am making a point estimate here and I mean it. ;)
Great way to rubbish oversimplified models!

Name: Patrick • Tuesday, June 10, 2014

Hi Jasmine,

The confidence interval for your point estimate is narrow, which means your comment is very precise and dependable.

So glad to hear you enjoy reading about statistics on the Minitab blog. We try to make statistics not just painless, but enjoyable. And to have a good time doing it.

Appreciative readers such as you make it all worthwhile. Thanks for reading!
