Fun with Statistics

I’m Joel Smith, senior business development representative by title and stat nerd by education and personality. The best part about doing data analysis is solving real problems, and my job allows me to work with consultants and partners to solve challenging problems using Minitab.

If you’ve ever heard me talk about statistics, you know that I’m very passionate about what can be accomplished through good analysis. I like to use fun datasets as examples! Whether it’s making a better bowl of oatmeal, determining when during the year people are most interested in dieting, or what zodiac signs get...

Forget Statistical Assumptions - Just Check the Requirements!

One of the most poorly understood concepts in the use of statistics is the idea of assumptions. You've probably encountered many of these assumptions, such as "data normality is an assumption of the 1-sample t-test."  But if you read that statement and believe normality is a requirement of the 1-sample t-test, then you have missed a subtle and important characteristic of assumptions and need to read on...

An "assumption" is not necessarily a "requirement"!

To understand where this idea of assumptions comes from, let's forget about statistics for a minute and imagine we sell bikes online.  We...

They Call Them "Free" Throws For a Reason

When Penn State guard Jermaine Marshall stepped to the line to take two free throws with 0:27 remaining against Ohio State, it didn’t really matter whether he made the shots. The game was already out of reach, and although the Nittany Lions would attempt to foul their way into a miracle victory, most of the fans were all too aware that Penn State was now 0-8.  That Marshall then missed both free throws was the exclamation point on a night where the team made just 13 of 22 free throw attempts.

Lest you not already know this, a "free throw" is a shot taken against no defense, a shot that likely...

Was Alabama's Blowout of Notre Dame Really Unexpected?

In this year's BCS Championship game, Alabama dominated Notre Dame 42-14 in a game that was never really even close. While many people felt Alabama would win the game, most expected a defensive battle. Few predicted it would be so lopsided (and only a small percentage of those would have actually bet money on a blowout).

But should we really be surprised?  I mean, Alabama clearly outperformed expectations, but did they do so in a truly unusual manner?

How Can Data Reveal If a Victory Was Unusual?

To investigate how expected or unexpected this game's 28-point margin of victory was, we...

Any Chance We Share a Birthday?

I have a birthday coming up, and wanted to share a wealth of statistics about birthdays that you may find entertaining.

First is the "Birthday Problem."  Some of you probably encountered this one in a statistics class at some point.  The Birthday Problem is as follows: How many people would need to be in a room in order for there to be a 50% chance that two share a birthday?  This is a fun problem because the answer is surprising for many people. After all, we always seem surprised when we meet people we share a birthday with.

Often the answer you get is 183, or half of the number of days in...
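For anyone who wants to check the arithmetic themselves, here is a minimal sketch of the classic calculation. It assumes 365 equally likely birthdays, no leap years, and independent birthdays (no twins in the room); the function names are my own, made up for illustration.

```python
def shared_birthday_probability(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes every day is equally likely and birthdays are independent.
    """
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def smallest_group(target=0.5, days=365):
    """Smallest group size whose shared-birthday probability reaches target."""
    n = 1
    while shared_birthday_probability(n, days) < target:
        n += 1
    return n


if __name__ == "__main__":
    n = smallest_group()
    print(n, round(shared_birthday_probability(n), 4))  # prints: 23 0.5073
```

Under those assumptions, the chance of a shared birthday first passes 50% with 23 people in the room, which is far fewer than most people guess.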

Beyond the "Regular Guy" Control Charts: An Ode to the EWMA Chart

It's no secret that in the world of control charts, I- and Xbar- are pretty much the popular kids in school.  But have you ever met their cousin EWMA? That's him in the middle of the class, wearing the clothes that look nice but aren't very flashy. You know, when Xbar- and I- were leading the championship football team last month, EWMA won the state tennis championship?  I didn't go either -- pretty much only the players' parents go to tennis matches -- but I heard that he won it.  Someone told me he even got a scholarship to an Ivy League school to play, not that he needed it with his grades...

Does the NFL Preseason Matter? Regression Analysis Says "No."

Admit it—if you follow NFL football, both of the following statements are likely true:

  • When talking about the preseason with friends, you say that the preseason doesn't matter and doesn't mean anything for the regular season, so you're not really worried or excited about your team's performance.
     
  • You are worried or excited about your team's performance.

All of us say it doesn't matter, but after so many months without football we are desperate for anything meaningful. We want to see how the recent draftees perform, or how the defense is doing under a new coordinator, or a hundred other things....

Testing for Normality: A Tale of Two Samples by Anderson-Darling

With apologies to Charles Dickens, I'd like to begin this post by summing up the Anderson-Darling statistic this way:

It was the best of fits, it was the worst of fits, it was the test of normality, it was the test for non-normality, it was the plot of belief, it was the plot of incredulity, it was the p-value of Light, it was the p-value of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we had nothing before us...

I read and participate in discussions about a broad range of statistical topics daily, and few elicit as much misinformation combined...

How Olympic Judging Stacks Up, Part II

As a follow-up to my recent article on how judges did in evaluating performances at two previous Olympic events, I recently wrote a blog post examining the events from the 2008 Olympics to see whether each demonstrated judging bias and, if so, how much.

But regardless of whether bias exists, the real purpose of judging these events is to determine the best performances and specifically to award gold, silver, and bronze medals. So now I want to take a look at those same events and whether or not each is adequately determining the medal winners…

If you read the original article, you know that...

How Olympic Judging Stacks Up, Part I

You may have read my recent article applying statistical analysis to how judges did in evaluating performances at two previous Olympic events. If so, perhaps you found yourself wondering how other events stack up…

Anticipating a desire to see the “cleanest” and “dirtiest” judging performances, I pulled all of the data I could find on every event from the 2008 Beijing Olympics that is judged on a continuous scale. In this post, I will examine which events* showed judging bias, and quantify that bias to provide some comparison between sports.

* Data on the individual judging scores could not be...

Meteorology and the Triple Jump

If you've ever looked at the results of Olympic Triple Jump, you've probably noticed that right beside the athlete's "mark" (jump distance) is the wind as measured at the time of the jump:

The natural assumption to make is that, of course, wind must affect how far the athletes are able to jump.  In track lingo, jumps with a tailwind are referred to as "wind-assisted," and most track records set limits on how much tailwind can be present in order for the record to be official.  But how much does the wind matter?

To investigate, I looked up this morning's Women's Triple Jump qualifiers and did some...

Visualizing the Greatest Olympic Outlier of All Time

Readers of a certain age or interest in Olympic history probably know the name Bob Beamon, but for those who don’t, I’ll quickly provide a summary of “the leap.”

Born in Queens, Beamon was raised by his grandmother after his abusive father threatened to kill him if his mother brought him home from the hospital. As fate would have it, his mother died eight months later at age 25. Despite entering the world with the chips stacked against him, Bob went on to excel at the long jump in high school and eventually earned a college scholarship.

Fast-forward to 1968. Bob is having a great year in the...

Identical Twins, Rowing, and the Luck of the Lane

Caroline and Georgina Evers-Swindell are identical twin sisters from New Zealand. But they are not identical only in the sense that they look alike; they are both very strong and very competitive athletes, and both excel at the sport of rowing.  Rather than compete against one another, however, they compete together in the sport known as Women’s Double Sculls. (To those of us less familiar with rowing terms, that’s where you have two people rowing the same boat in a race.)  Here they are, in action:

Recently I’ve been writing on the fairness of judging in Olympic events that require subjective...

When Even Cupid Isn't Accurate Enough: Interval Plots and Olympic Finals

Most of us who are married have a picture of our spouse somewhere in our office—maybe a wedding photo, a picture from last year's vacation, or a family shot with the kids.  Matt Emmons likely keeps a picture of his wife as well, but it probably looks something like this:

You see, Matt is a professional sport shooter who had a very interesting 2004 Olympics. Just prior to the Olympic Team Trials, someone entered the locker room where his rifle was stored and severely damaged the gun in an apparent sabotage attempt. Matt borrowed a former college teammate’s rifle for the trials and went on to win...

Has Figure Skating Judging Improved? What Do the Numbers Say?

In my recent article on judging in the Olympics, I included an analysis of the controversial 2002 Pairs Figure Skating results. As a result of that scandal, the International Skating Union (ISU) changed the rules for judging competitions to eliminate judging inconsistencies and prevent future scandals.

In the new system, pairs are judged on Grade of Execution—which is scored differently and not discussed here—and Program Components. For Program Components there are five components that nine different judges score. Seven of those judges are randomly chosen for each skater (the judges don’t know...

Tour de France: Statistics Reveal the Drama

57 Seconds. After more than 2,000 miles and nearly three weeks of grueling cycling, Cadel Evans needed 57 seconds to catch the leader.  And he would have to do it riding alone for only 26.4 miles.

He gained two and a half minutes.

When you watch the Tour de France, you realize that amid the extreme physical challenges of the race comes a high level of personal drama. And while most people picture a large pack (the peloton) of virtually every rider in the tour riding together for most of the race, you might be surprised at just how much separation happens during each stage of the race.

Stages are...

What I Learned from Treating Childbirth as a Failure

My wife and I are expecting a baby girl soon. Very soon, in fact, as in "Will this blog post be published before the baby is born?" soon. The due date given is May 19th, but we stat geeks know that a point estimate just isn't good enough...we want probability intervals that reflect the uncertainty in the data.

I found a chart at http://www.longislandmidwives.com/00006250-200609000-00006.pdf that lets me know the number of babies born to "spontaneous labor" by each week of pregnancy, but I'm interested in more precision than just the week.  I converted the data to days instead of weeks (for...

Cirque du Soleil: The Immortal Takt Time World Tour

Cirque du Soleil, the French circus known for its acrobatics, is currently on the road with its "Michael Jackson The Immortal World Tour" and made a stop just a few miles from the Minitab World Headquarters. 

My wife and I decided to go to the 8:00 show, but little did I know the performance would be preceded by a lesson in takt time...

Here is a timeline of events:

7:20 - A friend arrives to find long lines at each of the four arena entrances and doors closed.

7:30 - Doors are opened to allow ticket holders to enter after a weapons search (I still have not figured out why the potential...

Analyzing Titanic Survival Rates, Part II: Binary Logistic Regression

In honor of the 100th anniversary of the sinking of the Titanic, we recently posted a dataset on the passengers aboard the ship that included Class (coach or first), Gender (female or male), Age, and Status (survived or died).  From Age an additional column was created indicating Child (17 years or younger) or Adult (18 years or older).

In an earlier post, we showed how survival rates could be compared between levels of one variable (for example, females versus males) using Stat > Tables > Cross Tabulation and Chi Square.  But what if we wanted to take all factors into consideration to paint a...