Although a polar vortex hit most of the northern United States last week, thousands of visitors still converged on Punxsutawney, Pennsylvania, this past Saturday to find out whether the town's famous groundhog, Punxsutawney Phil, would see his shadow.

For those unfamiliar with the legend, here's the idea: If Punxsutawney Phil sees his shadow after coming out of his hole, he will retreat and we will be forced to endure six more weeks of winter. If he doesn’t see his shadow, then we will be graced with an early spring. He's only been correct 39 percent of the time according to Stormfax Almanac, but this year he did not see his shadow, so here's hoping he's right and we do get our early spring!

Let’s look at a 2-sample t-test to see if Phil has what it takes to be Punxsutawney’s chief advisor on how quickly their winter ends.

Editor's note: This data was originally collected and published on The Minitab Blog in February 2012.

The data I obtained is a list of average temperatures for years when Phil saw his shadow and for when he didn’t. Each value represents Pennsylvania’s average temperature for a two-month period, from February to March.

Data Table For Average Temperatures (Fahrenheit)*

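| Temps/No Shadow | Temps/Shadow |
|---|---|
| 32.15 | 33.3 |
| 28.6 | 33.8 |
| 33.25 | 31.55 |
| 35.8 | 33.3 |
| 32.15 | 30.8 |
| 37.4 | 33.25 |
| 32.3 | 30.7 |
| 32.8 | 36.25 |
| 34.65 | 32.7 |
| 32 | 36.9 |
| 29.35 | 38.15 |
| 31.15 | 29.9 |
| 39.42 | 29.5 |
| 38.22 | 28.55 |
| * | 33.05 |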

*Unfortunately, I was only able to obtain 14 temperatures for when he didn’t see his shadow.

## Hypothesis Testing with the 2-Sample T-test

My goal here is to determine whether there is any evidence that the difference between the two true means (no-shadow years minus shadow years) is greater than zero.

The null hypothesis: The true mean average temperature (Feb-Mar) is the same for the years when Phil sees his shadow as for the years when he does not.

The alternative hypothesis: The true mean average temperature (Feb-Mar) for the years when he does not see his shadow is greater than the true mean average temperature for the years when he does.
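In symbols, writing $\mu_{\text{no shadow}}$ and $\mu_{\text{shadow}}$ for the two true mean temperatures (labels introduced here only for convenience), this one-sided test can be stated as:

$$
H_0:\ \mu_{\text{no shadow}} = \mu_{\text{shadow}}
\qquad
H_1:\ \mu_{\text{no shadow}} > \mu_{\text{shadow}}
$$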

To analyze this in Minitab Statistical Software, let’s go to Stat > Basic Statistics > 2-Sample t. That will bring up the following dialog box:

We'll also want to check our settings using the "Options" button:

After filling out these dialogs as shown above and hitting OK, I obtain these results:
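For readers without Minitab, a roughly equivalent one-sided test can be sketched in Python using only the standard library. This is a minimal sketch, not the exact procedure behind the dialogs: it assumes the unequal-variance (Welch) form of the 2-sample t-test, and the helper names `welch_t` and `t_upper_tail` are hypothetical. The p-value is found by numerically integrating the t density, so small differences from Minitab's output are possible.

```python
import math
from statistics import mean, variance

# Feb-Mar average temperatures (Fahrenheit) from the data table
no_shadow = [32.15, 28.6, 33.25, 35.8, 32.15, 37.4, 32.3,
             32.8, 34.65, 32.0, 29.35, 31.15, 39.42, 38.22]
shadow = [33.3, 33.8, 31.55, 33.3, 30.8, 33.25, 30.7, 36.25,
          32.7, 36.9, 38.15, 29.9, 29.5, 28.55, 33.05]


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite df) for mean(a) - mean(b)."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for Student's t, using Simpson's rule on the density over [0, |t|]."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = abs(t) / steps  # steps must be even for Simpson's rule
    area = pdf(0) + pdf(abs(t))
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


t_stat, df = welch_t(no_shadow, shadow)
p_value = t_upper_tail(t_stat, df)  # one-sided: no shadow > shadow
print(f"t = {t_stat:.2f}, df = {df:.1f}, one-sided p = {p_value:.3f}")
```

If SciPy is available, `scipy.stats.ttest_ind(no_shadow, shadow, equal_var=False, alternative="greater")` should give the same one-sided test in a single call.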

## Interpreting the Results of the 2-Sample T-test

Because our p-value is greater than our alpha value (0.05), we fail to reject the null hypothesis. We do not have enough evidence to say that Pennsylvania, on average, is warmer during those two months when Phil doesn’t see his shadow.

“With great power comes great responsibility,” a great man named Uncle Ben Parker once said. But Uncle Ben forgot to tell Spider-Man that, when it comes to statistics, greater power also means a better chance of detecting a difference if there really is one.

Statistically speaking, power equals 1 − β, where β is the probability of making a Type II error (failing to reject the null hypothesis when it is false). One way that I could have increased the power of this 2-sample t-test would have been to increase the size of each sample. Unfortunately, I was restricted by how many times it was reported that the groundhog did or did not see his shadow.
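To get a feel for how sample size drives power, here is a rough sketch that uses a normal approximation to the one-sided 2-sample test. The inputs are purely hypothetical: a true difference of 1 degree Fahrenheit, a common standard deviation of 3 degrees, and equal group sizes. None of these come from the analysis above, and `approx_power` is an illustrative helper name, so treat the printed values as ballpark figures rather than a formal power analysis.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Normal-approximation power of a one-sided two-sample test (equal group sizes)."""
    se = sd * sqrt(2 / n_per_group)           # standard error of the difference in means
    z_crit = NormalDist().inv_cdf(1 - alpha)  # rejection cutoff under the null hypothesis
    return NormalDist().cdf(delta / se - z_crit)


# Hypothetical inputs: true difference of 1 degree F, standard deviation of 3 degrees F
for n in (15, 50, 100, 200):
    print(n, round(approx_power(delta=1.0, sd=3.0, n_per_group=n), 2))
```

Under these assumptions, the estimated power climbs steadily as the number of years in each group grows, which is why a short historical record makes a small temperature difference hard to detect.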

Even with more data and increased statistical power, though, a reliable assessment of Phil's weather forecasting ability may be difficult to achieve. For example, let's hope the organizers of Punxsutawney’s Groundhog Day place their audience in a location that doesn’t bias the results in any way. If I crawled out of my hole and saw thousands of people around me, there is no way I’d stay above ground!