Punxsutawney Phil and His 2-Sample T-test

Groundhog Day is apparently a pretty big deal in Punxsutawney, Pennsylvania. According to one CBS article, organizers expected over 15,000 people to see the United States’ most popular groundhog last week.

For those unfamiliar with the legend, here's the idea: If Punxsutawney Phil sees his shadow after coming out of his hole, he will retreat and we will be forced to endure six more weeks of winter. If he doesn’t see his shadow, then we will be graced with an early spring. Sounds far-fetched, right? Let’s look at a 2-sample t-test to see if Phil has what it takes to be Punxsutawney’s chief advisor on how quickly their winter ends.

The data I obtained is a list of average temperatures for years when Phil saw his shadow and for years when he didn’t. Each value represents Pennsylvania’s average temperature for a two-month period, February through March. I wish I had been able to be more precise and obtain average temperatures for a 6-week period, but those figures proved too difficult to find.

Data Table For Average Temperatures (Fahrenheit)*


Temps/No Shadow    Temps/Shadow
32.15              33.3
28.6               33.8
33.25              31.55
35.8               33.3
32.15              30.8
37.4               33.25
32.3               30.7
32.8               36.25
34.65              32.7
32                 36.9
29.35              38.15
31.15              29.9
39.42              29.5
38.22              28.55
*                  33.05

*Unfortunately, I was only able to obtain 14 temperatures for when he didn’t see his shadow.


Hypothesis Testing with the 2-Sample T-test

My goal here is to determine whether there is any evidence that the difference between the two population means (no shadow minus shadow) is greater than zero.


The null hypothesis: The true mean average temperature (Feb-Mar) for the years when Phil sees his shadow and for the years when he does not are equal.

The alternative hypothesis: The true mean average temperature (Feb-Mar) for the years when he does not see his shadow is greater than the true mean average temperature for the years when he does.

To analyze this in Minitab Statistical Software, let’s go to Stat > Basic Statistics > 2-Sample t. That will bring up the following dialog box: 


We'll also want to check our settings using the "Options" button: 



After filling out these dialogs as shown above and hitting OK, I obtain these results:

[Figure: two-sample t-test output]


Interpreting the Results of the 2-Sample T-test

Since our p-value is greater than our alpha value (0.05), we fail to reject the null hypothesis. We do not have enough evidence to say that Pennsylvania, on average, is warmer during those two months when Phil doesn’t see his shadow.
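For readers without Minitab, here is a rough Python sketch of the same one-sided comparison, using only the standard library. It is not the analysis from the post: it assumes unequal variances (Welch's version of the 2-sample t-test), since the settings chosen in the Options dialog aren't visible here, so the numbers may differ slightly from the Minitab output. The helper names welch_t and t_upper_tail are made up for this illustration.

```python
import math
import statistics

# Average Feb-Mar temperatures (°F), copied from the data table in this post.
no_shadow = [32.15, 28.6, 33.25, 35.8, 32.15, 37.4, 32.3,
             32.8, 34.65, 32.0, 29.35, 31.15, 39.42, 38.22]
shadow = [33.3, 33.8, 31.55, 33.3, 30.8, 33.25, 30.7, 36.25,
          32.7, 36.9, 38.15, 29.9, 29.5, 28.55, 33.05]


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for mean(a) - mean(b)."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for Student's t by integrating the density with Simpson's rule."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    # The density is symmetric, so P(T > 0) = 0.5; subtract the area from 0 to t.
    return 0.5 - total * h / 3


alpha = 0.05
t_stat, df = welch_t(no_shadow, shadow)
p_value = t_upper_tail(t_stat, df)  # one-sided: no shadow > shadow

print(f"t = {t_stat:.3f}, df = {df:.1f}, one-sided p = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The no-shadow sample goes first so that a positive t statistic corresponds to the alternative hypothesis (warmer years when Phil doesn't see his shadow).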

“With great power comes great responsibility,” a great man named Uncle Ben Parker once said. But Uncle Ben forgot to tell Spider-Man that, when it comes to statistics, greater power means a better chance of detecting a difference if there really is one.

Statistically speaking, power equals 1 - β, where β is the probability of making a Type II error (failing to reject the null hypothesis when it is false). One way that I could have increased the power of this 2-sample t-test would have been to increase the size of each sample. Unfortunately, I was restricted by how many times it was reported that the groundhog did or did not see his shadow.
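To get a feel for how sample size drives power, here is a small Python sketch based on a normal approximation for a one-sided, two-sample comparison. Everything plugged into it is hypothetical: a true difference of 1 °F, a standard deviation of 3 °F in both groups, and equal group sizes are assumptions chosen for illustration, not estimates from this post. The function name approx_power is also made up, and the normal approximation tends to be a little optimistic compared with an exact t-based calculation when samples are small.

```python
from statistics import NormalDist


def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-sample test (normal approximation).

    delta: assumed true difference in means
    sigma: assumed common standard deviation of both groups
    n_per_group: observations in each group (equal sizes assumed)
    """
    z = NormalDist()
    se = sigma * (2 / n_per_group) ** 0.5  # standard error of the difference
    return z.cdf(delta / se - z.inv_cdf(1 - alpha))


# Hypothetical scenario: 1 °F true difference, 3 °F standard deviation.
for n in (15, 30, 60, 120):
    power = approx_power(delta=1.0, sigma=3.0, n_per_group=n)
    print(f"n per group = {n:3d}  approx. power = {power:.2f}")
```

Under these assumed values, power climbs steadily as each group grows, which is why a longer record of Phil's predictions would make a real difference easier to detect.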

Even with more data and increased statistical power, though, a reliable assessment of Phil's weather forecasting ability may be difficult to achieve. For example, let's hope the organizers of Punxsutawney’s Groundhog Day place their audience in a location that doesn’t bias the results in any way. If I crawled out of my hole and saw 15,000 people around me, there is no way I’d stay above ground!



