On Saturday, September 8, 2012, Penn State football player Sam Ficken had a kicker’s worst nightmare. Playing against Virginia, he missed 4 field goals, including the potential game-winner as the game ended. To add injury to insult, he also had an extra point blocked.

Penn State lost the game by a single point.

At that point in his career, Ficken had made 2 out of his 7 field goal attempts. That equals about a 29% success rate, which is terrible for kickers. Many called for Ficken to be benched. He was harassed on Twitter (to put it mildly). And a Penn State soccer player even made a YouTube video of himself kicking field goals at the Nittany Lion practice facility, prompting many fans to suggest that Coach Bill O’Brien give him a tryout. It was pretty apparent Ficken just wasn’t a good kicker.

Or was it?

Flip a coin 7 times. If tails comes up only twice, are you going to conclude that the coin is “biased” towards heads? Of course not. You simply had an unlikely outcome (the coin coming up heads 71% of the time) because 7 tosses is a very small sample size. Now, kicking field goals is a lot different than flipping a coin, but the same idea applies. So let’s do a data analysis on Ficken’s field goal percentage.
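To put a number on the coin example, here is a minimal Python sketch that computes the chance of seeing 2 or fewer tails in 7 flips of a fair coin. The helper name `prob_at_most` is just an illustration, not part of any analysis used in this post.

```python
from math import comb

def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Chance of 2 or fewer tails in 7 flips of a fair coin.
p_two_or_fewer_tails = prob_at_most(2, 7, 0.5)
print(f"{p_two_or_fewer_tails:.3f}")  # prints 0.227
```

So a perfectly fair coin produces a result at least this lopsided toward heads a little more than one time in five, which is hardly evidence of bias.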

NOTE: I’m going to use a 1 Proportion analysis, which assumes the probability of each observation is the same. Obviously this isn’t true for field goals. Distance, weather conditions, and altitude all affect the probability of the kicker making the goal. Even the opponent can affect the probability: your odds aren’t as good if LaVar Arrington circa 1999 is lining up to block the kick! But I’m really just trying to illustrate the amount of variation that exists in small samples, not trying to accurately gauge Ficken’s true field goal percentage. So for purely illustrative purposes, I’m going to use the 1 Proportion analysis anyway...just take the statistics with a grain of salt if you were hoping for a comprehensive review that includes all possible factors.

# How Confident Can We Be in Ficken’s Field Goal Percentage?

After the Virginia game, Ficken was 2 for 7 (29%) on field goal attempts. Using these numbers, we can use Minitab’s 1 Proportion analysis to create a confidence interval. This confidence interval will give us a range of likely values for the percentage of kicks that Ficken will make going forward. That is, it gives us an idea of how confident we can be that these 7 kicks represent Ficken’s true kicking percentage.

The confidence interval tells us that we can be 95% confident that Ficken’s true field goal percentage is between 3.7% and 71%. That range is so large that it’s pretty much worthless! So anybody trying to make an accurate assessment of Ficken’s ability based on those 7 kicks is doing nothing other than guessing. Moreover, the range actually increases if you look at only the 5 kicks in the Virginia game (which many people did)!
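The bounds quoted above are consistent with an exact (Clopper-Pearson) binomial interval. Here is a minimal Python sketch of that calculation, assuming that method. The function names are illustrative, the bounds are found by simple bisection, and this is not a description of Minitab's internal implementation.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_increasing(f, target, lo=0.0, hi=1.0, iters=100):
    """Bisection: find p in [lo, hi] with f(p) = target, for f increasing in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, conf=0.95):
    """Exact two-sided interval for a proportion, given x successes in n trials."""
    alpha = 1 - conf
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper

for made, attempts in [(2, 7), (1, 5)]:
    low, high = clopper_pearson(made, attempts)
    print(f"{made} for {attempts}: {low:.1%} to {high:.1%}")
```

For 2 for 7 this gives roughly 3.7% to 71%, and for the 1-for-5 Virginia game alone the interval comes out wider still.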

# Ficken’s Career Since the Virginia Game

If there's one person who could accurately judge Ficken, it's Penn State Coach Bill O’Brien. He'd seen plenty of Ficken kicks in practice, and had a lot more than 7 observations to make his decision on. And he decided to stick with Ficken as his kicker.

Boy, has that decision paid off.

Since the Virginia game, Ficken has made 20 of 24 field goal attempts. He hit a Penn State record of 15 field goals in a row, and also made a 54-yard field goal--a Penn State home record--in the rain. In his career, Ficken is now 22 for 31 on field goal attempts, good for 71%.

And wouldn’t you know it, that equals the upper bound from the 95% confidence interval we created earlier!

Clearly, Ficken is a better kicker than his first few attempts showed. And considering where he must have been mentally after the Virginia game, it’s a great story to see him bounce back and perform so well. But, statistically speaking, how good can we really claim he is? Since we now have another 24 observations, let’s combine them with the original 7 and calculate an updated 95% confidence interval for Ficken’s field goal percentage.

Now that we have more observations, we can narrow down Ficken’s true ability much better. The new lower bound for the interval (52%) is nowhere close to the 29% that Ficken made in his first 7 attempts.

But the confidence interval is still pretty wide, with a range of about 34 percentage points. There is a chance his true field goal percentage is in the 50% range, which would put him among the worst kickers in the country!

How big of a sample size do we need in order to really be confident in Ficken’s abilities?

# How Many Kicks Do We Need?

To answer that question, first we need to decide how “narrow” we want our confidence interval to be. This is the same thing as determining the margin of error. For example, let’s use Ficken’s current field goal percentage of 71%. If the margin of error were 5%, our confidence interval would range from 66% to 76%.

But instead of picking just one, let’s use three margins of error to compare the different sample sizes needed for each one. We’ll use margins of error of 10%, 5%, and 1%. Then we can use Minitab’s Sample Size for Estimation analysis to get the sample sizes.

To obtain a margin of error of 10% (which is still pretty wide) we would need 99 kicks. It skyrockets to 359 for 5%, and becomes an unattainable 8,129 kicks for a 1% margin of error! To put that in perspective, former Penn State kicker Kevin Kelly was the starter at Penn State for 4 years, and attempted only 107 field goals. And Sebastian Janikowski is in his 14th year of kicking in the NFL, and has only 409 attempts.
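As a rough cross-check, the textbook normal-approximation formula n = z² · p(1 − p) / E² can be sketched in a few lines of Python. This is not the method behind the Minitab figures above, and it is less conservative: it returns smaller sample sizes (roughly 80, 317, and 7,910). The function name `approx_sample_size` is hypothetical, and the 71% planning value is taken from Ficken's current percentage.

```python
from math import ceil
from statistics import NormalDist

def approx_sample_size(p, margin, conf=0.95):
    """Normal-approximation sample size for estimating a proportion.

    p      : planning value for the true proportion (assumed, here 0.71)
    margin : desired margin of error, as a proportion (0.05 means 5 points)
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z**2 * p * (1 - p) / margin**2)

for margin in (0.10, 0.05, 0.01):
    print(f"margin {margin:.0%}: about {approx_sample_size(0.71, margin)} kicks")
```

Either way, the pattern is the same: cutting the margin of error in half takes roughly four times as many kicks.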

Your average college kicker will get between 20 and 30 field goal attempts per year. And unless you’re a four-year starter, you’re not getting close to 99 kicks for your career. That means for a college kicker, even if every field goal attempt has the same probability of being made (which it doesn’t), we still have a pretty wide margin of error when determining just how accurate the kicker is.

So when you want to make claims based on statistics, make sure you have a sufficiently large sample. And that’s not just in the world of sports. Sample sizes are important for everything from determining the net weight of the cereal in packages to Mythbusters determining whether women are better at multitasking. If you don’t have a large enough sample, your conclusions might be meaningless. To find proof, you need only look at Sam Ficken.