# Calculating the Probability of Converting on 4th Down

Imagine a multimillion-dollar company that released a product without knowing the probability that it will fail after a certain amount of time. “We offer a 2-year warranty, but we have no idea what percentage of our products fail before 2 years.” Crazy, right? Anybody who wanted to ensure the quality of their product would perform a statistical analysis to look at the reliability and survival of their product.

Now imagine a multimillion-dollar football organization that makes 4th down decisions without knowing the probability that they will convert the 4th down. “We punt on every 4th and 1, but we have no idea what percentage of the time we would keep possession if we went for it.” That's just as crazy, except that seems to be what every football organization does.

But it doesn’t have to be this way. Just like businesses use statistics to improve the quality of their products, football teams should use statistics to improve their chances of winning. So I’m going to use Minitab’s binary logistic regression to create a model that will let us know the probability a team has of successfully converting on 4th down.

## The Data

We’re continuing our quest to make a Big Ten 4th down calculator, so we’ll start with the same data that we used to create a model for expected points. For every 3rd down in Big Ten conference games over the last 2 seasons, I recorded the distance needed to convert, whether the team on offense was at home or away, and whether they converted. I used 3rd down instead of 4th down to increase the sample size. And since the goal on 3rd down is the same as on 4th down (convert in one play), the probabilities should be about the same.

Speaking of the probabilities, we can use a scatterplot to get an initial look at how distance affects the probability of converting.

The probability of converting decreases pretty consistently as the distance increases. The data does appear to level out a bit between 10 and 15 yards before decreasing again. And there are some outliers at the end of the data, but that is due to small sample sizes.
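A plot like this comes from grouping plays by yards to go and taking the share converted in each group. Here is a minimal Python sketch of that tabulation. The `plays` records and the function name are made up for illustration and are not the actual Big Ten data.

```python
from collections import defaultdict

def conversion_rate_by_distance(plays):
    """Return {distance: (conversion rate, attempts)} for (distance, converted) pairs."""
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for distance, converted in plays:
        attempts[distance] += 1
        successes[distance] += int(converted)
    return {d: (successes[d] / attempts[d], attempts[d]) for d in sorted(attempts)}

# Hypothetical toy records: (yards to go, converted?)
plays = [(1, True), (1, True), (1, False), (5, True), (5, False), (12, False)]
rates = conversion_rate_by_distance(plays)
for distance, (rate, n) in rates.items():
    print(f"{distance:>2} yards: {rate:.0%} on {n} attempts")
```

Keeping the attempt count next to each rate makes it easy to spot the distances where a small sample is driving an outlier.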

Now, I do have a different data set with a much larger sample that we can use to eliminate the noise in the data, but first I want to show something with this first data set that we can’t show with the next one.

## The Effect of Playing at Home or Away

In the model for expected points, the location of the game affected a team's expected points. Will we see the same effect on the probability of converting on 3rd down? We’ll use binary logistic regression to determine whether Home or Away is a significant term in the model.

When it comes to the probability of converting on 3rd down, we find no evidence that it matters whether the team is home or away. The p-value for that term in the regression analysis is 0.994, which is much greater than the common significance level of 0.05. So why does it matter for expected points, but not here? My best guess is that home field advantage has such a small effect on a single play that it doesn’t show up in the 3rd down conversions. But over the course of a multiple-play drive (like what we looked at in the expected points model), those small effects add up and the effect of home field advantage becomes noticeable.

So when it comes to a single play, we can ignore home field advantage.

## The Data: Part II

To increase our sample size, fellow blogger Joel Smith was kind enough to share data he collected on every college football game from 2006–2012. Because this sample is so large, we can actually look at 4th downs instead of 3rd downs. Here is a scatterplot of the data:

We see a pattern similar to the one before. The conversion rate decreases until about 10 yards, where it levels out a bit before dropping practically to 0% after 20 yards. And that outlier? Teams were 1 for 3 on 4th and 34. That one success came in the 4th quarter when the team on offense was down by 21 points, so the defense probably no longer had their starters in. That means we should clean up the data to try to remove points like these.

To avoid games that were blowouts, I removed any 4th downs where the score differential was greater than 4 touchdowns in the first 3 quarters or greater than 16 points (3 scores) in the 4th quarter. Finally, I removed any distance greater than 20 yards, since the probability basically drops to 0. This means the decision on anything longer than 4th and 20 should be very easy: punt or kick a field goal unless it’s late in the game and you absolutely need to score a touchdown. So we don't really need to worry about modeling that for our 4th down calculator.
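The cleanup rules above can be written as a small filter. This is only a sketch: the function and constant names are hypothetical, "4 touchdowns" is assumed to mean 28 points, and any period after the 3rd quarter is treated like the 4th quarter.

```python
TOUCHDOWN_POINTS = 7  # assumption: a touchdown counted with the extra point
MAX_DISTANCE = 20     # distances beyond this are dropped from the model

def keep_play(quarter, score_diff, distance):
    """Return True if a 4th down survives the blowout and distance filters."""
    margin = abs(score_diff)  # the filter ignores which team is ahead
    if quarter <= 3 and margin > 4 * TOUCHDOWN_POINTS:
        return False
    if quarter >= 4 and margin > 16:
        return False
    return distance <= MAX_DISTANCE

# The 4th-and-34 conversion while trailing by 21 in the 4th quarter is removed.
print(keep_play(quarter=4, score_diff=-21, distance=34))
```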

After removing these observations, we still have 11,623 4th downs. Here's the data I used.

## The Final Model

We already saw that it doesn’t matter whether you’re playing at home or on the road, but there is another factor we should take into account. When you get closer to the goal line, the defense has a smaller portion of the field to defend. This might make it harder to convert on 4th down when you have to score a touchdown rather than simply get a first down. So I created a variable to determine whether it was 4th and goal or not to include in the model.

There also appears to be some curvature in the data, so I included the 2nd and 3rd order terms for distance. And lastly, our integers for distance represent the midpoint of the actual distance. For example, on 4th and 4 you could really have to gain anywhere from 3.5 to 4.5 yards. But on 4th and 1, the range is really 0 yards to 1.5 yards. So instead of using the integer 1, I used 0.75.
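To make the model terms concrete, here is a rough Python sketch of how one 4th down could be turned into the predictors described above and then into a probability. It assumes a logit link, and the function names are hypothetical. The fitted coefficients are not listed in this post, so they are left as an input rather than filled in.

```python
import math

def model_distance(yards_to_go):
    """Map the recorded integer distance to the value used in the model."""
    # 4th and 1 covers roughly 0 to 1.5 yards, so its midpoint is 0.75.
    return 0.75 if yards_to_go == 1 else float(yards_to_go)

def features(yards_to_go, goal_to_go):
    """Intercept, distance, distance^2, distance^3, goal-to-go indicator."""
    d = model_distance(yards_to_go)
    return [1.0, d, d ** 2, d ** 3, 1.0 if goal_to_go else 0.0]

def predict_probability(coefficients, yards_to_go, goal_to_go):
    """Inverse logit of the linear predictor, given coefficients from a fitted model."""
    x = features(yards_to_go, goal_to_go)
    linear = sum(b * xi for b, xi in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-linear))

print(features(1, True))
print(features(4, False))
```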

Now let’s put our data into Minitab and see the results.

The p-values for all of our terms are less than 0.05, so we can conclude that they are all significant and keep them in the model. The Deviance R-squared value tells us that 97% of the deviance in the probability of converting on 4th down can be explained by the model. We can now use the model to predict the probability of converting at different distances.

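| Distance | Probability when goal to go | Probability when not goal to go |
|---|---|---|
| 1* | 61% | 70% |
| 2 | 50% | 60% |
| 3 | 43% | 53% |
| 4 | 37% | 46% |
| 5 | 32% | 41% |
| 6 | 29% | 37% |
| 7 | 26% | 34% |
| 8 | 24% | 32% |
| 9 | 22% | 30% |
| 10 | 21% | 28% |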

*I used a value of 0.75 for the prediction

We see that being at the goal line decreases your chances on 4th down by about 10 percentage points. We also see what a drastic effect just a few yards make. Imagine getting a false start penalty and having your 4th and 1 go to 4th and 6. You just cut your chances of converting roughly in half!

So let’s go back to that coach who punts on every 4th and 1. Now that we have our data, we can analyze whether he is making the correct decision. Let’s say he has a 4th and 1 at his own 10 yard line and is playing on the road. We can use our expected points model and our 4th down model to see what the correct decision should be.

| Decision | Expected points (success) | Expected points (fail) | Total expected points |
|---|---|---|---|
| Go for it | -0.64 | -5.9 | -2.2 |
| Punt* | -2.9 | N/A | -2.9 |

* The average net punt in the Big Ten was about 40 yards, so that’s the value I used.

By this model, in going for it on 4th down the coach increases his expected points by about 0.7 points. That may not sound like much, but imagine making a similar decision 4 or 5 times a game. Those expected points add up to about a field goal. Think there is a coach out there who wouldn’t want an easy way to increase their score by 3 points?
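For anyone who wants to check the arithmetic, the total for going for it is just a probability-weighted average of the two outcomes. The short Python sketch below uses the values from the tables in this post: 70% is the 4th-and-1 probability when it is not goal to go. The function name is hypothetical.

```python
def expected_points_go(p_convert, ep_success, ep_fail):
    """Weight the success and failure outcomes by their probabilities."""
    return p_convert * ep_success + (1.0 - p_convert) * ep_fail

go_for_it = expected_points_go(p_convert=0.70, ep_success=-0.64, ep_fail=-5.9)
punt = -2.9  # expected points after an average 40-yard net punt
gain = go_for_it - punt

print(f"Go for it: {go_for_it:.1f}, punt: {punt:.1f}, gain: {gain:.1f}")
```

Swapping in a different probability or different expected points values shows how sensitive the decision is to field position and distance.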

And keep in mind our numbers assume you only gain 1 yard on 4th down. When you account for the fact that you can gain more than 1 yard, the case for going for it only strengthens. As Alabama found out against Ohio State last year, even a simple running play up the middle has the potential to go the distance.

So now we’re all set to track the 4th down decisions in this upcoming Big Ten season. The first Big Ten conference game is September 19th, when Rutgers takes on Penn State. And the Big Ten 4th down calculator is ready and waiting.

Let the games begin!