Revisiting the Relationship between Rushing and NFL Wins with Binary Fitted Line Plots
Back in November, I wrote about why running the football doesn’t cause you to win games in the NFL. I used binary logistic regression to look at the relationship between rush attempts (both by the lead rusher and by the team) and wins. The results showed that the model for rush attempts by the lead rusher and wins fit the data poorly. But the model for team rush attempts and wins did fit the data well (although we went on to show that the team rushing attempts weren’t causing the winning).
We were able to conclude this by looking at the p-value and goodness-of-fit tests. But what if we wanted to trade our boring output for some more entertaining images? Well, in previous versions of Minitab, we were out of luck. But Minitab now includes a tool that is perfect for our situation: binary fitted line plots!
What Is a Binary Fitted Line Plot?
A binary fitted line plot examines the relationship between a continuous predictor variable and a binary response. A binary response variable has two possible outcomes, such as winning or losing a football game.
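For readers who like to see the formula behind the picture, here is a sketch of the curve such a plot draws. It assumes the usual logit link for binary logistic regression (the analysis in this post doesn’t say which link function was used), and the symbols are my own choice for illustration:

```latex
\[
P(Y = 1 \mid x) \;=\; \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
\]
```

Here x is the continuous predictor, Y is the binary response coded as 0 or 1, and β0 and β1 are the estimated intercept and slope. A positive β1 makes the curve rise as x increases, while β1 = 0 gives a flat line at 1 / (1 + e^(−β0)).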
First, let’s consider a regular fitted line plot. This plot examines the relationship between a continuous predictor and a continuous response. For example, we could visualize the relationship between a person’s income and the value of their house.
We can clearly see that people with higher incomes own more expensive houses. So let’s see what this looks like if we swap out the value of the house with a binary variable, such as whether or not they own a vacation home.
The x axis still shows us the income of each person. But the y axis is now the probability a person has of owning a vacation home. All of our observations (the blue dots) have a probability of either 0 (they don’t own a vacation home) or 1 (they do own a vacation home). Looking at just the blue dots, we can see lower income values tend to have a probability of 0, while the higher incomes have a probability of 1.
The red line shows us the estimated probability that a person owns a vacation home based on their income. We can see that for incomes below about $80,000, the chances of owning a vacation home are very low, about 20% or less. People with incomes above about $120,000 have a very high chance of owning a vacation home. For incomes between $80,000 and $120,000, you can’t be as certain whether a person will own a vacation home.
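If you’d like to see where a curve like that comes from, here is a minimal Python sketch. It is not Minitab’s routine: it simulates made-up income data (in thousands of dollars), fits a logistic curve with a hand-rolled Newton-Raphson loop, and reads probabilities off the fitted line. The function names, the "true" coefficients, and the sample size are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by Newton-Raphson (maximum likelihood)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # information matrix (negative Hessian)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0.0:
            break                # degenerate data; keep the current estimate
        # Newton step: solve the 2x2 system (information matrix) * step = gradient
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# MADE-UP data: incomes in thousands of dollars, ownership drawn from an
# assumed "true" curve whose 50% point sits at an income of 100.
rng = random.Random(1)
TRUE_B0, TRUE_B1 = -10.0, 0.1
incomes = [rng.uniform(40, 160) for _ in range(500)]
owns = [1 if rng.random() < sigmoid(TRUE_B0 + TRUE_B1 * x) else 0 for x in incomes]

b0, b1 = fit_logistic(incomes, owns)

# Read probabilities off the fitted line, as you would from the red curve.
for income in (60, 80, 100, 120, 140):
    print(f"${income}k -> P(vacation home) = {sigmoid(b0 + b1 * income):.2f}")
```

The blue dots in the plot correspond to the pairs in `incomes` and `owns`, and the red line corresponds to `sigmoid(b0 + b1 * income)` evaluated across the income range.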
Let’s look at one more example before we return to our football study. What would a non-existent relationship look like? For example, what if we changed the binary variable to whether or not the person likes vegetables?
The red line is pretty much horizontal. This tells us that no matter what a person’s income is, the probability that they like vegetables is around 60%. So clearly there is no relationship between income and liking vegetables.
Now that we’ve gone through some examples, let’s get back to football! And note that for all the examples above, the data were completely made up for illustrative purposes. So no need to e-mail me demanding to know where I found the dataset that indicates 60% of people like vegetables!
Binary Fitted Line Plots for Rushing Attempts vs. Wins
In my post back in November, the first thing we looked at was individual rushing attempts and wins. We found a significant relationship between the two, but the model didn’t fit the data well. In other words, the model did a really bad job predicting whether a team won or lost based on the number of carries by the lead rusher.
So let’s see what the Binary Fitted Line Plot looks like!
The red line has an upward trend, showing that more carries by the lead rusher are associated with a higher probability of winning. But all of the probabilities are between 20% and 80%, so you can never be very certain whether the team won or lost, no matter how many or few carries the lead rusher received. And in our dataset, 50% of the data are between 11 and 20 carries. We see from the graph that the probabilities in that range fall between about 40% and 60%. That means that for half of our data, the model is pretty much just guessing which team won or lost.
The individual observations (blue dots) help confirm this. You’ll notice that both winning teams and losing teams appear to have about the same range of carries by the lead rusher. If our model fit the data well, we would expect to see higher values of carries grouped at the top right of the graph and lower values grouped at the bottom left (like in our income vs. vacation home plot).
This plot gives us a great visualization of what the statistics told us. Yes, there is a relationship between the two variables, but you’re going to do a really poor job predicting future outcomes based solely on that relationship.
Now let’s move on to our plot of team rushing attempts vs. wins. We found that there was also a significant relationship between these variables, and this time the goodness-of-fit tests said that the model fit the data well. Let’s see how we can use the binary fitted line plot to confirm those findings.
Here we can see why the model fits the data better when we use team rushing attempts. When a team rushes 15 times or fewer, we can be pretty confident they’ll lose. Meanwhile, we can almost be certain a team wins if they rush 35 times or more. And remember how half of the data fell in the 40% to 60% probability range in the previous plot? Well, here that range only includes about 25% of the data (teams having between 24 and 29 rushing attempts). So this model is doing a lot less “guessing” than the previous one.
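To put rough numbers on that comparison, here is a small Python sketch. It assumes each red line is a logistic curve, p = 1 / (1 + exp(-(b0 + b1*x))), and back-solves the slope b1 from the approximate 40% and 60% points read off each plot (11 to 20 carries, and 24 to 29 team attempts). These are not the fitted coefficients from the analysis, the helper names are my own, and the list of carries at the end is hypothetical, included only to show how the "share of data in the band" would be computed.

```python
import math


def logit(p):
    """Log-odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def slope_from_band(x_low, x_high, p_low=0.4, p_high=0.6):
    """Slope b1 of a logistic curve that hits p_low at x_low and p_high at x_high."""
    return (logit(p_high) - logit(p_low)) / (x_high - x_low)


def share_in_band(values, x_low, x_high):
    """Fraction of observations whose predictor lies inside [x_low, x_high]."""
    inside = sum(1 for v in values if x_low <= v <= x_high)
    return inside / len(values)


# Band endpoints as read (approximately) off the two plots in this post.
lead_rusher_slope = slope_from_band(11, 20)  # carries by the lead rusher
team_slope = slope_from_band(24, 29)         # team rushing attempts

print(f"lead rusher: {lead_rusher_slope:.3f} log-odds per carry")
print(f"team:        {team_slope:.3f} log-odds per attempt")

# HYPOTHETICAL carries, not the real dataset: just a demo of the calculation.
example_carries = [6, 9, 11, 12, 14, 15, 17, 18, 20, 22, 25, 28]
print(f"share in the 11-20 band: {share_in_band(example_carries, 11, 20):.0%}")
```

The two slopes are measured per unit of different predictors, so they aren’t directly comparable on their own. The share of observations that lands inside the 40% to 60% band is the fairer yardstick, which is why the comparison above uses it.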
You will always need the underlying statistics to make decisions in a statistical analysis. But graphs and plots are great ways to visualize the results and get a better understanding of the conclusions you reached with the statistics. Now that Minitab offers binary fitted line plots, you have even more powerful tools at your disposal. So go ahead and plot away!