They Call Them "Free" Throws For a Reason

When Penn State guard Jermaine Marshall stepped to the line to take two free throws with 0:27 remaining against Ohio State, it didn’t really matter whether he made the shots. The game was already out of reach, and although the Nittany Lions would attempt to foul their way into a miracle victory, most of the fans were all too aware that Penn State was now 0-8. That Marshall then missed both free throws was the exclamation point on a night where the team made just 13 of 22 free throw attempts.

In case you don't already know, a "free throw" is a shot taken against no defense, a shot that has likely been practiced thousands of times by Jermaine and every other player on the court.

You can hardly blame a fan for wondering how a team could perform so poorly at such a shot, and for suspecting that if a team struggles at that shot, it’s a sure sign that they must struggle in general. At least that’s what I assume the emotions were when my colleague John T. (if you know a John T. who works at Minitab, that’s him; we only have one) sent the following email to me soon after the game:

I have a blog idea - is there a correlation between free throw percentage and winning percentage in college basketball?  Thought March might be a good time for a post like this.

I am constantly amazed by how bad Penn State is at free throws.  I think it is a basic skill that any basketball player at a collegiate level should have.

He adds “They call them free for a reason.”

Now, before answering John’s question, I do want to reiterate that this was sent after Penn State’s 13-of-22 night—their worst free-throw performance in almost two months. I’m just saying.

Back to the email.  The problem it raises has three layers, which I will address using Minitab Statistical Software:

  1. Is Penn State really bad at free throws?
  2. Is there a relationship between free throw percentage and winning percentage?
  3. If we account for a team’s strength of schedule (based on RPI), do we see a relationship between free throw percentage and winning percentage?

The first question posed is quite easy to answer.  Here is the distribution of free throw percentages at this point in the season, with Penn State shown in blue:

Individual Value Plot

Penn State is pretty much in the middle, with 68.9%.  Ohio State is not highlighted but is a little lower at 68.4%.  (Again, I’m just saying.)

Regression Analysis of Winning Percentage and Free Throw Percentage

The next question is a little more involved since we have to do regression, but it's nothing too scary. I ran a Fitted Line Plot with winning percentage as the response and free throw percentage as the predictor, and included linear, quadratic, and cubic terms, all of which were statistically significant. Here is the plot, again with Penn State highlighted in blue:

Fitted Line Plot

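The analysis shows that there is actually a statistically significant cubic relationship, though the R-Sq and R-Sq(adj) are quite low, meaning free throw percentage on its own explains only a small share of the variation in winning percentage. But the simplest answer to John’s question is that yes, there is a relationship.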
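For readers who want to try a similar fit outside Minitab, here is a minimal sketch of a cubic fit with R-Sq and R-Sq(adj) computed by hand. It uses NumPy as a stand-in for the Fitted Line Plot, and the data are synthetic and invented purely for illustration (they are not the season data behind the plot above), so the printed numbers mean nothing about real teams.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (an assumption, NOT the real season data):
# 300 hypothetical teams with a weak positive relationship plus lots of noise.
ft_pct = rng.normal(69.0, 4.0, size=300)
win_pct = 50.0 + 1.2 * (ft_pct - 69.0) + rng.normal(0.0, 17.0, size=300)

# Center the predictor so the squared and cubic terms stay well conditioned.
x = ft_pct - ft_pct.mean()

# Least-squares cubic fit: coefficients come back highest power first.
coeffs = np.polyfit(x, win_pct, deg=3)
fitted = np.polyval(coeffs, x)

ss_error = np.sum((win_pct - fitted) ** 2)
ss_total = np.sum((win_pct - win_pct.mean()) ** 2)

n = len(win_pct)   # number of teams
p = 3              # predictors in the model: x, x^2, x^3
r_sq = 1.0 - ss_error / ss_total
r_sq_adj = 1.0 - (ss_error / (n - p - 1)) / (ss_total / (n - 1))

print(f"R-Sq = {100 * r_sq:.1f}%, R-Sq(adj) = {100 * r_sq_adj:.1f}%")
```

A low R-Sq alongside significant terms is exactly the situation described above: the relationship is real, but most of the variation in winning percentage is left unexplained.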

Sequential Sum of Squares

The third question I posed is important to consider because strength of schedule is a very important factor in determining winning percentage, and because it gives me an excuse to demonstrate functionality in Minitab that many don’t give much thought to: sequential sum of squares. If we were to just put strength of schedule (SOS) and free throw percentage into Regression with the default settings, each term would be evaluated as though it were entered last, so the order in which we enter terms would not matter. However, by choosing to use sequential sum of squares, the first term listed is added to the model and its sum of squares is calculated as usual. When another term is added, only the additional sum of squares that term explains is credited to it.

Let me explain a little more. Suppose you have two predictors that have some correlation: X1 and X2. You create a model with only X1, which is significant and results in an R-Sq of 50%. When you add X2, both predictors are significant and the R-Sq increases to 55%. Using adjusted sum of squares (the default), each term is credited only with the variation it explains beyond every other term in the model, as though it had been entered last. The variation that X1 and X2 could both explain is credited to neither, so the two terms are judged on equal footing and the order in which they are listed makes no difference.

But using sequential sum of squares, X1 is credited with everything it explains on its own, including the part it shares with X2 (the full 50%). X2 is credited only with the additional sum of squares it brings to the model relative to what was already there (the increase from 50% to 55%), so it will have a much smaller value than X1 and may not even be significant. For the last term entered, the sequential and adjusted sums of squares are identical; what changes is how much credit the earlier terms receive. So you can interpret the significance of X2 as whether it has a relationship to the response after X1 has been accounted for.
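The X1 and X2 example can be sketched numerically. What follows is a hedged illustration rather than a reproduction of Minitab's output: the helper `sse`, the variable names, and the synthetic data are all assumptions made for the demonstration. Each sum of squares is computed as the drop in error sum of squares when a term joins the model.

```python
import numpy as np

def sse(columns, y):
    """Error sum of squares from a least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(1)
n = 200

# Synthetic, correlated predictors (illustrative assumption, not real data).
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

sse_none = sse([], y)        # intercept only: the total sum of squares
sse_x1 = sse([x1], y)
sse_x2 = sse([x2], y)
sse_both = sse([x1, x2], y)

# Sequential SS: credit each term in the order entered (X1 first, then X2).
seq_ss_x1 = sse_none - sse_x1
seq_ss_x2 = sse_x1 - sse_both

# Adjusted SS: credit each term as if it were entered last.
adj_ss_x1 = sse_x2 - sse_both
adj_ss_x2 = sse_x1 - sse_both

print(f"Seq SS: X1 = {seq_ss_x1:.1f}, X2 = {seq_ss_x2:.1f}")
print(f"Adj SS: X1 = {adj_ss_x1:.1f}, X2 = {adj_ss_x2:.1f}")
```

Two things are worth noticing when you run it. The sequential values add up to the total regression sum of squares, while the adjusted values generally do not when predictors are correlated. And X2, entered last, gets the same credit either way; it is X1 that collects the shared variation under the sequential approach.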

Returning to basketball, I first enter SOS and SOS*SOS in the model, followed by free throw percentage (I included squared and cubic terms at first but removed them as they were not significant). And here is the result:


The simple answer to the third question is that even accounting for strength of schedule, free throw percentage is related to winning percentage, and in the direction one might guess: higher free throw percentages correspond to higher winning percentages.

Fits vs. Residuals

So, if free throw percentage is related to winning percentage and Penn State is actually fairly average in that regard, why is John so frustrated?  To answer that, we simply need to look at the fits versus residuals:

Residuals versus Fits

Based on strength of schedule and free throw percentage, we would expect Penn State to have about a 63% winning percentage...but instead they stand at only 40%. 

I’m just saying.

