In Minitab Statistical Software, putting a regression line on a scatterplot is as easy as choosing a picture with a regression line on a scatterplot:
A neat trick is that you can also add calculated lines onto a scatterplot for comparison or other communication purposes. Here’s a demonstration.
United States Sentencing Guidelines
The United States Sentencing Guidelines say how people who are convicted of crimes should be punished. Sentences can depart from the guidelines, and the United States Sentencing Commission keeps statistics on how often sentences are more severe or less severe than the guidelines recommend. Looking at it simply, we might expect the amount of money a defendant pays in fines and restitution to be related to the measured monetary loss that results from a crime, at least in cases where the recorded statistics include a monetary loss.
The raw data from the United States Sentencing Commission for 2013, the most recent year on their website as of 2/16/2015, has 80,035 cases. Cut that data set down to cases where a specific, nonzero amount was recorded for a monetary loss and a specific amount was recorded for the total of fines, restitution, and cost of supervision, and you get a data set with 9,619 cases. Here’s what the scatterplot with a regression line for that data set looks like:
If there’s a relationship between the cost and the loss, we might hypothesize that a fair solution would be for cost and loss to be approximately equal, Y = X. Here are the steps for drawing a new line on the scatterplot:
- In the worksheet, name an empty column X.
- Enter starting and ending x-values in the first two rows of column X. (Because I’m going to show only a portion of the data for now, I’m going to enter 0 in the first row and 400 million in the second row.)
- Choose Calc > Calculator.
- In Store Result in Variable, enter Y.
- In Expression, enter the formula for the calculated line. (In this case, because I’m interested in whether cost and loss are approximately equal, I’m going to enter ‘X’.)
- Click OK.
- Right-click the scatterplot. Choose Add > Calculated Line.
- In Y Column, enter Y.
- In X Column, enter X.
- Click OK.
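Outside of Minitab, the worksheet part of these steps amounts to very little arithmetic: two x-values and an expression evaluated at each one. Here is a minimal Python sketch of that idea. The function name `calculated_line` is hypothetical and is not part of Minitab; the endpoints 0 and 400 million and the expression Y = X come from the steps above.

```python
def calculated_line(x_start, x_end, expression):
    """Return the endpoints of a line defined by y = expression(x).

    Hypothetical helper for illustration; it mirrors filling column X
    with two values and computing column Y from a formula.
    """
    xs = [x_start, x_end]
    ys = [expression(x) for x in xs]
    return xs, ys


# Column X: the starting and ending x-values.
# Column Y: the expression 'X', so every y equals its x.
xs, ys = calculated_line(0, 400_000_000, lambda x: x)
print(xs, ys)
```

Two points are enough because the calculated line here is straight. A curved calculated line would need more rows in column X.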
The single case where the loss was $5.9 billion and no restitution or fines were part of the sentence, along with the 5 other cases where the loss exceeded $500 million, squishes the main portion of the data considerably, so I edited the x-axis to extend only to 400 million.
The regression fit is well below the calculated line, which suggests that the costs tend to be less than the loss. However, the r-squared value for the regression line is only 3.3%. What the data really indicate is that, with every case included, there's almost no linear relationship between the loss and the costs a criminal is asked to pay.
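To make the comparison concrete, here is a rough sketch of how a least-squares line and its r-squared value can be computed by hand in Python. The function name `fit_line` is hypothetical, and the numbers are made up for illustration; they are not Sentencing Commission data and will not reproduce the 3.3% figure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # r-squared: share of the variation in y explained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up values for illustration only.
loss = [10.0, 20.0, 30.0, 40.0, 50.0]
cost = [12.0, 5.0, 30.0, 8.0, 20.0]

slope, intercept, r_squared = fit_line(loss, cost)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r-squared={r_squared:.1%}")
```

A slope well below 1 puts the fitted line under Y = X, and a low r-squared says the fitted line explains little of the variation, which is the same pair of observations made about the graph above.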
Of course, we know that the regression line fitting all of the data is heavily influenced by the most extreme case where the loss was $5.9 billion and there was no cost. Actually, the cost and the loss are identical in about 34% of the cases in the data. If we consider only cases where the costs a criminal paid were nonzero and the loss was less than $500 million, the r-squared value increases to 73.3% and the regression line looks much closer to the line Y = X:
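The effect of one extreme case is easy to see on a toy example. The sketch below uses made-up (loss, cost) pairs, not the Sentencing Commission data, and a hypothetical helper `r_squared` that squares the Pearson correlation (which equals r-squared for a simple linear regression). It applies the same two filters described above: nonzero cost and loss under $500 million.

```python
def r_squared(pairs):
    """Squared Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy ** 2 / (sxx * syy)


# Hypothetical (loss, cost) pairs in dollars, made up for illustration.
cases = [
    (5_900_000_000, 0),  # an extreme loss with no cost
    (1_000, 1_000),
    (25_000, 25_000),
    (40_000, 0),
    (80_000, 60_000),
    (120_000, 120_000),
]

# Share of cases where cost and loss are identical.
identical_share = sum(1 for loss, cost in cases if loss == cost) / len(cases)

# Keep only cases with a nonzero cost and a loss under $500 million.
subset = [(loss, cost) for loss, cost in cases
          if cost > 0 and loss < 500_000_000]

r2_all = r_squared(cases)
r2_subset = r_squared(subset)
print(f"identical: {identical_share:.0%}")
print(f"r-squared, all cases: {r2_all:.1%}; filtered: {r2_subset:.1%}")
```

In this toy data, the filtered r-squared comes out far higher than the r-squared for all cases, which is the same direction of change reported for the real data.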
The United States Sentencing Commission recorded over 18,000 variables about the sentences that defendants received in 2013. Coming up with what’s fair is clearly a complicated matter.
You can add calculated lines to all kinds of graphs in Minitab. If you’re ready for more, see how you can use a calculated line to put a line in front of the bars on a histogram.