How Residuals Can Save You Thousands of Dollars on Your Next Car Purchase

Purchasing a used car can be stressful due to all the factors that need to be considered. Web sites such as www.cars.com provide you a wealth of information, but how do you navigate through it all to find the best deal?

Minitab to the rescue. Once you narrow your choice down to a particular car model, such as an Acura TSX, the data from www.cars.com can be copied and pasted into Minitab. After some data manipulation, you can use a regression analysis to develop an equation that calculates the expected list price of a vehicle based on variables such as year, mileage, whether or not the technology package is included, and whether or not a free Carfax report is included (which is possibly an indicator of how confident the seller is in the vehicle).

A Regression Model for Used Car Price

Let's apply this idea to an Acura TSX, using data for 986 cars downloaded from www.cars.com on 7/24/2013.  If you'd like to do this analysis yourself, download the data (and a free 30-day trial of our statistical software, if you don't already have it). 

After you've opened the data in Minitab, choose Stat > Regression > General Regression and fill out the dialog box like this:

general regression factors

Below is the regression model Minitab fits to this data.

regression analysis of used car prices

What Does Regression Analysis Tell Us About the Price of Used Cars?

Several interesting findings come from this regression analysis:

  • Every mile that is added to the car decreases the expected list price by approximately 6 cents.
  • Each year that is added to the car's age decreases the expected list price by approximately $1310.
  • The technology package adds, on average, approximately $1044 to the list price of the car.
  • Cars with a free Carfax report have a list price, on average, approximately $441 more than those with a paid report. A Carfax report only costs $40, so this increased price is likely due to the fact that the car has a clean report (or else they probably wouldn’t provide it for free!).
  • An impressive 89.8% of the variation in car list price is explained by these predictors.

Those findings are interesting, but the main focus of this analysis is to find the car that has the best value. In other words, the car that has the largest difference between the actual list price and the expected list price. The residuals from the regression analysis contain this exact information.

Finding the Price Difference in a Residuals Plot

To get the residuals plot from this analysis, rerun the analysis (you can just hit Ctrl-E on your keyboard to bring up the last dialog box used, which should be the General Regression dialog shown above). Then click on Graphs, and check the box for "Normal plot of residuals" so it looks like this:

residuals plot

Press OK, run the analysis, and you'll get the plot shown below. 

This probability plot of the residuals indicates that three cars have an unusually large difference between the actual list price and the expected list price. They are underpriced by $7,500 to $10,000.

normal probability plot

Unfortunately, two of those cars have severe damage.

Damaged Cardamaged car 1

After removing the two damaged cars from the analysis, one car is clearly priced better than the other 983 cars. There's our best value. 

plot of car prices

This car appears to have no damage. The www.cars.com description is below.

Description of Car

In summary, there is a little bit of work getting the data from www.cars.com into Minitab in an analysis-ready format, but the effort will reveal the best-value cars, resulting in potential savings in the thousands of dollars.


Name: Paul Bullock • Thursday, August 8, 2013

Very neat stuff. Will be car shopping in the next several months, so thanks for this!!

Name: Amy • Thursday, August 8, 2013

I'm trying to open the data file to see how you categorized the data so that you could perform the regression and it isn't letting me open it with my minitab. How did you transpose the categorical data into regressionable data?

Name: Amy • Thursday, August 8, 2013

Nevermind I figured it out

Name: M. Samir Anany • Friday, August 9, 2013

Great article! Thanks! I just have one question, how did you manage to download the data through www.cars.com? could you send me the link or at least explain how?
Thank you so much!

Name: Jim Colton • Monday, August 12, 2013

Great question...importing and formatting the data is often the most difficult task in a data analysis project. This is definitely the case here.
The exact steps for all the variable are too detailed to list out in this comment, but below are the steps that allow you to store the mileage, which should get you started. We plan on write a blog on how to import and clean datasets like this in the future.
1. At www.cars.com, generate a list of vehicles you want to analyze. Hit Ctrl+A and then Ctrl+C to highlight and copy all the information on the page.
2. In Minitab, put your cursor in row 1 of an empty worksheet and hit Ctrl+V. You will get a pop-up, leave “Use spaces as delimiters” selected and click OK.
3. Delete the rows, starting at the top of the Minitab worksheet, until you get to the first row that contains data for a car (approx. two hundred rows ).
4. Delete the data in columns c6 and beyond. You shouldn’t need this data.
5. Put your cursor in C1 and hit the End key. In this last row, enter a space in C1, C2, C3, C4, and C5. This should make all the columns equal length, which is important for the next step.
6. Go to Data > Copy > Columns to Columns:
- In Copy from Columns, enter C1
- In Store, Copied Data in Columns, select the last option and enter C10 in the box below.
- Click on Subset the Data and then click on Condition. Enter C2="mi."
- Click OK three times.
7. Finally, go to Data > Change Data Type > From Text to Numeric and enter C10 in both boxes and click OK.

These seven steps get you the mileage. Fortunately, you can put these step in a macro and run it quickly in the future. You can also tweak the macro to get other variables, such as year and price.

Name: Mike Costantini • Tuesday, September 17, 2013

This is a great article, and an excellent real-world application of statistics. I wish my high school had Minitab!

I did not download the data, however I presume the "Cost" is really the asking price, not the agreed-to sale price. Since there is some negotiating room here I would use this number as a place to start planning my initial offer.

blog comments powered by Disqus