Do We All Really Want Violence on TV? A Study using Game of Thrones Data, Part 2

Eugenie Chung 18 February, 2016
In my last post, I looked at viewership data for the five seasons of HBO’s hit series Game of Thrones. I created a time series plot in Minitab that showed how viewership rose season by season, and how it varied episode by episode within each season.
My next step is to fit a statistical model to the data, which I hope will allow me to predict the viewing numbers for future episodes. 
I am going to use the General Linear Model to analyse the data. This is because our variables include the season (1 to 5) and episode number (1 to 10), which are fixed and can be considered as categorical. In addition, we’ll consider the number of important characters who die in each episode as a covariate.
Under Stat > ANOVA > General Linear Model > Fit General Linear Model…, I fill in the dialog box as shown below. 
Then I click the “Model” button and tell Minitab to analyse the main effects of these variables and their interactions:
Next, I click “OK” to return to the first dialog box, and then click the “Stepwise” button to bring up this dialog box: 
By selecting the stepwise method, I’m telling Minitab to search for the most suitable model for the data, so I don’t have to try various combinations of the predictors myself. I press OK in this dialog box and in the first one, and Minitab returns the results.
That seems like a lot of information to sort through, so we’ll break it down into its important components.
First, let’s look at the Analysis of Variance table. It shows the effect of each term in the model on the response. Using the p-value, we can determine whether an effect is statistically significant. The general guideline is to use a significance level of 0.05: if the p-value is smaller than 0.05, the effect of the term on the response is significant. The number of major deaths (p-value of 0.014) and season (a p-value that rounds to 0) are both significant!
Next, looking at the model summary section, the R-square values are quite high, more than 90%. The R-square indicates the proportion or percentage of variation in the response that can be explained by the predictors. The higher the value, the better the model fits the data.
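To make the R-square definition concrete, here is a minimal sketch of the calculation: one minus the ratio of the unexplained variation (sum of squared errors) to the total variation in the response. The numbers are hypothetical toy values chosen for easy arithmetic, not the Game of Thrones data, and the function name is my own.

```python
def r_squared(observed, fitted):
    """Return R-squared = 1 - SSE / SST for paired observed and fitted values."""
    mean_obs = sum(observed) / len(observed)
    # Total variation of the response around its mean
    sst = sum((y - mean_obs) ** 2 for y in observed)
    # Variation left unexplained by the fitted values
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    return 1.0 - sse / sst


# Hypothetical toy values, NOT the viewership data
toy_observed = [2.0, 3.0, 4.0, 5.0]
toy_fitted = [2.2, 2.9, 4.1, 4.8]

toy_r2 = r_squared(toy_observed, toy_fitted)
print(f"R-squared = {toy_r2:.2%}")
```

For these toy values the unexplained variation is 0.10 out of a total of 5.0, so the R-square comes out at about 98%.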
Well, it turns out I have quite a good model here!
Now moving on to the most important part of the output, we have the regression equation, which is an algebraic representation of the regression line and describes the relationship between the response and predictor variables.  Since the predictor “season” is categorical, I have one equation for each season: 
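As a rough guide to reading such output, equations of this kind generally take the form sketched here. The symbols are my own notation, not Minitab's, and the single shared slope is an assumption: it holds only if the stepwise procedure kept season and number of major deaths as main effects without their interaction.

```latex
\[
\widehat{\text{viewers}}_{s} \;=\; a_{s} \;+\; b \times (\text{no.\ of major deaths}),
\qquad s = 1, \dots, 5
\]
```

Here $a_{s}$ is the constant for season $s$ and $b$ is the coefficient on the number of major deaths, so the five equations would differ only in their constants.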
Looking at the equations, you will notice that all 5 equations have a positive coefficient for the predictor “no of major death” and a positive constant. In other words, based on these equations, we can infer that as the number of major deaths increases, the viewing numbers also tend to increase. It would seem the show’s audience does have some appetite for violence. However, there are still exceptions.
In episodes 8 and 9 of season 1, there were 7 and 2 deaths, respectively (including Ned Stark’s execution in episode 9), and the corresponding viewing numbers were 2.72 and 2.66 million. However, the first season finale, with only 3 deaths, had a higher viewing number than either. This could be because of the storyline: this is the episode where Daenerys Targaryen became the “Mother of Dragons,” a key event in the books.
Since we now have a model for the data, let’s use Stat > ANOVA > General Linear Model > Predict… to get some fitted values for the data. We can then compare these with the observed values in our original data set. Fill in the dialog box as shown below. 
Below are the screenshots from some of the results. The fitted values are stored in the worksheet. 
Apart from the fitted value for each row of data, you will also see the standard error, 95% confidence interval and 95% prediction interval for each fitted value. The confidence interval is the range in which the estimated mean response for a given set of predictor values is expected to fall. The prediction interval is the range in which the predicted response for a single observation with a given set of predictor values is expected to fall. Now let’s make some comparisons. 
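The difference between the two intervals can be checked numerically. This sketch uses the interval endpoints reported for observations 41 and 50 in the unusual-observations summary later in this post. It assumes the usual least-squares formulas, under which the squared half-width of the prediction interval equals the squared half-width of the confidence interval plus a constant term for the scatter of individual observations. Whether Minitab computes its intervals exactly this way is my assumption.

```python
import math

# (CI low, CI high, PI low, PI high) as reported for observations 41 and 50
reported = {
    41: (6.5295, 7.08479, 5.90426, 7.71002),
    50: (6.81615, 7.52672, 6.24174, 8.10113),
}


def half_width(low, high):
    """Half the distance between the interval endpoints."""
    return (high - low) / 2.0


scatter_term = {}
for obs, (ci_lo, ci_hi, pi_lo, pi_hi) in reported.items():
    ci_half = half_width(ci_lo, ci_hi)
    pi_half = half_width(pi_lo, pi_hi)
    # Assumed relation: PI half-width^2 = CI half-width^2 + (constant)^2
    scatter_term[obs] = math.sqrt(pi_half ** 2 - ci_half ** 2)
    print(f"obs {obs}: CI +/- {ci_half:.3f}, PI +/- {pi_half:.3f}, "
          f"scatter term {scatter_term[obs]:.3f}")
```

The prediction interval is roughly three times as wide as the confidence interval for both rows, and the recovered scatter term is about 0.86 in each case, which is consistent with the assumed relation.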
Season 3, episode 9 features the pivotal event “The Red Wedding.” With 8 key deaths, the highest number of casualties up to that point in the series, this episode had 5.22 million viewers. For this episode, our model delivered the following fitted-value statistics.
The model slightly overestimates the viewing numbers (5.37 vs. 5.22 million). However, the observed value falls within both the 95% CI and the 95% PI, which indicates a reasonably good model.
Another episode in the series with a high number of casualties is season 4, episode 9, which saw 10 deaths and captured 6.95 million viewers. For this episode, the model offers the following fitted-value statistics:
The model again overestimates the viewing numbers (7.30 vs. 6.95 million). However, the observed value falls within both the 95% CI and the 95% PI, which indicates a reasonably good model.
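To put the two overestimates on a common scale, this short sketch expresses each gap as a percentage of the observed viewing number. It uses only the observed and fitted figures quoted above; the episode labels are my own shorthand.

```python
# (observed, fitted) viewers in millions, as quoted in the text
episodes = {
    "S3E9": (5.22, 5.37),
    "S4E9": (6.95, 7.30),
}

pct_over = {}
for name, (observed, fitted) in episodes.items():
    gap = fitted - observed
    # Overestimate relative to what was actually observed
    pct_over[name] = 100.0 * gap / observed
    print(f"{name}: fit exceeds observed by {gap:.2f} million ({pct_over[name]:.1f}%)")
```

The fit is about 3% too high for season 3, episode 9 and about 5% too high for season 4, episode 9.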
If we return to the output of the model, we can see that Minitab diagnostic results flagged some data points as unusual: 
It appears that the unusual observations all come from the most recent season, season 5. The figures are summarized below (observed and fitted values in millions of viewers):
Obs   Observed   Fit     95% CI               95% PI               No. of deaths
41    8          6.807   (6.5295, 7.08479)    (5.90426, 7.71002)   2
47    5.4        6.807   (6.5295, 7.08479)    (5.90426, 7.71002)   2
50    8.11       7.171   (6.81615, 7.52672)   (6.24174, 8.10113)   7
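The figures in this summary are enough for two quick checks, sketched here. The first tests whether each observed value lies outside its 95% prediction interval. The second backs out the slope and constant implied for season 5 from the two distinct fitted values. The regression equation itself is not reproduced in the text, so the implied slope and constant are only as accurate as the rounded fits, and they assume the fit is linear in the number of deaths.

```python
# obs: (observed, fit, PI low, PI high, deaths), copied from the summary
flagged = {
    41: (8.00, 6.807, 5.90426, 7.71002, 2),
    47: (5.40, 6.807, 5.90426, 7.71002, 2),
    50: (8.11, 7.171, 6.24174, 8.10113, 7),
}

outside_pi = {}
for obs, (observed, fit, pi_lo, pi_hi, deaths) in flagged.items():
    outside_pi[obs] = not (pi_lo <= observed <= pi_hi)
    print(f"obs {obs}: residual {observed - fit:+.3f}, "
          f"outside 95% PI: {outside_pi[obs]}")

# Implied season 5 line from the fits at 2 deaths and 7 deaths
# (assumes a straight-line relationship with the number of deaths)
_, fit_a, _, _, deaths_a = flagged[41]
_, fit_b, _, _, deaths_b = flagged[50]
slope = (fit_b - fit_a) / (deaths_b - deaths_a)
intercept = fit_a - slope * deaths_a
print(f"implied slope {slope:.4f} per death, implied constant {intercept:.4f}")
```

All three observed values fall outside their prediction intervals, although observation 50 misses the upper limit by less than 0.01 million. The implied slope is roughly 0.07 million viewers per additional major death.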

Overall, the model provides a reasonable range of viewing numbers for many of the episodes in the series apart from episodes 1, 7, and 10 of season 5, where we see big discrepancies between fitted and observed data. The large viewing numbers for episode 1 require no explanation—there is a lot of anticipation and excitement among viewers for the opening episode of every season. The drop in viewing numbers for episode 7, as I noted previously, was likely due to some controversial scenes in the previous episode. And this season’s finale was actually a record-breaking episode, achieving the series’ highest viewing numbers so far. 

While this model is not perfect, it does, to some degree, suggest that we Game of Thrones viewers do have some appetite for violence on TV. "Winter is coming," and we are all still speculating about the fate of Jon Snow (a key character who appears to be killed at the end of season 5). 
However, one thing I know for sure is that to get record-breaking viewing numbers, all they need to do is to have a certain white-haired lady sitting on the Iron Throne!   We’ll have to wait for season 6 to see if that transpires. In the meantime, I hope you’ve enjoyed this analysis of the viewership data for the first five seasons!