# How Data Analysis Can Help Us Predict This Year's Champions League

by Laerte de Araujo Lima, guest blogger

A few weeks ago, my football friends and I were talking about the football in the UEFA Champions League (UEFA CL), and what we could expect for the 2013-14 season.

Some of us believe that the quality of the football played in the UEFA CL has improved in the last few years, as evidenced by more goals per match, more teams with attack-based strategies and, finally, more spectacular games. Others disagree, arguing that teams have pursued defensive strategies, with consequently fewer goals per match, more fouls per game, and less effective use of game time (due to minutes wasted when the game is stopped for a foul), making for more and more boring games.

As a Black Belt and Six Sigma addict, I decided to perform a data analysis in order to assess the real state of UEFA CL football tendencies. So let’s set the subjectivity aside, look at the data in Minitab, and see what we can expect for the upcoming season.

## The UEFA CL Championship for Dummies

Each season, 32 UEFA clubs are split into eight groups of four teams each, which play home and away against each of their group opponents between September and December. The top two teams from each group advance to the first knockout round.

From the round of 16 through the semi-finals, clubs play two matches against each other on a home-and-away basis, following the same rules as the qualifying and play-off matches.

The final is decided by a single match.

## UEFA CL Data Analysis

The UEFA Champions League web site is one of the best football databases in the world. Every year, new information is added, and dozens of statistics can be accessed.

For this analysis, I selected the last 12 seasons (from season 2001-02 to 2012-13) of UEFA CL. I selected this range because the championship format has been consistent since the 2001-02 season, with the same number of teams in the group stage.

Based on my friends’ suggestions, we selected the data that are linked with the “subjective” definition of quality in football. In Six Sigma terms, we’d call these CTQs (critical to quality).

| CTQ – Voice of Customer | UEFA CL database variable associated with CTQ |
| --- | --- |
| More goals per game to make the game more fun! | ↑ Average goals scored per game |
| Offensive strategy, with more attempts to score goals. | ↑ Average attempts on target per game; ↑ Average goals scored per game; ↓ Average fouls committed per game |
| More effective use of game time. | ↓ Average fouls committed per game |
| More “fair play” and protection for players with high football skills. | ↓ Average fouls committed per game |

With our CTQ and the variables that impact the CTQ defined, we’re ready to look at the data with Minitab Statistical Software.

First, let’s look at the last 12 seasons’ finalists and winners by country, using the Minitab Pareto Chart command (Stat > Quality Tools > Pareto Chart).

The charts show that more than 90% of the UEFA CL finalists come from 4 countries (Germany, Spain, England and Italy). German teams are the least efficient (they account for 20.8% of the finalists but only 8.3% of the winners). Spanish teams are the most efficient (they account for just 16.7% of the finalists but 33.3% of the winners).
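For readers without Minitab, the percentages behind the Pareto charts can be checked with a few lines of Python. This is only a sketch: the list of finals is typed in by hand from the results of the 12 finals rather than taken from my Minitab worksheet, and the `pareto_table` helper is a name made up for this example.

```python
from collections import Counter

# (season end year, winner's country, runner-up's country), entered by hand
FINALS = [
    (2002, "Spain", "Germany"),    # Real Madrid v Bayer Leverkusen
    (2003, "Italy", "Italy"),      # Milan v Juventus
    (2004, "Portugal", "France"),  # Porto v Monaco
    (2005, "England", "Italy"),    # Liverpool v Milan
    (2006, "Spain", "England"),    # Barcelona v Arsenal
    (2007, "Italy", "England"),    # Milan v Liverpool
    (2008, "England", "England"),  # Manchester United v Chelsea
    (2009, "Spain", "England"),    # Barcelona v Manchester United
    (2010, "Italy", "Germany"),    # Internazionale v Bayern Munich
    (2011, "Spain", "England"),    # Barcelona v Manchester United
    (2012, "England", "Germany"),  # Chelsea v Bayern Munich
    (2013, "Germany", "Germany"),  # Bayern Munich v Borussia Dortmund
]


def pareto_table(countries):
    """Return (country, count, share %, cumulative %) rows, largest count first."""
    counts = Counter(countries)
    total = sum(counts.values())
    rows = []
    cumulative = 0.0
    for country, n in counts.most_common():
        share = 100.0 * n / total
        cumulative += share
        rows.append((country, n, round(share, 1), round(cumulative, 1)))
    return rows


# Every final contributes two finalists (winner and runner-up) and one winner.
finalists = [country for _, winner, runner_up in FINALS
             for country in (winner, runner_up)]
winners = [winner for _, winner, _ in FINALS]

finalist_rows = pareto_table(finalists)
winner_rows = pareto_table(winners)

for title, rows in (("Finalists", finalist_rows), ("Winners", winner_rows)):
    print(title)
    for country, n, share, cumulative in rows:
        print(f"  {country:<9} {n:>2}  {share:>5.1f}%  cumulative {cumulative:>5.1f}%")
```

Comparing a country's share of finalists with its share of winners is what I mean by "efficiency" here.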

To see what happened with the variables (average goals scored per game, average attempts on target per game, and average fouls committed per game) over the seasons, we used the Minitab command Graph > Time Series Plot.

The time series plots let us see trends in the data over time. A few things are clear from the plots:

• Average goals per game do not seem to have increased or decreased from 2001-02 through 2012-13.
• The average attempts on target also do not appear to have risen or fallen since 2001-02, except for the 2012-13 season.
• The average fouls per game seem to have decreased since 2001-02.

The time series plots give us the initial indications of our analysis, but let’s fine-tune it with some other statistical methods. To check whether the trends shown in the time series plots are statistically significant, we’ll perform a correlation analysis using Stat > Basic Statistics > Correlation.

Regarding the variables’ correlation to the seasons, we can make the following statements.

• Season (end year) x fouls per game: The p-value (0.000) is below the 0.05 threshold, which gives statistical evidence of a relationship between these two variables. Additionally, the Pearson correlation coefficient (-0.928) indicates a strong linear relationship between them, in this case a negative one: as the seasons go by, fouls per game fall.

• Season (end year) x goals per game, and season (end year) x attempts on target: Since the p-value is greater than 0.05 for both, there is no statistical evidence of a relationship within either of these two pairs of variables.
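To see why a correlation of -0.928 over only 12 seasons is enough to be significant, the usual t-test for a Pearson coefficient can be worked by hand. The sketch below is in Python, not Minitab, and uses only the two numbers reported above (r = -0.928 and n = 12 seasons); the critical value is the standard two-sided 5% value of Student's t with 10 degrees of freedom, taken from a t-table, and the function name is my own.

```python
import math

R = -0.928  # Pearson coefficient reported for season x fouls per game
N = 12      # number of seasons, 2001-02 through 2012-13

# Two-sided 5% critical value of Student's t with N - 2 = 10 degrees of freedom
T_CRITICAL = 2.228


def t_statistic(r, n):
    """t statistic for the null hypothesis of no linear correlation."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


t_stat = t_statistic(R, N)
significant = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.2f} with {N - 2} degrees of freedom")
print("significant at the 5% level" if significant else "not significant at the 5% level")
```

The statistic lands far beyond the critical value, which agrees with Minitab reporting a p-value that rounds to 0.000.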

## Regression Analysis: Minitab’s Assistant Makes It Easy

Now that we know there’s a significant correlation between fouls per game and season, we can learn even more by doing a regression analysis. This analysis can give us insight into what we might anticipate for upcoming seasons.

To perform the regression analysis, we'll use the Minitab Assistant. To make our lives easy, we can let Minitab select the best model for our regression by picking “Choose for me” in the Assistant’s regression dialog box:

Minitab will automatically choose between three types of model: Linear, Quadratic and Cubic. The associated alpha level for rejecting the null hypothesis is set at 0.05 by default, and that level is fine for this analysis.

Minitab selected a Quadratic model as the best-fitting regression for average fouls per game x UEFA CL season (end year), with an astonishingly high R-sq of 91.75%!

The regression model is shown as Y = -135252 + 135.1 X - 0.03374 X**2, where Y is the average number of fouls per game and X is the UEFA CL season (end year).

Based on this analysis, we can expect to see an average of 16.65 fouls per game for the 2013-14 UEFA CL season.
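One practical note if you want to plug a season into the equation yourself: the coefficients quoted above (-135252, 135.1 and -0.03374) are rounded for display, and with X around 2014 that rounding matters a lot. The Python sketch below, which is my own illustration and not something Minitab produces, evaluates the equation as printed and shows how far the result moves when the linear coefficient shifts by half a unit in its last printed digit. The function name and the size of the perturbation are assumptions chosen for the example.

```python
def fouls_per_game(x, b0=-135252.0, b1=135.1, b2=-0.03374):
    """Evaluate the quadratic model using the rounded coefficients as printed."""
    return b0 + b1 * x + b2 * x ** 2


SEASON = 2014      # end year of the 2013-14 season
HALF_DIGIT = 0.05  # half a unit in the last printed digit of 135.1

as_printed = fouls_per_game(SEASON)
low = fouls_per_game(SEASON, b1=135.1 - HALF_DIGIT)
high = fouls_per_game(SEASON, b1=135.1 + HALF_DIGIT)
spread = high - low

print(f"equation as printed: {as_printed:.2f}")
print(f"range from rounding the linear term alone: {low:.2f} to {high:.2f}")
print(f"width of that range: {spread:.1f} fouls per game")
```

Because rounding alone can move the answer by far more than the quantity being predicted, the printed equation should not be expected to reproduce the 16.65 forecast; for that, use the prediction Minitab computes from its full-precision coefficients.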

## Conclusions:

Based on this analysis of data from the UEFA CL's past 12 seasons, we can draw the following conclusions:

• UEFA CL might want to consider changing its name to “Italy, England, and Spain CL,” because the teams from those countries account for more than 80% of winning teams in the last 12 seasons.

• Spanish teams are the most effective and German teams are the least (that is, excluding the Portuguese and French teams, each of which appears only once in the 12 finals we looked at).

• The average fouls per game have decreased over the last 12 seasons. We might infer that the athletes in the UEFA CL have become fairer players over the years. Also, if fewer fouls happen (and consequently, there are fewer interruptions during the game), we can expect to see more effective use of game time. The analysis lets us predict an average of 16.65 fouls per game in the upcoming season.

Based on the CTQ points defined at the beginning of this analysis, I conclude that the UEFA CL has become fairer, and that its time efficiency has steadily improved over the last 12 seasons.

Unfortunately, I cannot conclude that the games are more fun, since the variables related to those CTQ factors have not increased or decreased over the past 12 seasons.

But on the bright side, this means me and my football-loving buddies will still have plenty to debate as we enjoy watching the next season!