Using Cross Tabulation and Chi-Square: The Survey Says...

Minitab Blog Editor | 05 October, 2012

Topics: Data Analysis, Statistics

Every day brings news about this poll or that survey and what it means. News reports about public opinion are particularly thick during election years, when it seems like barely an hour goes by without new poll results claiming to reveal the current state of the electorate.  

But a great deal of the insight that can be derived from political polling data, or any public opinion research for that matter, rarely makes it into the press. One source of insight is cross tabulation and Chi-Square analysis, which we can use to summarize observations by category and test whether those categories are related. For example, news outlets may report that a larger percentage of those surveyed prefer this or that candidate, but they less frequently drill down to report whether, for instance, there's an association between a certain characteristic and candidate preference.

But that's the kind of actionable insight that campaign consultants, advertisers, and almost anyone else who does a survey really want. This type of data analysis also has applications in quality improvement and customer service, where it can reveal whether there's an association between, for example, customers' gender and the type of services they use.

Statistical Association between Categorical Variables Like Gender and Political Affiliation

[Image: data setup for cross tabulation and Chi-Square analysis]

To do a cross tabulation and Chi-Square analysis, you'll need to collect data on at least two categorical variables.

For instance, political surveys frequently ask respondents to provide gender, educational level, and income bracket. If you're analyzing that data in Minitab Statistical Software, you'd give each categorical variable its own column, with each row representing one respondent in your survey. 

Let's say you're working on a campaign for the Bull Moose Party and you want to find out whether there's an association between gender and candidate preference. So you survey 400 randomly selected voters and enter your data so it looks like the columns you see on the right.  (Download the simulated data here if you'd like to play along.)  
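If you'd like to see the same layout outside Minitab, here is a minimal Python sketch of the idea: one row per respondent, one value per categorical variable, tallied into a table of counts. The respondents and the party names "Party A" and "Party B" are invented for illustration and are not the simulated survey data, and `cross_tabulate` is a hypothetical helper, not a Minitab function.

```python
from collections import Counter

# Hypothetical respondents, one (gender, affiliation) pair per row.
# These values are made up for illustration; they are NOT the survey data.
respondents = [
    ("Female", "Bull Moose"),
    ("Male", "Party A"),
    ("Female", "Bull Moose"),
    ("Male", "Bull Moose"),
    ("Female", "Party B"),
    ("Male", "Party A"),
]


def cross_tabulate(rows):
    """Count how many respondents fall into each (row, column) category pair."""
    counts = Counter(rows)
    row_labels = sorted({r for r, _ in rows})
    col_labels = sorted({c for _, c in rows})
    # Counter returns 0 for pairs that never occur, so empty cells are filled in.
    table = [[counts[(r, c)] for c in col_labels] for r in row_labels]
    return row_labels, col_labels, table


row_labels, col_labels, table = cross_tabulate(respondents)
for label, row in zip(row_labels, table):
    print(label, dict(zip(col_labels, row)))
```

Each cell of `table` is simply the number of respondents who share that combination of gender and affiliation, which is all a cross tabulation of counts is.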

To analyze this data, you choose Stat > Tables > Cross Tabulation and Chi-Square in Minitab. Minitab asks you to select the variables that will correspond to the table's rows and columns. We'll choose "Gender" for rows and "Affiliation" for columns.

If we click "OK" now, Minitab provides a table that shows the cross-tabbed counts of each variable: 

[Image: Tabulated Statistics output]

Eyeballing this table, it does look like there might be an association between the Bull Moose Party and the women surveyed. Interesting, but the count data alone can't tell us if there's really a statistical association between gender and party affiliation or not.  For that, we need to do the full Chi-Square analysis. 

Basic Chi-Square Analysis 

The Chi-Square test helps you determine if two discrete variables are associated. If there's an association, the distribution of one variable will differ depending on the value of the second variable. But if the two variables are independent, the distribution of the first variable will be similar for all values of the second variable.
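To make "independent" concrete: if the two variables were unrelated, the count expected in each cell would be (row total × column total) / grand total. The Pearson statistic sums the squared gaps between observed and expected counts, each divided by the expected count, while the likelihood ratio statistic sums observed × ln(observed / expected) and doubles the result. The sketch below uses an invented 2×3 table, not the survey data, and hypothetical function names. It assumes every row and column total is positive.

```python
import math


def expected_counts(observed):
    """Expected cell counts if the row and column variables were independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


def likelihood_ratio_chi_square(observed):
    """2 * sum of observed * ln(observed / expected); empty cells contribute 0."""
    expected = expected_counts(observed)
    return 2 * sum(
        o * math.log(o / e)
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
        if o > 0
    )


# Invented counts for illustration only (2 rows x 3 columns, 100 respondents).
observed = [[30, 10, 10], [10, 20, 20]]

print(expected_counts(observed))
print(round(pearson_chi_square(observed), 3))
print(round(likelihood_ratio_chi_square(observed), 3))
```

The further the observed counts drift from the expected ones, the larger both statistics get, and the two usually land close to each other.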

Let's go back into the Stat > Tables > Cross Tabulation and Chi-Square dialog. This time, click on the Chi-Square... button. Check the options for Chi-Square analysis and Expected cell counts, then press OK, and OK again to run the analysis. Minitab gives the following output: 

[Image: Chi-Square analysis output]

So...what does it mean?  Minitab performs Pearson and likelihood ratio chi-square tests. Each chi-square test provides two statistics that indicate whether the variables are associated: a chi-square statistic and a p-value. The one to pay attention to is the p-value. Just compare this p-value to your alpha level, which is commonly 0.05.
  • If the p-value is less than or equal to alpha, you can conclude that the variables are associated.
  • If the p-value is greater than alpha, you don't have enough evidence to conclude that they're associated. (That's not the same as proving they're independent.)
For our simulated election survey data, the Pearson chi-square statistic is 9.894 (with a p-value of 0.007) and the likelihood ratio chi-square statistic is 9.971 (which also gives a p-value of 0.007). Both p-values are well below 0.05, so at that alpha level we can conclude that there is a significant association between gender and party affiliation.
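As a rough check on those p-values, here is a small sketch that assumes the table has two gender categories and three affiliation categories. That gives (2 − 1) × (3 − 1) = 2 degrees of freedom. The table dimensions are an assumption, although the reported p-values are consistent with it. With exactly 2 degrees of freedom the upper-tail probability of the chi-square distribution has the closed form exp(−x / 2), so no statistics library is needed. The function name is hypothetical.

```python
import math

ALPHA = 0.05  # the commonly used alpha level mentioned above


def chi_square_p_value_df2(statistic):
    """Upper-tail probability for a chi-square distribution with 2 degrees of
    freedom. Valid ONLY for df = 2, which is assumed here (a 2 x 3 table)."""
    return math.exp(-statistic / 2)


# Statistics reported in the Minitab output discussed above.
reported = {"Pearson": 9.894, "Likelihood ratio": 9.971}

for name, statistic in reported.items():
    p_value = chi_square_p_value_df2(statistic)
    verdict = "associated" if p_value <= ALPHA else "not enough evidence"
    print(f"{name}: p = {p_value:.3f} -> {verdict}")
```

Both statistics give a p-value that rounds to 0.007 under this assumption. For tables with other dimensions the closed form no longer applies, and you'd rely on your statistical software for the p-value.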

Beyond Chi-Square: Other Measures of Association

Of course, professional pollsters are going to dig much deeper into the data than this. They will want to know more about the strength of the association, and statistical software packages (like Minitab) give us easy access to other measures of association, including Cramer's V-square and the Goodman-Kruskal lambda and tau statistics.
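To give a feel for what a strength-of-association measure looks like, here is a sketch of Cramer's V-square using its usual definition: the Pearson chi-square statistic divided by n × min(rows − 1, columns − 1). It plugs in the reported statistic (9.894) and sample size (400), and it assumes a 2×3 table in which all 400 respondents are counted. This is an illustration of the formula, not a reproduction of Minitab's output.

```python
import math


def cramers_v_squared(chi_square, n, n_rows, n_cols):
    """Cramer's V-square: chi-square scaled to fall between 0 and 1."""
    k = min(n_rows - 1, n_cols - 1)
    return chi_square / (n * k)


# Reported Pearson statistic and sample size; the 2 x 3 shape is an assumption.
v_squared = cramers_v_squared(chi_square=9.894, n=400, n_rows=2, n_cols=3)
v = math.sqrt(v_squared)

print(f"Cramer's V-square = {v_squared:.4f}, V = {v:.3f}")
```

Values near 0 indicate a weak association and values near 1 a strong one. Under these assumptions the association here is statistically significant but fairly weak, which shows why significance and strength are worth reporting separately.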