Section 4.4: Contingency Tables and Association
Objectives
By the end of this lesson, you will be able to...
- compute the marginal distribution of a variable
- construct a conditional distribution of a variable
- use the conditional distribution to identify association between categorical variables
In sections 4.1-4.3, we studied relationships between two quantitative variables. We learned that we could quantify the strength of the linear relationship with the correlation.
What about qualitative (categorical) variables, though? For example, suppose we consider a survey given to 82 students in a Basic Algebra course at ECC, with the following responses to the statement "I enjoy math."
Gender | Strongly Agree | Agree | Neutral | Disagree | Strongly Disagree |
Men | 9 | 13 | 5 | 2 | 1 |
Women | 12 | 18 | 11 | 6 | 5 |
How do we study this relationship? Is there a way to tell whether gender and enjoyment of math are related? In fact, there is! As usual, though, we need a bit of background work first.
Contingency Tables
A contingency table relates two categorical variables. In the example above, the relationship is between the gender of the student and the student's response to the statement.
A marginal distribution of a variable is a frequency or relative frequency distribution of either the row or column variable in the contingency table.
Example 1
If we consider the previous example:
Gender | Strongly Agree | Agree | Neutral | Disagree | Strongly Disagree |
Men | 9 | 13 | 5 | 2 | 1 |
Women | 12 | 18 | 11 | 6 | 5 |
The entire table is referred to as the contingency table.
The marginal distribution for gender removes the effect of whether or not the student enjoys math:
Gender | Strongly Agree | Agree | Neutral | Disagree | Strongly Disagree | Total |
Men | 9 | 13 | 5 | 2 | 1 | 30 |
Women | 12 | 18 | 11 | 6 | 5 | 52 |
Whereas, the marginal distribution for whether or not the student enjoys math removes the effect of gender:
Gender | Strongly Agree | Agree | Neutral | Disagree | Strongly Disagree |
Men | 9 | 13 | 5 | 2 | 1 |
Women | 12 | 18 | 11 | 6 | 5 |
Total | 21 | 31 | 16 | 8 | 6 |
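The row and column totals in Example 1 can also be reproduced outside StatCrunch. The following is a minimal sketch in plain Python; the language choice and every variable name (`counts`, `gender_totals`, `response_totals`) are illustrative assumptions, not part of the lesson's own tooling.

```python
# Counts from the survey table; each row is a gender, each column a response.
# All names here are illustrative.
responses = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]
counts = {
    "Men": [9, 13, 5, 2, 1],
    "Women": [12, 18, 11, 6, 5],
}

# Marginal distribution of gender: add across each row.
gender_totals = {gender: sum(row) for gender, row in counts.items()}

# Marginal distribution of the response: add down each column.
response_totals = {
    response: sum(row[i] for row in counts.values())
    for i, response in enumerate(responses)
}

print(gender_totals)    # {'Men': 30, 'Women': 52}
print(response_totals)
```

Either set of totals adds up to the 82 students surveyed, which is a quick check that no cell was mistyped.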
We can also create a relative frequency marginal distribution, which, as expected, is simply relative frequencies rather than frequencies.
Example 2
The combined relative frequency marginal distributions would look like this (SA = Strongly Agree, A = Agree, N = Neutral, D = Disagree, SD = Strongly Disagree):
Gender | SA | A | N | D | SD | Total |
Men | 9 | 13 | 5 | 2 | 1 | 30/82 ≈ 0.37 |
Women | 12 | 18 | 11 | 6 | 5 | 52/82 ≈ 0.63 |
Total | 21/82 ≈ 0.26 | 31/82 ≈ 0.38 | 16/82 ≈ 0.20 | 8/82 ≈ 0.10 | 6/82 ≈ 0.07 | 1 |
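As a check on the arithmetic, the relative frequency marginal distributions can be computed by dividing each row or column total by the overall total of 82. This is a rough sketch in plain Python with illustrative variable names; rounding to two decimal places is an assumption chosen to match the tables in this section.

```python
# Counts from the survey table; names are illustrative.
responses = ["SA", "A", "N", "D", "SD"]
counts = {"Men": [9, 13, 5, 2, 1], "Women": [12, 18, 11, 6, 5]}

# Overall number of students surveyed.
grand_total = sum(sum(row) for row in counts.values())

# Relative frequency marginal distribution of gender (row total / grand total).
gender_rel = {g: round(sum(row) / grand_total, 2) for g, row in counts.items()}

# Relative frequency marginal distribution of the response (column total / grand total).
response_rel = {
    r: round(sum(row[i] for row in counts.values()) / grand_total, 2)
    for i, r in enumerate(responses)
}

print(grand_total)
print(gender_rel)
print(response_rel)
```

Because each value is rounded separately, the rounded relative frequencies in a margin may not sum to exactly 1.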
Technology
Here's a quick overview of how to create a contingency table in StatCrunch.
Now let's consider the combined frequency marginal distributions from Example 1.
Example 3
Gender | SA | A | N | D | SD | Total |
Men | 9 | 13 | 5 | 2 | 1 | 30 |
Women | 12 | 18 | 11 | 6 | 5 | 52 |
Total | 21 | 31 | 16 | 8 | 6 | 82 |
We might now be interested in comparing the two variables. For example:
- What proportion of women strongly agreed with the statement "I enjoy math"?
- What proportion of women disagreed?
- What proportion of men were neutral?
- What proportion of men strongly agreed?
Solution:
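- There were 12 women who strongly agreed, and 52 women in all, so 12/52 ≈ 0.23
- Similarly, there were 6 women who disagreed, and 52 women overall, so 6/52 ≈ 0.12
- There were 5 men who were neutral, and 30 men in all, so 5/30 ≈ 0.17
- There were 9 men who strongly agreed, and 30 men in all, so 9/30 ≈ 0.30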
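Each of the four answers divides one cell by its row total. A small sketch of that calculation in plain Python follows; the helper `conditional_proportion` and the other names are hypothetical, introduced only for illustration.

```python
# Counts from the survey table, keyed by gender and then by response.
# All names are illustrative.
counts = {
    "Men": {"SA": 9, "A": 13, "N": 5, "D": 2, "SD": 1},
    "Women": {"SA": 12, "A": 18, "N": 11, "D": 6, "SD": 5},
}

def conditional_proportion(gender, response):
    """Proportion of the given gender who chose the given response."""
    row = counts[gender]
    return row[response] / sum(row.values())

# The four questions from Example 3, in order.
questions = [("Women", "SA"), ("Women", "D"), ("Men", "N"), ("Men", "SA")]
answers = [round(conditional_proportion(g, r), 2) for g, r in questions]

for (gender, response), answer in zip(questions, answers):
    print(f"{gender}, {response}: {answer:.2f}")
```

Note that the denominator is always the total for that gender (30 or 52), never the overall total of 82.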
If we complete the table in this fashion, we get something called a conditional distribution.
A conditional distribution lists the relative frequency of each category of one variable, given a specific value of the other variable in the contingency table.
Example 4
The conditional distribution of how the students feel about math by gender would be as follows:
Gender | SA | A | N | D | SD | Total |
Men | 9/30 ≈ 0.30 | 13/30 ≈ 0.43 | 5/30 ≈ 0.17 | 2/30 ≈ 0.07 | 1/30 ≈ 0.03 | 30/30 = 1 |
Women | 12/52 ≈ 0.23 | 18/52 ≈ 0.35 | 11/52 ≈ 0.21 | 6/52 ≈ 0.12 | 5/52 ≈ 0.10 | 52/52 = 1 |
Note: The row totals sometimes do not add up to 1 due to rounding.
Another way to think of this distribution is that it's the distribution of how students feel for each gender. That's what the "by gender" indicates.
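The conditional distribution of response by gender can be sketched in plain Python by dividing every cell by its row total. The names below are illustrative, and rounding to two decimals is an assumption made to match the tables in this section.

```python
# Counts from the survey table; names are illustrative.
responses = ["SA", "A", "N", "D", "SD"]
counts = {"Men": [9, 13, 5, 2, 1], "Women": [12, 18, 11, 6, 5]}

# Divide each cell by its row total, so each row becomes a distribution.
conditional_by_gender = {}
for gender, row in counts.items():
    row_total = sum(row)
    conditional_by_gender[gender] = [round(c / row_total, 2) for c in row]

# Sum the rounded values in each row to see the effect of rounding.
rounded_sums = {g: round(sum(dist), 2) for g, dist in conditional_by_gender.items()}

for gender, dist in conditional_by_gender.items():
    print(gender, dict(zip(responses, dist)), "rounded values sum to", rounded_sums[gender])
```

With these data the rounded proportions for men sum to 1.00, while those for women sum to 1.01, which illustrates the rounding effect: the exact fractions in each row always sum to 1.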
Example 5
The conditional distribution of gender by how the student feels would be:
Gender | SA | A | N | D | SD |
Men | 9/21 ≈ 0.43 | 13/31 ≈ 0.42 | 5/16 ≈ 0.31 | 2/8 = 0.25 | 1/6 ≈ 0.17 |
Women | 12/21 ≈ 0.57 | 18/31 ≈ 0.58 | 11/16 ≈ 0.69 | 6/8 = 0.75 | 5/6 ≈ 0.83 |
Total | 21/21 = 1 | 31/31 = 1 | 16/16 = 1 | 8/8 = 1 | 6/6 = 1 |
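Conditioning the other way means dividing each cell by its column total instead of its row total. Here is a minimal sketch in plain Python, again with illustrative names and two-decimal rounding as an assumption.

```python
# Counts from the survey table; names are illustrative.
responses = ["SA", "A", "N", "D", "SD"]
counts = {"Men": [9, 13, 5, 2, 1], "Women": [12, 18, 11, 6, 5]}

# Column totals: number of students giving each response.
column_totals = [sum(col) for col in zip(*counts.values())]

# Divide each cell by its column total, so each column becomes a distribution.
gender_by_response = {
    gender: [round(c / total, 2) for c, total in zip(row, column_totals)]
    for gender, row in counts.items()
}

for gender, dist in gender_by_response.items():
    print(gender, dict(zip(responses, dist)))
```

In StatCrunch terms this corresponds to a column percent display, whereas dividing by row totals corresponds to a row percent display.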
Technology
Here's a quick overview of how to create a conditional relative frequency distribution in StatCrunch.
* This step is key. The choice depends on which variable's distribution you want to see. These problems are typically phrased “find the conditional relative frequency distribution of X by Y”. This means you want to know how X is distributed within each category of Y. If X is your row variable and Y is your column variable, then you want to show Column percent, because each column will then show the distribution of X, the row variable.
Using Conditional Distributions to Identify Association
One thing we can use conditional distributions for is to identify an association between qualitative variables. A good way to do this is with a side-by-side bar graph. We'll illustrate with the same data we've been using.
Example 6
In Example 5, we found the conditional distribution of gender by how the student feels regarding math:
Gender | SA | A | N | D | SD |
Men | 9/21 ≈ 0.43 | 13/31 ≈ 0.42 | 5/16 ≈ 0.31 | 2/8 = 0.25 | 1/6 ≈ 0.17 |
Women | 12/21 ≈ 0.57 | 18/31 ≈ 0.58 | 11/16 ≈ 0.69 | 6/8 = 0.75 | 5/6 ≈ 0.83 |
Total | 21/21 = 1 | 31/31 = 1 | 16/16 = 1 | 8/8 = 1 | 6/6 = 1 |
Since it's difficult to gain much from this table alone, a good way to analyze this would be to make a side-by-side bar graph.
From the graph, we can see that there appear to be some differences between the responses. The proportions of men and women were similar for both "Strongly Agree" and "Agree", but quite different for "Neutral" and "Disagree". As we go down the scale from "Strongly Agree" to "Strongly Disagree", the proportion of the responses that came from women increases.
We might conclude, then, that in this sample the men tended to enjoy math more than the women.
One thing we can't conclude is that their gender caused them to not enjoy math. We've only done an observational study, so we can only claim association, not causation.
One question you might have as a result of this is, "How do we know when it's different enough from equal to say that there might be a relationship?" It's a very good question. In order to draw a fine line, we'll need a hypothesis test, which we won't see until we get to Chapter 12.
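For readers without StatCrunch at hand, the pattern discussed in Example 6 can be seen in a rough text-only stand-in for a side-by-side bar graph. This is a sketch in plain Python, not the lesson's own graph; the `bar` helper, the 40-character width, and all other names are illustrative assumptions.

```python
# Counts from the survey table; names are illustrative.
responses = ["SA", "A", "N", "D", "SD"]
counts = {"Men": [9, 13, 5, 2, 1], "Women": [12, 18, 11, 6, 5]}
column_totals = [sum(col) for col in zip(*counts.values())]

def bar(proportion, width=40):
    """Return a text bar whose length is proportional to the value."""
    return "#" * round(proportion * width)

# One pair of bars (men, women) for each response category.
chart_lines = []
for i, response in enumerate(responses):
    for gender, row in counts.items():
        p = row[i] / column_totals[i]
        chart_lines.append(f"{response:>2} {gender:<5} {bar(p):<40} {p:.2f}")
    chart_lines.append("")

print("\n".join(chart_lines))

# Share of each response category that came from women, from SA down to SD.
women_share = [counts["Women"][i] / column_totals[i] for i in range(len(responses))]
```

If gender and response were unrelated, each pair of bars would show roughly the same split. Here the women's share grows steadily from "Strongly Agree" to "Strongly Disagree", though whether that difference is large enough to matter is exactly the question the hypothesis test addresses.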
Technology
Here's a quick overview of how to create a side-by-side bar graph in StatCrunch.
This graph is designed to match the conditional relative frequency distribution above, where you are asked to show the conditional relative frequency distribution of X by Y.