# Section 4.4: Contingency Tables and Association

## Objectives

By the end of this lesson, you will be able to...

1. compute the marginal distribution of a variable
2. construct a conditional distribution of a variable
3. use the conditional distribution to identify association between categorical variables

In sections 4.1-4.3, we studied relationships between two quantitative variables. We learned that we could quantify the strength of the linear relationship with the correlation.

What about qualitative (categorical) variables, though? For example, suppose we consider a survey given to 82 students in a Basic Algebra course at ECC, with the following responses to the statement "I enjoy math."

| | Strongly Agree | Agree | Neutral | Disagree | Strongly Disagree |
|---|---|---|---|---|---|
| Men | 9 | 13 | 5 | 2 | 1 |
| Women | 12 | 18 | 11 | 6 | 5 |

How do we study this relationship? Is there a way to tell whether gender and enjoyment of math are related? In fact, there is! As usual, though, we need a bit of background work first.

## Contingency Tables

A contingency table relates two categorical variables. In the example above, the relationship is between the gender of the student and that student's response to the statement.

A marginal distribution of a variable is a frequency or relative frequency distribution of either the row or column variable in the contingency table.

Example 1

If we consider the previous example:

| | Strongly Agree | Agree | Neutral | Disagree | Strongly Disagree |
|---|---|---|---|---|---|
| Men | 9 | 13 | 5 | 2 | 1 |
| Women | 12 | 18 | 11 | 6 | 5 |

The entire table is referred to as the contingency table.

The marginal distribution for gender removes the effect of whether or not the student enjoys math:

| | Strongly Agree | Agree | Neutral | Disagree | Strongly Disagree | Total |
|---|---|---|---|---|---|---|
| Men | 9 | 13 | 5 | 2 | 1 | 30 |
| Women | 12 | 18 | 11 | 6 | 5 | 52 |

Whereas, the marginal distribution for whether or not the student enjoys math removes the effect of gender:

| | Strongly Agree | Agree | Neutral | Disagree | Strongly Disagree |
|---|---|---|---|---|---|
| Men | 9 | 13 | 5 | 2 | 1 |
| Women | 12 | 18 | 11 | 6 | 5 |
| Total | 21 | 31 | 16 | 8 | 6 |
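The two sets of totals above can be reproduced with a few lines of Python. This is only a minimal sketch: the names `responses`, `counts`, `gender_totals`, and `response_totals` are chosen here for illustration, and the counts are copied from the survey table.

```python
# Survey counts copied from the contingency table (responses to "I enjoy math").
responses = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]
counts = {
    "Men": [9, 13, 5, 2, 1],
    "Women": [12, 18, 11, 6, 5],
}

# Marginal distribution of gender: add across each row.
gender_totals = {gender: sum(row) for gender, row in counts.items()}

# Marginal distribution of response: add down each column.
response_totals = {
    response: sum(row[i] for row in counts.values())
    for i, response in enumerate(responses)
}

print(gender_totals)    # {'Men': 30, 'Women': 52}
print(response_totals)  # totals of 21, 31, 16, 8, and 6
```

Both sets of totals add up to the same grand total of 82 students, which is a quick way to check the arithmetic.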

We can also create a relative frequency marginal distribution, which reports each total as a proportion of the grand total (here, 82 students) rather than as a count.

Example 2

The combined relative frequency marginal distributions would look like this:

| | SA | A | N | D | SD | Total |
|---|---|---|---|---|---|---|
| Men | 9 | 13 | 5 | 2 | 1 | 30/82 ≈ 0.37 |
| Women | 12 | 18 | 11 | 6 | 5 | 52/82 ≈ 0.63 |
| Total | 21/82 ≈ 0.26 | 31/82 ≈ 0.38 | 16/82 ≈ 0.20 | 8/82 ≈ 0.10 | 6/82 ≈ 0.07 | 1 |
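As a rough check on the arithmetic, each relative frequency is just a marginal total divided by the grand total. The sketch below assumes the marginal totals have already been computed; the variable names are illustrative only.

```python
# Marginal totals taken from the frequency table.
gender_totals = {"Men": 30, "Women": 52}
response_totals = {"SA": 21, "A": 31, "N": 16, "D": 8, "SD": 6}

grand_total = sum(gender_totals.values())  # 82 students in all

# Divide each marginal total by the grand total and round to two decimals.
relative_gender = {g: round(n / grand_total, 2) for g, n in gender_totals.items()}
relative_response = {r: round(n / grand_total, 2) for r, n in response_totals.items()}

print(relative_gender)
print(relative_response)
```

Because the values are rounded, the proportions in a margin may add up to slightly more or less than 1.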

Let's return to the frequency marginal distributions from Example 1, this time combined into a single table.

Example 3

| | SA | A | N | D | SD | Total |
|---|---|---|---|---|---|---|
| Men | 9 | 13 | 5 | 2 | 1 | 30 |
| Women | 12 | 18 | 11 | 6 | 5 | 52 |
| Total | 21 | 31 | 16 | 8 | 6 | 82 |

We might now be interested in comparing the two variables. For example:

1. What proportion of women strongly agreed with the statement "I enjoy math"?
2. What proportion of women disagreed?
3. What proportion of men were neutral?
4. What proportion of men strongly agreed?

Solution:

1. There were 12 women who strongly agreed, and 52 women in all, so 12/52 ≈ 0.23
2. Similarly, there were 6 women who disagreed, and 52 overall, so 6/52 ≈ 0.12
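3. There were 5 men who were neutral, and 30 men in all, so 5/30 ≈ 0.17
4. Similarly, 9 of the 30 men strongly agreed, so 9/30 = 0.30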
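The four answers above all follow the same pattern: divide one cell by its row total. Here is a small sketch of that pattern, where `proportion` is a hypothetical helper written for this example rather than a library function.

```python
# Counts by gender and response, copied from the table.
counts = {
    "Men": {"SA": 9, "A": 13, "N": 5, "D": 2, "SD": 1},
    "Women": {"SA": 12, "A": 18, "N": 11, "D": 6, "SD": 5},
}

def proportion(gender, response):
    """Share of one gender's respondents who gave a particular response."""
    row = counts[gender]
    return row[response] / sum(row.values())

# The four questions, in order.
questions = [("Women", "SA"), ("Women", "D"), ("Men", "N"), ("Men", "SA")]
for gender, response in questions:
    print(f"{gender}, {response}: {proportion(gender, response):.2f}")
```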

If we complete the table in this fashion, we get something called a conditional distribution.

A conditional distribution lists the relative frequency of each category of one variable, given a specific value of the other variable in the contingency table.

Example 4

The conditional distribution of how the students feel about math by gender would be as follows:

| | SA | A | N | D | SD | Total |
|---|---|---|---|---|---|---|
| Men | 9/30 ≈ 0.30 | 13/30 ≈ 0.43 | 5/30 ≈ 0.17 | 2/30 ≈ 0.07 | 1/30 ≈ 0.03 | 30/30 = 1 |
| Women | 12/52 ≈ 0.23 | 18/52 ≈ 0.35 | 11/52 ≈ 0.21 | 6/52 ≈ 0.12 | 5/52 ≈ 0.10 | 52/52 = 1 |

Note: The row totals sometimes do not add up to 1 due to rounding.

Another way to think of this distribution is that it's the distribution of how students feel for each gender. That's what the "by gender" indicates.
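A short sketch of the "by gender" calculation, using illustrative variable names: each row is divided by its own total, so every row becomes a distribution of responses for that gender. It also shows the rounding effect mentioned in the note.

```python
responses = ["SA", "A", "N", "D", "SD"]
counts = {"Men": [9, 13, 5, 2, 1], "Women": [12, 18, 11, 6, 5]}

# Divide every cell by its row total to condition on gender.
conditional_by_gender = {}
for gender, row in counts.items():
    total = sum(row)
    conditional_by_gender[gender] = [round(n / total, 2) for n in row]

for gender, shares in conditional_by_gender.items():
    rounded_sum = round(sum(shares), 2)
    print(gender, dict(zip(responses, shares)), "sum of rounded values:", rounded_sum)
```

With these data, the rounded values for the men add up to 1.00, while the rounded values for the women add up to 1.01.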

Example 5

The conditional distribution of gender by how the student feels would be:

| | SA | A | N | D | SD |
|---|---|---|---|---|---|
| Men | 9/21 ≈ 0.43 | 13/31 ≈ 0.42 | 5/16 ≈ 0.31 | 2/8 = 0.25 | 1/6 ≈ 0.17 |
| Women | 12/21 ≈ 0.57 | 18/31 ≈ 0.58 | 11/16 ≈ 0.69 | 6/8 = 0.75 | 5/6 ≈ 0.83 |
| Total | 21/21 = 1 | 31/31 = 1 | 16/16 = 1 | 8/8 = 1 | 6/6 = 1 |
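Conditioning in the other direction means dividing each cell by its column total instead of its row total. The following sketch, with names chosen only for illustration, computes the share of men and women within each response.

```python
responses = ["SA", "A", "N", "D", "SD"]
men = [9, 13, 5, 2, 1]
women = [12, 18, 11, 6, 5]

# Divide every cell by its column total to condition on the response.
gender_by_response = {}
for response, m, w in zip(responses, men, women):
    total = m + w
    gender_by_response[response] = {
        "Men": round(m / total, 2),
        "Women": round(w / total, 2),
    }

for response, shares in gender_by_response.items():
    print(response, shares)
```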

## Using Conditional Distributions to Identify Association

One thing we can use conditional distributions for is to identify an association between qualitative variables. The best way to do this is a side-by-side bar graph. We'll illustrate with the same data we've been using.

Example 6

In Example 5, we found the conditional distribution of gender by how the student feels regarding math:

| | SA | A | N | D | SD |
|---|---|---|---|---|---|
| Men | 9/21 ≈ 0.43 | 13/31 ≈ 0.42 | 5/16 ≈ 0.31 | 2/8 = 0.25 | 1/6 ≈ 0.17 |
| Women | 12/21 ≈ 0.57 | 18/31 ≈ 0.58 | 11/16 ≈ 0.69 | 6/8 = 0.75 | 5/6 ≈ 0.83 |
| Total | 21/21 = 1 | 31/31 = 1 | 16/16 = 1 | 8/8 = 1 | 6/6 = 1 |

Since it's difficult to gain much from this table alone, a good way to analyze this would be to make a side-by-side bar graph.
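One rough way to sketch such a graph without any plotting software is a plain-text chart, with one bar for men and one for women under each response. The `bar` helper and the bar width of 40 characters are arbitrary choices made for this illustration; a real graphing tool would normally be used instead.

```python
responses = ["SA", "A", "N", "D", "SD"]
men = [9, 13, 5, 2, 1]
women = [12, 18, 11, 6, 5]

def bar(share, width=40):
    """Render a proportion between 0 and 1 as a row of '#' characters."""
    return "#" * round(share * width)

# Build two text bars per response: one for men, one for women.
chart_lines = []
for response, m, w in zip(responses, men, women):
    total = m + w
    chart_lines.append(f"{response:>2}  Men   {bar(m / total):<40} {m / total:.2f}")
    chart_lines.append(f"    Women {bar(w / total):<40} {w / total:.2f}")

print("\n".join(chart_lines))
```

Bars of similar length within a response suggest a similar split between men and women, while very different lengths suggest the split is lopsided.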

From the graph, we can see that there appear to be some differences among the responses. The split between men and women is similar for "Strongly Agree" and "Agree", but it shifts noticeably toward women for "Neutral" and "Disagree". As we go down the scale, the proportion of the responses that come from women increases.
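The trend described above can also be checked numerically. This sketch, with illustrative names, compares the women's share of each response with the women's share of the whole class (52/82). The comparison rests on one assumption worth stating plainly: if gender and response were unrelated, each response's share would be expected to sit near that overall share.

```python
responses = ["SA", "A", "N", "D", "SD"]
men = [9, 13, 5, 2, 1]
women = [12, 18, 11, 6, 5]

# Women's share of each response, moving down the scale from SA to SD.
women_share = [w / (m + w) for m, w in zip(men, women)]

# Women's share of all 82 students, for comparison.
overall_women_share = sum(women) / (sum(men) + sum(women))

# True if the share rises at every step down the scale.
steadily_increasing = all(a < b for a, b in zip(women_share, women_share[1:]))

for response, share in zip(responses, women_share):
    side = "above" if share > overall_women_share else "below"
    print(f"{response}: {share:.2f} ({side} the overall share of {overall_women_share:.2f})")
print("Increases at every step:", steadily_increasing)
```

This kind of comparison only describes the sample; it does not say whether the differences are large enough to matter.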

We might conclude, then, that the men in this survey tend to enjoy math more than the women do.

One thing we can't conclude is that their gender caused them to not enjoy math. We've only done an observational study, so we can only claim association, not causation.

One question you might have as a result of this is, "How do we know when it's different enough from equal to say that there might be a relationship?" It's a very good question. In order to draw a fine line, we'll need a hypothesis test, which we won't see until we get to Chapter 12.