Section 4.4: Contingency Tables and Association

Objectives

By the end of this lesson, you will be able to...

compute the marginal distribution of a variable
construct a conditional distribution of a variable
use the conditional distribution to identify association between categorical data

In sections 4.1-4.3, we studied relationships between two quantitative variables. We learned that we could quantify the strength of the linear relationship with the correlation.

What about qualitative (categorical) variables, though? For example, suppose we consider a survey given to 82 students in a Basic Algebra course at ECC, with the following responses to the statement "I enjoy math."

	Strongly Agree	Agree	Neutral	Disagree	Strongly Disagree
Men	9	13	5	2	1
Women	12	18	11	6	5

How do we study this relationship? Is there a way to tell if gender and whether the student enjoys math are related? In fact, there is! As usual, though, we need a bit of background work first.

Contingency Tables

A contingency table relates two categorical variables. In the example above, the relationship is between the gender of the student and the student's response to the statement.

A marginal distribution of a variable is a frequency or relative frequency distribution of either the row or column variable in the contingency table.

Example 1

If we consider the previous example:

	Strongly Agree	Agree	Neutral	Disagree	Strongly Disagree
Men	9	13	5	2	1
Women	12	18	11	6	5

The entire table is referred to as the contingency table.

The marginal distribution for gender removes the effect of whether or not the student enjoys math:

	Strongly Agree	Agree	Neutral	Disagree	Strongly Disagree	Total
Men	9	13	5	2	1	30
Women	12	18	11	6	5	52

In contrast, the marginal distribution for whether or not the student enjoys math removes the effect of gender:

	Strongly Agree	Agree	Neutral	Disagree	Strongly Disagree
Men	9	13	5	2	1
Women	12	18	11	6	5
Total	21	31	16	8	6
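
If you'd like to check these totals yourself, here is a minimal Python sketch. The variable names (counts, gender_totals, response_totals) are our own choices for illustration, not part of StatCrunch. It simply adds across each row for the gender marginal and down each column for the response marginal.

```python
# Counts from the survey table; rows are genders, columns are responses.
responses = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]
counts = {
    "Men":   [9, 13, 5, 2, 1],
    "Women": [12, 18, 11, 6, 5],
}

# Marginal distribution of gender: add across each row.
gender_totals = {gender: sum(row) for gender, row in counts.items()}

# Marginal distribution of response: add down each column.
response_totals = {
    response: sum(row[i] for row in counts.values())
    for i, response in enumerate(responses)
}

print(gender_totals)    # {'Men': 30, 'Women': 52}
print(response_totals)
```

Both marginal distributions add up to the same grand total, 82, which is a handy check on the arithmetic.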

We can also create a relative frequency marginal distribution, which, as expected, is simply relative frequencies rather than frequencies.

Example 2

The combined relative frequency marginal distributions would look like this:

	SA	A	N	D	SD	Total
Men	9	13	5	2	1	30/82 ≈ 0.37
Women	12	18	11	6	5	52/82 ≈ 0.63
Total	21/82 ≈ 0.26	31/82 ≈ 0.38	16/82 ≈ 0.20	8/82 ≈ 0.10	6/82 ≈ 0.07	1
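
The relative frequencies are just each marginal total divided by the grand total. A short Python sketch of that division, using dictionary names we made up for illustration and the totals from the table, might look like this:

```python
# Marginal totals taken from the contingency table.
gender_totals = {"Men": 30, "Women": 52}
response_totals = {"SA": 21, "A": 31, "N": 16, "D": 8, "SD": 6}

grand_total = sum(gender_totals.values())  # 82 students in all

# Divide each marginal total by the grand total and round to two decimals.
gender_rel = {g: round(n / grand_total, 2) for g, n in gender_totals.items()}
response_rel = {r: round(n / grand_total, 2) for r, n in response_totals.items()}

print(gender_rel)
print(response_rel)
```

Rounding to two decimal places is our choice here, made to match the table.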

Technology

Here's a quick overview of how to create a contingency table in StatCrunch.

Select Stat > Table > Contingency
Select the Row and Column variables
Deselect the Chi-Square test for independence by holding CTRL and clicking it. (We will do this test later, but not yet.)
Click Compute.

Now let's return to the frequency marginal distributions from Example 1, combined into a single table.

Example 3

	SA	A	N	D	SD	Total
Men	9	13	5	2	1	30
Women	12	18	11	6	5	52
Total	21	31	16	8	6	82

We might now be interested in comparing the two variables. For example:

What proportion of women strongly agreed with the statement "I enjoy math"?
What proportion of women disagreed?
What proportion of men were neutral?
What proportion of men strongly agreed?

Solution:

There were 12 women who strongly agreed, and 52 women in all, so 12/52 ≈ 0.23
Similarly, there were 6 women who disagreed, and 52 overall, so 6/52 ≈ 0.12
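There were 5 men who were neutral, and 30 men in all, so 5/30 ≈ 0.17
Finally, there were 9 men who strongly agreed, and 30 men overall, so 9/30 = 0.30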
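
Each answer follows the same pattern: divide one cell by the total for that gender. As a rough sketch, that pattern could be written as a small Python function. The name proportion and the nested-dictionary layout are our own assumptions for illustration.

```python
# Counts from the table, keyed first by gender and then by response.
counts = {
    "Men":   {"SA": 9, "A": 13, "N": 5, "D": 2, "SD": 1},
    "Women": {"SA": 12, "A": 18, "N": 11, "D": 6, "SD": 5},
}

def proportion(gender, response):
    """Share of the given gender's respondents who chose the given response."""
    row = counts[gender]
    return row[response] / sum(row.values())

# The four questions, in the order they were asked.
questions = [("Women", "SA"), ("Women", "D"), ("Men", "N"), ("Men", "SA")]
answers = [round(proportion(g, r), 2) for g, r in questions]
print(answers)  # [0.23, 0.12, 0.17, 0.3]
```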

If we completed the table in this fashion, we get something called a conditional distribution.

A conditional distribution lists the relative frequency of each category of one variable, given a specific value of the other variable in the contingency table.

Example 4

The conditional distribution of how the students feel about math by gender would be as follows:

	SA	A	N	D	SD	Total
Men	9/30 = 0.30	13/30 ≈ 0.43	5/30 ≈ 0.17	2/30 ≈ 0.07	1/30 ≈ 0.03	30/30 = 1
Women	12/52 ≈ 0.23	18/52 ≈ 0.35	11/52 ≈ 0.21	6/52 ≈ 0.12	5/52 ≈ 0.10	52/52 = 1

Note: The row totals sometimes do not add up to 1 due to rounding.
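
To see both the conditional distribution and the rounding issue in action, here is a small Python sketch (the variable names are ours). It divides each row by its own total, then compares the exact row sum with the sum of the values after rounding to two decimals.

```python
responses = ["SA", "A", "N", "D", "SD"]
counts = {"Men": [9, 13, 5, 2, 1], "Women": [12, 18, 11, 6, 5]}

# Conditional distribution of response by gender: divide each row by its total.
conditional_by_gender = {}
for gender, row in counts.items():
    row_total = sum(row)
    conditional_by_gender[gender] = [n / row_total for n in row]

for gender, dist in conditional_by_gender.items():
    rounded = [round(p, 2) for p in dist]
    print(gender, rounded)
    print("  exact sum:", round(sum(dist), 10))
    print("  sum of rounded values:", round(sum(rounded), 2))
```

For the men, the rounded values still add to 1. For the women, they add to 1.01, even though the exact proportions add to 1.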

Another way to think of this distribution is that it's the distribution of how students feel for each gender. That's what the "by gender" indicates.

Example 5

The conditional distribution of gender by how the student feels would be:

	SA	A	N	D	SD
Men	9/21 ≈ 0.43	13/31 ≈ 0.42	5/16 ≈ 0.31	2/8 = 0.25	1/6 ≈ 0.17
Women	12/21 ≈ 0.57	18/31 ≈ 0.58	11/16 ≈ 0.69	6/8 = 0.75	5/6 ≈ 0.83
Total	21/21 = 1	31/31 = 1	16/16 = 1	8/8 = 1	6/6 = 1
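
This time each cell is divided by its column total rather than its row total. A minimal Python sketch of that calculation, with names chosen by us for illustration, could be:

```python
responses = ["SA", "A", "N", "D", "SD"]
counts = {"Men": [9, 13, 5, 2, 1], "Women": [12, 18, 11, 6, 5]}

# Column totals: how many students gave each response.
column_totals = [sum(col) for col in zip(*counts.values())]

# Conditional distribution of gender by response: divide by the column total.
conditional_by_response = {
    gender: [round(n / total, 2) for n, total in zip(row, column_totals)]
    for gender, row in counts.items()
}

for gender, dist in conditional_by_response.items():
    print(gender, dict(zip(responses, dist)))
```

Here zip(*counts.values()) pairs up the men's and women's counts column by column, so each column can be totaled separately.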

Technology

Here's a quick overview of how to create a conditional relative frequency distribution in StatCrunch.

Select Stat > Table > Contingency
KEY: Display Row percent or Column percent*
Deselect the Chi-Square test for independence by holding CTRL and clicking it.
Click Compute.

* This step is key. The choice depends on which variable's distribution you want. These problems are typically phrased "find the conditional relative frequency distribution of X by Y", which means you want to know how X is distributed within each category of Y. If X is your row variable and Y is your column variable, choose Column percent, because then each column shows the distribution of X, the row variable, for one category of Y. If X is instead your column variable, choose Row percent.
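
To make the Row percent versus Column percent choice concrete, here is a hedged Python sketch. The function percent_table is hypothetical, something we wrote for illustration. It is not StatCrunch's code, only an imitation of what the two display options compute.

```python
def percent_table(counts, kind):
    """Return row or column percents for a table stored as {row_label: [counts]}.

    kind="row":    each cell divided by its row total (each row sums to 100).
    kind="column": each cell divided by its column total (each column sums to 100).
    """
    if kind not in ("row", "column"):
        raise ValueError("kind must be 'row' or 'column'")
    column_totals = [sum(col) for col in zip(*counts.values())]
    result = {}
    for label, row in counts.items():
        if kind == "row":
            result[label] = [100 * n / sum(row) for n in row]
        else:
            result[label] = [100 * n / t for n, t in zip(row, column_totals)]
    return result

# Gender is the row variable; response (SA, A, N, D, SD) is the column variable.
counts = {"Men": [9, 13, 5, 2, 1], "Women": [12, 18, 11, 6, 5]}

# "Distribution of gender by response": row variable by column variable,
# so we want column percents.
gender_by_response = percent_table(counts, "column")

# "Distribution of response by gender": column variable by row variable,
# so we want row percents.
response_by_gender = percent_table(counts, "row")

print(round(gender_by_response["Men"][0], 1))   # 42.9
print(round(response_by_gender["Men"][0], 1))   # 30.0
```

The same cell (men who strongly agree) gives 42.9% or 30.0% depending on the option, which is why this step deserves the extra care.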

Using Conditional Distributions to Identify Association

One thing we can use conditional distributions for is to identify an association between qualitative variables. The best way to do this is with a side-by-side bar graph. We'll illustrate with the same data we've been using.

Example 6

In Example 5, we found the conditional distribution of gender by how the student feels regarding math:

	SA	A	N	D	SD
Men	9/21 ≈ 0.43	13/31 ≈ 0.42	5/16 ≈ 0.31	2/8 = 0.25	1/6 ≈ 0.17
Women	12/21 ≈ 0.57	18/31 ≈ 0.58	11/16 ≈ 0.69	6/8 = 0.75	5/6 ≈ 0.83
Total	21/21 = 1	31/31 = 1	16/16 = 1	8/8 = 1	6/6 = 1

Since it's difficult to gain much from this table alone, a good way to analyze this would be to make a side-by-side bar graph.

[Figure: side-by-side bar graph]

From the graph, we can see that there appear to be some differences between the responses. The split between men and women was similar for "Strongly Agree" and "Agree", but noticeably different for "Neutral", "Disagree", and "Strongly Disagree". As we go down the scale, the proportion of the responses given by women increases.

We might conclude, then, that the men in this survey tend to enjoy math more than the women.
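
One way to back up that reading is to compare each response category with the class as a whole. Women make up 52/82 ≈ 0.63 of all respondents, so if gender and response were not associated at all, we would expect women's share of every response to be close to 0.63. The Python sketch below makes that comparison. The variable names are ours, and it is only a descriptive check, not a formal test.

```python
responses = ["SA", "A", "N", "D", "SD"]
men = [9, 13, 5, 2, 1]
women = [12, 18, 11, 6, 5]

# Women's share of all 82 respondents (the marginal distribution of gender).
overall_women = sum(women) / (sum(men) + sum(women))

# Women's share within each response (the conditional distribution of gender).
women_share = [w / (m + w) for m, w in zip(men, women)]

for response, share in zip(responses, women_share):
    diff = share - overall_women
    print(f"{response:>2}: women {share:.2f}, overall {overall_women:.2f}, difference {diff:+.2f}")
```

Women's share falls below 0.63 for "Strongly Agree" and "Agree" and rises above it for the other three responses, which is the same pattern the graph shows.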

One thing we can't conclude is that their gender caused them to not enjoy math. We've only done an observational study, so we can only claim association, not causation.

One question you might have as a result of this is, "How do we know when it's different enough from equal to say that there might be a relationship?" It's a very good question. In order to draw a fine line, we'll need a hypothesis test, which we won't see until we get to Chapter 12.

Technology

Here's a quick overview of how to create a side-by-side bar graph in StatCrunch.

This graph is designed to match the conditional relative frequency distribution above, where you are asked to show the conditional relative frequency distribution of X by Y.

Select Graph > Bar Plot > With Data
Select variable Y (this is counter-intuitive, but you are graphing the grouping variable)
Select Group by: variable X (this is what you are finding the distribution of)
Change the Type to Relative frequency (within category) or Percent (within category).
Add an appropriate title and press Compute.
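
If StatCrunch isn't handy, a rough text version of the same graph can be sketched in Python. This is not StatCrunch output. The function bar and the 40-character width are our own choices for illustration. Each response gets one bar for men and one for women, with lengths proportional to the conditional relative frequencies of gender within that response.

```python
responses = ["SA", "A", "N", "D", "SD"]
counts = {"Men": [9, 13, 5, 2, 1], "Women": [12, 18, 11, 6, 5]}

# Total number of students who gave each response.
column_totals = [sum(col) for col in zip(*counts.values())]

def bar(proportion, width=40):
    """Draw a proportion between 0 and 1 as a row of '#' characters."""
    return "#" * round(proportion * width)

lines = []
for i, response in enumerate(responses):
    for gender, row in counts.items():
        p = row[i] / column_totals[i]  # relative frequency within this response
        lines.append(f"{response:>2} {gender:<5} {bar(p):<40} {p:.2f}")
    lines.append("")  # blank line between response groups

print("\n".join(lines))
```

Reading down the printed chart, the women's bar grows longer in each group while the men's bar shrinks.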