Section 11.1: Inference about Two Proportions

Objectives

By the end of this lesson, you will be able to...

test hypotheses regarding two population proportions
construct and interpret confidence intervals for the difference between two population proportions
determine the sample size necessary for estimating the difference between two population proportions within a specific margin of error

In Chapters 9 and 10, we studied inferential statistics (confidence intervals and hypothesis tests) regarding population parameters of a single population - the average resting heart rate of students in a class, the proportion of ECC students who voted, etc.

In Chapter 11, we'll be considering the relationship between two populations - means, proportions and standard deviations.

A frequent comparison we want to make between two populations concerns the proportion of individuals with a certain characteristic. For example, suppose we want to determine if college faculty voted at a higher rate than ECC students in the 2008 presidential election. Since we don't have any information from either population, we would need to take a sample from each. This isn't a hypothesis test about one proportion, as in Section 10.4. Instead, we'd be comparing two proportions, so we need some new background.

The information that follows is a bit heavy, but it shows the theoretical background for testing claims and finding confidence intervals for the difference between two population proportions.

The Difference Between Two Population Proportions

In Section 8.2, we discussed the distribution of one sample proportion, $\hat{p}$. What we'll need to do now is develop some similar theory regarding the distribution of the difference in two sample proportions, $\hat{p}_1 - \hat{p}_2$.

The Sampling Distribution of the Difference between Two Proportions

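Suppose simple random samples of size n₁ and n₂ are taken from two populations. The distribution of $\hat{p}_1 - \hat{p}_2$, where $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$, is approximately normal with mean $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$ and standard deviation

$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}},$$

provided:

$n_1\hat{p}_1(1-\hat{p}_1) \ge 10$,
$n_2\hat{p}_2(1-\hat{p}_2) \ge 10$, and
both sample sizes are less than 5% of their respective populations.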

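The standardized version is then

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}}$$

which has an approximate standard normal distribution.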
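To make the sampling distribution concrete, here is a minimal simulation sketch. The population proportions (0.80 and 0.70), the sample sizes, the number of repetitions and the function names are all assumptions chosen for illustration, not values from this lesson. It draws many pairs of samples and compares the simulated mean and standard deviation of the difference in sample proportions with the formulas above.

```python
import math
import random


def simulate_difference(p1, p2, n1, n2, reps=2000, seed=1):
    """Simulate p-hat1 - p-hat2 many times; return (mean, sd) of the differences."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < p1 for _ in range(n1))  # successes in sample 1
        x2 = sum(rng.random() < p2 for _ in range(n2))  # successes in sample 2
        diffs.append(x1 / n1 - x2 / n2)
    mean = sum(diffs) / reps
    sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (reps - 1))
    return mean, sd


def theoretical_sd(p1, p2, n1, n2):
    """Standard deviation of p-hat1 - p-hat2 from the formula above."""
    return math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Hypothetical populations, chosen only for illustration.
p1, p2, n1, n2 = 0.80, 0.70, 200, 200
sim_mean, sim_sd = simulate_difference(p1, p2, n1, n2)
print(f"mean: simulated {sim_mean:.4f}, theoretical {p1 - p2:.4f}")
print(f"sd:   simulated {sim_sd:.4f}, theoretical {theoretical_sd(p1, p2, n1, n2):.4f}")
```

The simulated values should land close to the theoretical ones, though they will not match exactly because the simulation is itself a random sample.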

The thing is, in most of our hypothesis testing, the null hypothesis assumes that the proportions are the same (p₁ = p₂), so we can call p = p₁ = p₂.

Since p₁ = p₂, we can substitute 0 for p₁–p₂, and substitute p for both p₁ and p₂. In that case, we can rewrite the above standardized z the following way:

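$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})}\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad \text{where } \hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

Because the common proportion p is unknown, we estimate it with the pooled sample proportion $\hat{p}$, which combines the successes and observations from both samples.

This leads us to our hypothesis test for the difference between two proportions.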

Performing a Hypothesis Test Regarding p₁–p₂

Step 1: State the null and alternative hypotheses.

Two-Tailed
H₀: p₁–p₂ = 0
H₁: p₁–p₂ ≠ 0

Left-Tailed
H₀: p₁–p₂ = 0
H₁: p₁–p₂ < 0

Right-Tailed
H₀: p₁–p₂ = 0
H₁: p₁–p₂ > 0

Step 2: Decide on a level of significance, α.

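Step 3: Compute the test statistic, $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})}\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$, where $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$ is the pooled sample proportion.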

Step 4: Determine the P-value.

Step 5: Reject the null hypothesis if the P-value is less than the level of significance, α.

Step 6: State the conclusion.
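The steps above can be sketched in Python as follows. This is a minimal illustration using only the standard library, not a reproduction of what StatCrunch does internally. The function name, the `alternative` labels and the counts in the usage lines are assumptions made for this example.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided", alpha=0.05):
    """Pooled z-test of H0: p1 - p2 = 0. Returns (z, p_value, reject_h0)."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled estimate of the common proportion
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p_hat1 - p_hat2) / se      # Step 3: test statistic

    cdf = NormalDist().cdf(z)       # Step 4: P-value depends on the alternative
    if alternative == "less":       # H1: p1 - p2 < 0
        p_value = cdf
    elif alternative == "greater":  # H1: p1 - p2 > 0
        p_value = 1 - cdf
    elif alternative == "two-sided":  # H1: p1 - p2 != 0
        p_value = 2 * min(cdf, 1 - cdf)
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")

    return z, p_value, p_value < alpha  # Step 5: reject H0 if P-value < alpha


# Hypothetical counts, for illustration only.
z, p_value, reject = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.3f}, P-value = {p_value:.4f}, reject H0: {reject}")
```

Before relying on the result, check the sample-size conditions separately. This sketch does not verify them.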

A note about the difference between two proportions: As with the previous two sections, the order in which the proportions are placed is not important. The important thing is to note clearly in your work what the order is, and then to construct your alternative hypothesis accordingly.

Hypothesis Testing Regarding p₁–p₂ Using StatCrunch

With Data

Select Stat > Proportion Stats > Two Sample > With Data
Select the variable names. If the values are in a single column, select the column and use the Where box to identify the two samples.
Type the Successes exactly as they appear in the data, including capitalization and spacing.
Set the null and alternative hypotheses.
Click Compute.

With Summary

Select Stat > Proportion Stats > Two Sample > With Summary
Enter the number of successes* and the number of observations*.
Set the null and alternative hypotheses.
Click Compute.

* To get the counts, first create a frequency table. If you have a grouping variable, use a contingency table.

Example 1

Problem: Suppose a researcher believes that college faculty vote at a higher rate than college students. She collects data from 200 college faculty and 200 college students using simple random sampling. If 167 of the faculty and 138 of the students voted in the 2008 Presidential election, is there enough evidence at the 5% level of significance to support the researcher’s claim?

Solution:

First, we need to check the conditions. Both sample sizes are clearly less than 5% of their respective populations. In addition,

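$n_1\hat{p}_1(1-\hat{p}_1) = 200(0.835)(0.165) \approx 27.6 \ge 10$
$n_2\hat{p}_2(1-\hat{p}_2) = 200(0.69)(0.31) \approx 42.8 \ge 10$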

So our conditions are satisfied.

Step 1:

Let's take the two proportions in the order we receive them, so
p₁ = p_f (faculty) and p₂ = p_s (students)

Our hypotheses are then:
H₀: p_f – p_s = 0
H₁: p_f – p_s > 0 (since the researcher claims that faculty vote at a higher rate)

Step 2: α = 0.05 (given)

Step 3: (we'll use StatCrunch)

Step 4: Using StatCrunch:

[StatCrunch output for the two-proportion test, trimmed to fit on this page]

Step 5: Since the P-value < α, we reject the null hypothesis.

Step 6: Based on these results, there is very strong evidence (certainly enough at the 5% level of significance) to support the researcher's claim.
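As a cross-check on Steps 3 and 4, the test statistic and P-value for Example 1 can be computed directly from the counts in the problem. This is a sketch using Python's standard library rather than StatCrunch, so the software's displayed rounding may differ slightly. The variable names are assumptions made for this example.

```python
import math
from statistics import NormalDist

x_f, n_f = 167, 200  # faculty who voted, faculty sampled
x_s, n_s = 138, 200  # students who voted, students sampled

p_hat_f = x_f / n_f                  # 0.835
p_hat_s = x_s / n_s                  # 0.69
pooled = (x_f + x_s) / (n_f + n_s)   # 305/400 = 0.7625

# Pooled standard error under H0: p_f - p_s = 0
se = math.sqrt(pooled * (1 - pooled) * (1 / n_f + 1 / n_s))
z = (p_hat_f - p_hat_s) / se

# Right-tailed test, since H1 is p_f - p_s > 0
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P-value = {p_value:.4f}")  # z is about 3.41, P-value about 0.0003
```

A P-value this far below α = 0.05 is consistent with the decision in Step 5 to reject the null hypothesis.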

Confidence Intervals about the Difference Between Two Proportions

We can also find a confidence interval for the difference in two population proportions.

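In general, a (1-α)100% confidence interval for p₁–p₂ is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$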

Note: The following conditions must be true:

$n_1\hat{p}_1(1-\hat{p}_1) \ge 10$,
$n_2\hat{p}_2(1-\hat{p}_2) \ge 10$, and
both sample sizes are less than 5% of their respective populations.

Confidence Intervals About p₁–p₂ Using StatCrunch

With Data

Select Stat > Proportion Stats > Two Sample > With Data
Select the variable names. If the values are in a single column, select the column and use the Where box to identify the two samples.
Type the Successes exactly as they appear in the data, including capitalization and spacing.
Check the confidence interval radio button.
Set the confidence level.
Click Compute.

With Summary

Select Stat > Proportion Stats > Two Sample > With Summary
Enter the number of successes* and the number of observations*.
Check the confidence interval radio button.
Set the confidence level.
Click Compute.

* To get the counts, first create a frequency table. If you have a grouping variable, use a contingency table.

Example 2

Problem: Considering the data from Example 1, find a 99% confidence interval for the difference between the proportion of faculty and the proportion of students who voted in the 2008 Presidential election.

Solution: From Example 1, we know that the conditions for performing inference are met, so we'll use StatCrunch to find the confidence interval.

[StatCrunch output for the confidence interval, trimmed to fit on this page]

So we can say that we're 99% confident that the difference between the proportion of faculty who voted and the proportion of students who voted is between 0.037 and 0.253, that is, between 3.7 and 25.3 percentage points.
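The interval in Example 2 can be reproduced from the confidence interval formula with a short sketch. The function name and its parameters are assumptions made for this example. It uses the unpooled standard error, since a confidence interval does not assume p₁ = p₂.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p_hat1, p_hat2 = x1 / n1, x2 / n2
    # Critical value z_{alpha/2}, where alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * math.sqrt(
        p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2
    )
    diff = p_hat1 - p_hat2
    return diff - margin, diff + margin


# Example 2: 167 of 200 faculty and 138 of 200 students voted; 99% confidence.
lower, upper = two_proportion_ci(167, 200, 138, 200, confidence=0.99)
print(f"({lower:.3f}, {upper:.3f})")  # about (0.037, 0.253)
```

Because the whole interval lies above 0, it agrees with the conclusion of Example 1 that faculty voted at a higher rate.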

Determining the Sample Size Needed

In Section 9.3, we learned how to find the necessary sample size if a specific margin of error is desired. We can do a similar analysis for the difference in two proportions. From the confidence interval formula, we know that the margin of error is:

$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

If we assume that n₁ = n₂ = n, we can solve for n and get the following result:

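The sample size required to obtain a (1-α)100% confidence interval for p₁–p₂ with a margin of error E is:

$$n = n_1 = n_2 = \left[\hat{p}_1(1-\hat{p}_1) + \hat{p}_2(1-\hat{p}_2)\right]\left(\frac{z_{\alpha/2}}{E}\right)^2$$

rounded up to the next integer, if $\hat{p}_1$ and $\hat{p}_2$ are estimates for p₁ and p₂, respectively.

If no prior estimate is available, use $\hat{p}_1 = \hat{p}_2 = 0.5$, which yields the following formula:

$$n = n_1 = n_2 = 0.5\left(\frac{z_{\alpha/2}}{E}\right)^2$$

again rounded up to the next integer.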

Note: As in Section 9.3, the desired margin of error should be expressed as a decimal.

Let's try one.

Example 3

Suppose we want to study the success rates for students in Mth098 Intermediate Algebra at ECC. We want to compare the success rates of students who place directly into Mth098 with those who first took Mth096 Beginning Algebra. From past experience, we know that a typical success rate for students in this class is about 65%. How large of a sample size is necessary to create a 95% confidence interval for the difference of the two passing rates with a maximum error of 2%?

$$n = \left[0.65(1-0.65) + 0.65(1-0.65)\right]\left(\frac{1.96}{0.02}\right)^2 = 0.455 \times 9604 = 4369.82$$

So we would need a sample size of 4,370 students - from each population!
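The sample-size calculation from this section can be sketched as a small function. The function name and parameter names are assumptions made for this example. Leaving the prior estimates at their default of 0.5 gives the no-prior-estimate formula.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(margin, confidence=0.95, p1_est=0.5, p2_est=0.5):
    """Per-group sample size n (with n1 = n2 = n) for estimating p1 - p2.

    margin must be a decimal (0.02, not 2). The defaults p1_est = p2_est = 0.5
    correspond to having no prior estimates.
    """
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (p1_est * (1 - p1_est) + p2_est * (1 - p2_est)) * (z_crit / margin) ** 2
    return math.ceil(n)  # always round up to the next integer


# Example 3: prior success rate of about 65% in both groups, 95% confidence, E = 0.02
print(sample_size_two_proportions(0.02, 0.95, p1_est=0.65, p2_est=0.65))  # 4370

# Same margin and confidence level with no prior estimates
print(sample_size_two_proportions(0.02, 0.95))
```

The function uses an unrounded critical value instead of 1.96, so the intermediate result differs slightly from the hand calculation, but it rounds up to the same 4,370 per group.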