Section 11.1: Inference about Two Proportions
Objectives
By the end of this lesson, you will be able to...
- test hypotheses regarding two population proportions
- construct and interpret confidence intervals for the difference between two population proportions
- determine the sample size necessary for estimating the difference between two population proportions within a specific margin of error
In Chapters 9 and 10, we studied inferential statistics (confidence intervals and hypothesis tests) regarding population parameters of a single population - the average resting heart rate of students in a class, the proportion of ECC students who voted, etc.
In Chapter 11, we'll be considering the relationship between two populations - means, proportions and standard deviations.
A frequent comparison we want to make between two populations concerns the proportion of individuals with certain characteristics. For example, suppose we want to determine if college faculty voted at a higher rate than ECC students in the 2008 presidential election. Since we don't have any information from either population, we would need to take samples from each. This isn't an example of a hypothesis test from Section 10.4 about one proportion; it compares two proportions, so we need some new background.
The information that follows is a bit heavy, but it shows the theoretical background for testing claims and finding confidence intervals for the difference between two population proportions.
The Difference Between Two Population Proportions
In Section 8.2, we discussed the distribution of one sample proportion, $\hat{p}$. What we'll need to do now is develop some similar theory regarding the distribution of the difference in two sample proportions, $\hat{p}_1 - \hat{p}_2$.
The Sampling Distribution of the Difference between Two Proportions
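Suppose simple random samples of size n1 and n2 are taken from two populations. The distribution of $\hat{p}_1 - \hat{p}_2$, where $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$, is approximately normal with mean $\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2$ and standard deviation
$$\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}},$$
provided:
- $n_1\hat{p}_1(1-\hat{p}_1) \ge 10$ and $n_2\hat{p}_2(1-\hat{p}_2) \ge 10$, and
- both sample sizes are less than 5% of their respective populations.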
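The standardized version is then
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}$$
which has an approximate standard normal distribution.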
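As a rough check on this result, the short simulation below draws many pairs of samples and compares the simulated mean and standard deviation of the difference in sample proportions with the theoretical values p1 - p2 and sqrt(p1(1-p1)/n1 + p2(1-p2)/n2). The population proportions, sample sizes, number of repetitions and the function name `simulate_difference` are all assumptions chosen for illustration; none of them come from the lesson.

```python
import math
import random
import statistics

def simulate_difference(p1, n1, p2, n2, reps=2000, seed=1):
    """Simulate the sampling distribution of p-hat1 - p-hat2."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Count the successes in each simulated simple random sample.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return diffs

# Hypothetical population proportions and sample sizes (not from the text).
p1, n1, p2, n2 = 0.80, 200, 0.70, 200
diffs = simulate_difference(p1, n1, p2, n2)

# Theoretical mean and standard deviation of p-hat1 - p-hat2.
theory_mean = p1 - p2
theory_sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

print("simulated mean:", round(statistics.mean(diffs), 4), "theory:", round(theory_mean, 4))
print("simulated sd:  ", round(statistics.stdev(diffs), 4), "theory:", round(theory_sd, 4))
```

The simulated values should land close to the theoretical ones, though they will not match exactly because the simulation is itself random.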
The thing is, in most of our hypothesis testing, the null hypothesis assumes that the proportions are the same (p1 = p2), so we can call p = p1 = p2.
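Since p1 = p2, we can substitute 0 for p1–p2, and substitute p for both p1 and p2. In that case, we can rewrite the above standardized z the following way:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{p(1-p)}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
Because p is unknown, we estimate it with the pooled sample proportion, $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$.
This leads us to our hypothesis test for the difference between two proportions.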
Performing a Hypothesis Test Regarding p1–p2
Step 1: State the null and alternative hypotheses.
Two-Tailed: H0: p1–p2 = 0, H1: p1–p2 ≠ 0
Left-Tailed: H0: p1–p2 = 0, H1: p1–p2 < 0
Right-Tailed: H0: p1–p2 = 0, H1: p1–p2 > 0
Step 2: Decide on a level of significance, α.
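Step 3: Compute the test statistic, $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $\hat{p} = \dfrac{x_1 + x_2}{n_1 + n_2}$ is the pooled sample proportion.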
Step 4: Determine the P-value.
Step 5: Reject the null hypothesis if the P-value is less than the level of significance, α.
Step 6: State the conclusion.
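The six steps can be sketched in Python as follows. This is a minimal illustration using only the standard library, not a replacement for StatCrunch. The function name `two_proportion_z_test`, its `alternative` argument and the counts in the usage lines are assumptions made for this example. The test statistic uses the pooled sample proportion, (x1 + x2)/(n1 + n2), as the estimate of the common p under the null hypothesis.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Return (z, p_value) for H0: p1 - p2 = 0 using the pooled estimate."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)      # estimate of the common p under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p_hat1 - p_hat2) / se

    std_normal = NormalDist()
    if alternative == "less":           # H1: p1 - p2 < 0
        p_value = std_normal.cdf(z)
    elif alternative == "greater":      # H1: p1 - p2 > 0
        p_value = 1 - std_normal.cdf(z)
    elif alternative == "two-sided":    # H1: p1 - p2 != 0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")
    return z, p_value

# Hypothetical counts, chosen only to illustrate the call.
alpha = 0.05
z, p_value = two_proportion_z_test(45, 100, 30, 100, alternative="two-sided")
print(f"z = {z:.3f}, P-value = {p_value:.4f}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

The `alternative` argument mirrors the three pairs of hypotheses in Step 1, so the order chosen for the two samples has to match the direction of the alternative hypothesis.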
A note about the difference between two proportions: As with the previous two sections, the order in which the proportions are placed is not important. The important thing is to note clearly in your work what the order is, and then to construct your alternative hypothesis accordingly.
Hypothesis Testing Regarding p1–p2 Using StatCrunch
With Data
With Summary
* To get the counts, first create a frequency table. If you have a grouping variable, use a contingency table.
Example 1
Problem: Suppose a researcher believes that college faculty vote at a higher rate than college students. She collects data from 200 college faculty and 200 college students using simple random sampling. If 167 of the faculty and 138 of the students voted in the 2008 Presidential election, is there enough evidence at the 5% level of significance to support the researcher’s claim?
Solution:
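First, we need to check the conditions. Both sample sizes are clearly less than 5% of their respective populations. In addition,
$$n_1\hat{p}_1(1-\hat{p}_1) = 200(0.835)(0.165) \approx 27.6 \ge 10 \quad\text{and}\quad n_2\hat{p}_2(1-\hat{p}_2) = 200(0.69)(0.31) \approx 42.8 \ge 10.$$
So our conditions are satisfied.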
Step 1:
Let's take the two proportions in the order we receive them, so
p1 = pf (faculty) and p2 = ps (students)
Our hypotheses are then:
H0: pf - ps = 0
H1: pf - ps > 0 (since the researcher claims that faculty vote at a higher rate)
Step 2: α = 0.05 (given)
Step 3: (we'll use StatCrunch)
Step 4: Using StatCrunch:
(Trimmed to fit on this page.)
Step 5: Since the P-value < α, we reject the null hypothesis.
Step 6: Based on these results, there is very strong evidence (certainly enough at the 5% level of significance) to support the researcher's claim.
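For readers without StatCrunch at hand, the arithmetic behind Example 1 can be sketched directly in Python. The counts come from the problem statement. The variable names are made up for this sketch, and the condition check assumes the usual requirement that n times p-hat times (1 - p-hat) be at least 10 for each sample.

```python
import math
from statistics import NormalDist

# Counts from Example 1: faculty first, students second.
x_f, n_f = 167, 200
x_s, n_s = 138, 200

p_hat_f = x_f / n_f
p_hat_s = x_s / n_s

# Assumed condition: n * p-hat * (1 - p-hat) >= 10 for each sample.
check_f = n_f * p_hat_f * (1 - p_hat_f)
check_s = n_s * p_hat_s * (1 - p_hat_s)

# Pooled estimate of the common proportion under H0: pf - ps = 0.
pooled = (x_f + x_s) / (n_f + n_s)
se = math.sqrt(pooled * (1 - pooled) * (1 / n_f + 1 / n_s))
z = (p_hat_f - p_hat_s) / se

# Right-tailed test, because H1 is pf - ps > 0.
p_value = 1 - NormalDist().cdf(z)

print(f"condition checks: {check_f:.2f}, {check_s:.2f}")
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

Both condition checks come out well above 10. The script gives a test statistic of roughly 3.41 and a P-value of roughly 0.0003, which is consistent with rejecting the null hypothesis at the 5% level.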
Confidence Intervals about the Difference Between Two Proportions
We can also find a confidence interval for the difference in two population proportions.
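In general, a (1-α)100% confidence interval for p1-p2 is
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
Note: The following conditions must be true:
- $n_1\hat{p}_1(1-\hat{p}_1) \ge 10$ and $n_2\hat{p}_2(1-\hat{p}_2) \ge 10$
- both sample sizes are less than 5% of their respective populations.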
Confidence Intervals About p1-p2 Using StatCrunch
With Data
With Summary
* To get the counts, first create a frequency table. If you have a grouping variable, use a contingency table.
Example 2
Problem: Considering the data from Example 1, find a 99% confidence interval for the difference between the proportion of faculty and the proportion of students who voted in the 2008 Presidential election.
Solution: From Example 1, we know that the conditions for performing inference are met, so we'll use StatCrunch to find the confidence interval.
(Trimmed to fit on this page.)
So we can say that we're 99% confident that the difference between the proportion of faculty who vote and the proportion of students who vote is between 3.7% and 25.3%.
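The interval in Example 2 can be reproduced with a short Python sketch. The function name `two_proportion_ci` is an assumption for illustration. It uses the unpooled standard error, sqrt(p-hat1(1 - p-hat1)/n1 + p-hat2(1 - p-hat2)/n2), multiplied by the critical value z for the chosen confidence level.

```python
import math
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Return (lower, upper) for p1 - p2 using the unpooled standard error."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2
    # Critical value z_{alpha/2} for the requested confidence level.
    alpha = 1 - confidence
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2)
    margin = z_crit * se
    diff = p_hat1 - p_hat2
    return diff - margin, diff + margin

# Data from Example 1: 167 of 200 faculty and 138 of 200 students voted.
lower, upper = two_proportion_ci(167, 200, 138, 200, confidence=0.99)
print(f"99% CI for pf - ps: ({lower:.3f}, {upper:.3f})")
```

The endpoints come out at roughly 0.037 and 0.253, in line with the interval reported above. Unlike the hypothesis test, the interval does not pool the two samples, because it does not assume p1 = p2.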
Determining the Sample Size Needed
In Section 9.3, we learned how to find the necessary sample size if a specific margin of error is desired. We can do a similar analysis for the difference in two proportions. From the confidence interval formula, we know that the margin of error is:
$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
If we assume that n1 = n2 = n, we can solve for n and get the following result:
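The sample size required to obtain a (1-α)100% confidence interval for p1-p2 with a margin of error E is:
$$n = n_1 = n_2 = \left[\hat{p}_1(1-\hat{p}_1) + \hat{p}_2(1-\hat{p}_2)\right]\left(\frac{z_{\alpha/2}}{E}\right)^2$$
rounded up to the next integer, if $\hat{p}_1$ and $\hat{p}_2$ are estimates for p1 and p2, respectively.
If no prior estimate is available, use $\hat{p}_1 = \hat{p}_2 = 0.5$, which yields the following formula:
$$n = n_1 = n_2 = 0.5\left(\frac{z_{\alpha/2}}{E}\right)^2$$
again rounded up to the next integer.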
Note: As in Section 9.3, the desired margin of error should be expressed as a decimal.
Let's try one.
Example 3
Suppose we want to study the success rates for students in Mth098 Intermediate Algebra at ECC. We want to compare the success rates of students who place directly into Mth098 with those who first took Mth096 Beginning Algebra. From past experience, we know that a typical success rate for students in this class is about 65%. How large of a sample size is necessary to create a 95% confidence interval for the difference of the two passing rates with a maximum error of 2%?
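Solution: Using 0.65 as the estimate for both proportions, with $z_{\alpha/2} = 1.96$ and $E = 0.02$:
$$n = \left[0.65(0.35) + 0.65(0.35)\right]\left(\frac{1.96}{0.02}\right)^2 = 0.455(9604) \approx 4369.8$$
So we would need a sample size of 4,370 students - from each population!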
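The sample size calculation can also be sketched in Python. The function name `sample_size_two_proportions` and its parameters are assumptions for this illustration. It assumes equal sample sizes (n1 = n2 = n) and falls back to 0.5 for both proportions when no prior estimates are supplied.

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(margin, confidence=0.95, p1_est=None, p2_est=None):
    """Per-group sample size (n1 = n2 = n) for estimating p1 - p2 within `margin`."""
    if p1_est is None or p2_est is None:
        p1_est = p2_est = 0.5           # no prior estimate: most conservative choice
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (p1_est * (1 - p1_est) + p2_est * (1 - p2_est)) * (z_crit / margin) ** 2
    return math.ceil(n)                 # always round up to the next integer

# Example 3: prior estimate of 65% for both groups, 95% confidence, E = 0.02.
n_prior = sample_size_two_proportions(0.02, 0.95, p1_est=0.65, p2_est=0.65)
# Same margin and confidence level, but with no prior estimate.
n_no_prior = sample_size_two_proportions(0.02, 0.95)
print(n_prior, n_no_prior)
```

With the 65% prior estimate this gives 4,370 per group, matching Example 3. With no prior estimate the same margin of error would call for 4,802 per group, which shows how much a reasonable prior estimate can reduce the required sample.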