Section 12.1: Goodness-of-Fit Test
Objectives
By the end of this lesson, you will be able to...
- perform a goodness-of-fit test
In Example 4 from Section 6.2, we assumed that the number of free throws made out of 10 attempts followed the binomial distribution. But does it really? And how do we know? In that problem, we assumed that the free throws were independent, but is that something we could check?
Consider a standard 6-sided die. If we assume the die is fair, each side should be equally likely. Of course, out of 100 tosses, the sides won't all show up the same number of times (they can't, since 1/6 of 100 is about 16.7, which is not a whole number). But how far from equal counts is acceptable?
Did you know that M&M's® Milk Chocolate Candies are supposed to come in the following percentages: 24% blue, 20% orange, 16% green, 14% yellow, 13% red, 13% brown? (Note: These values differ from those that used to be available on the M&M's website, but have been confirmed by Scientific American.) Do they really? Could a quality control engineer test that? As with the die, how far from those expected percentages is acceptable?
These are all questions we're going to answer in this section, using something called a Goodness-of-Fit Test. Before we do that, we need a little background.
Observed vs. Expected Values
Example 1
Consider the M&M's® example above. Suppose we purchase a standard bag of Milk Chocolate M&M's, and observe the following distribution:
Color | Blue | Orange | Green | Yellow | Red | Brown
Observed Frequency | 13 | 9 | 12 | 8 | 7 | 7
If the percentages above are correct, we would expect 24% of the M&M's® to be blue. Since the bag held 56 candies in total, that is 24% of 56 = 13.44, or about 13. Similarly, we can fill in the rest of the table:
Color | Blue | Orange | Green | Yellow | Red | Brown
Observed Frequency | 13 | 9 | 12 | 8 | 7 | 7
Expected Frequency (rounded) | 13 | 11 | 9 | 8 | 7 | 7
What we can't answer yet is how serious these differences are: are they large enough for us to say that the distribution differs from what the company claims?
Expected Counts
In general, the expected count for each category is the number of trials of the experiment, multiplied by the probability of that particular outcome.
$E_i = n \cdot p_i$
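The formula is one multiplication per category. As a minimal Python sketch, assuming the company's stated percentages and the 56-candy bag from Example 1 (the function name `expected_counts` is a hypothetical choice of ours, not from any library):

```python
def expected_counts(n, probs):
    """Return E_i = n * p_i for each category probability p_i."""
    # The claimed probabilities should cover every outcome, so they must sum to 1.
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("category probabilities must sum to 1")
    return [n * p for p in probs]


# Claimed M&M's color distribution: blue, orange, green, yellow, red, brown.
claimed = [0.24, 0.20, 0.16, 0.14, 0.13, 0.13]
expected = expected_counts(56, claimed)
print([round(e, 2) for e in expected])  # [13.44, 11.2, 8.96, 7.84, 7.28, 7.28]
print([round(e) for e in expected])     # [13, 11, 9, 8, 7, 7]
```

Rounding the unrounded counts reproduces the expected row in the table above.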
To test whether the observed values fit the stated distribution, we compare them with the expected values using the Goodness-of-Fit Test, described below.
The Goodness-of-Fit Test
The Goodness-of-Fit Test is used to test the distribution of a single variable. In essence, it compares the observed counts with the counts we would expect if the claimed distribution were true. Before we can begin, we need a new test statistic.
The Test Statistic for Goodness-of-Fit Tests
If we let $O_i$ represent the observed count for category $i$ and $E_i$ the expected count, with $n$ independent trials and $k$ categories, then the test statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
approximately follows the chi-square distribution with $k - 1$ degrees of freedom, provided that
- all expected frequencies are greater than or equal to 1, and
- no more than 20% of the expected frequencies are less than 5.
Note: If either condition fails, we can combine categories so that both are satisfied.
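The statistic and the two conditions translate directly into code. Below is a minimal Python sketch (the function names are hypothetical, not from any library), applied to the bag from Example 1 with the rounded expected counts used there:

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def conditions_met(expected):
    """Check the two rules of thumb for the chi-square approximation."""
    # Condition 1: every expected count is at least 1.
    all_at_least_one = all(e >= 1 for e in expected)
    # Condition 2: at most 20% of the expected counts are below 5.
    share_below_five = sum(1 for e in expected if e < 5) / len(expected)
    return all_at_least_one and share_below_five <= 0.20


observed = [13, 9, 12, 8, 7, 7]    # Example 1 bag
expected = [13, 11, 9, 8, 7, 7]    # rounded expected counts from Example 1
print(conditions_met(expected))                            # True
print(round(chi_square_statistic(observed, expected), 2))  # 1.36
```

If `conditions_met` returns `False`, the note above applies: merge categories and recompute both rows before using the chi-square distribution.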
Performing a Goodness-of-Fit Test
Step 1: State the null and alternative hypotheses.
H0: The random variable follows the claimed distribution.
H1: The random variable does not follow the claimed distribution.
Note: The test is always a right-tailed test, since larger deviations from the expected values will result in larger $\chi^2$ values.
Step 2: Decide on a level of significance, α.
Step 3: Compute the test statistic, $\chi^2$.
Step 4: Determine the P-value.
Step 5: Reject the null hypothesis if the P-value is less than the level of significance, α.
Step 6: State the conclusion.
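Steps 3 through 5 can be automated. The sketch below is a minimal Python version using only the standard library. Instead of a chi-square table or StatCrunch, it computes the right-tail probability from the closed-form expressions that exist when the degrees of freedom are a whole number; that is an implementation choice of ours, and the names `chi2_upper_tail` and `goodness_of_fit` are hypothetical. It assumes the expected counts have already been computed and checked against the conditions above.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus the finite series sqrt(x) + x^(3/2)/3 + x^(5/2)/15 + ...
    z = math.sqrt(x)
    tail = math.erfc(z / math.sqrt(2))
    term, total = z, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + math.sqrt(2 / math.pi) * math.exp(-x / 2) * total


def goodness_of_fit(observed, expected, alpha=0.05):
    """Steps 3-5: statistic, right-tailed P-value with k - 1 df, decision."""
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    p_value = chi2_upper_tail(stat, df)
    return stat, p_value, p_value < alpha  # True means "reject H0"


stat, p, reject = goodness_of_fit([13, 9, 12, 8, 7, 7], [13, 11, 9, 8, 7, 7])
print(f"chi2 = {stat:.2f}, P-value = {p:.3f}, reject H0: {reject}")
# chi2 = 1.36, P-value = 0.928, reject H0: False
```

Step 6, stating the conclusion in context, is still up to us.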
Example 2
Consider the M&M's® from Example 1. Based on the observed counts in that bag, is there enough evidence at the 5% level of significance to say that the distribution is different from what the company claims?
First, we calculate the expected values as we did in Example 1:
Color | Blue | Orange | Green | Yellow | Red | Brown
Observed Frequency | 13 | 9 | 12 | 8 | 7 | 7
Expected Frequency (rounded) | 13 | 11 | 9 | 8 | 7 | 7
Notice that all expected counts are at least 1, and none are less than 5.
Step 1:
H0: The distribution of colors follows the company's claim.
H1: The distribution of colors does not follow the company's claim.
Step 2: α = 0.05 (given)
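Step 3: $\chi^2 = \frac{(13-13)^2}{13} + \frac{(9-11)^2}{11} + \frac{(12-9)^2}{9} + \frac{(8-8)^2}{8} + \frac{(7-7)^2}{7} + \frac{(7-7)^2}{7} \approx 1.36$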
Step 4: P-value = $P(\chi^2 > 1.36,\ df = 5) \approx 0.9282$.
Step 5: Since the P-value is much larger than α, we do not reject the null hypothesis.
Step 6: No, there is clearly not enough evidence based on this sample to say that the distribution is different from what the company claims.
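The expected counts in this example were rounded to whole numbers before computing $\chi^2$, and the rounded counts total 55 rather than 56. A short Python sketch (ours, not part of the original example) compares the statistic computed with rounded and with unrounded expected counts:

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [13, 9, 12, 8, 7, 7]
claimed = [0.24, 0.20, 0.16, 0.14, 0.13, 0.13]

unrounded = [56 * p for p in claimed]    # 13.44, 11.2, 8.96, 7.84, 7.28, 7.28
rounded = [round(e) for e in unrounded]  # 13, 11, 9, 8, 7, 7 (as in the table)

print(round(chi_square_statistic(observed, rounded), 2))    # 1.36
print(round(chi_square_statistic(observed, unrounded), 2))  # 1.5
```

The unrounded statistic, about 1.50, is still far below 11.07, the 5% critical value of the chi-square distribution with 5 degrees of freedom, so the conclusion does not change here. Carrying unrounded expected counts through the calculation is generally the safer habit.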
Chi-Square Goodness-of-Fit Test Using StatCrunch
Assuming all cells are in equal proportion
Assuming cells follow a specific distribution
The results should appear. More information is available in the help file through StatCrunch.
Example 3
Consider the standard 6-sided die we mentioned earlier in this section. If we assume a die is fair, each side should be equally likely. Suppose we roll a die 100 times and observe the results shown below. Is there enough evidence at the 5% level of significance to say that the die is not fair?
Face | 1 | 2 | 3 | 4 | 5 | 6
Observed Frequency | 22 | 12 | 17 | 13 | 10 | 26
First, we calculate the expected values as we did in Example 1. In each case, $E = 100 \cdot \tfrac{1}{6} \approx 16.7$.
Face | 1 | 2 | 3 | 4 | 5 | 6
Observed Frequency | 22 | 12 | 17 | 13 | 10 | 26
Expected Frequency | 16.7 | 16.7 | 16.7 | 16.7 | 16.7 | 16.7
Notice that all expected counts are at least 1, and none are less than 5.
Step 1:
H0: die is fair (each number is equally likely)
H1: die is not fair
Step 2: α = 0.05 (given)
Step 3: Using StatCrunch, we find $\chi^2 \approx 11.72$.
Step 4: P-value = $P(\chi^2 > 11.72,\ df = 5) \approx 0.0388$.
Step 5: Since the P-value < α, we reject the null hypothesis.
Step 6: Yes, we do have enough evidence at the 5% level of significance to say that the die is not fair.
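The value $\chi^2 = 11.72$ used in Step 4 can also be checked by hand. Because every expected count is the same $E = 100/6$ and the observed counts sum to $n = 100 = kE$, expanding the square gives $\sum (O_i - E)^2/E = \sum O_i^2/E - 2\sum O_i + kE = \sum O_i^2/E - n$. A worked version of that arithmetic:

```latex
\begin{aligned}
\chi^2 &= \sum_{i=1}^{6} \frac{(O_i - E)^2}{E}
        = \frac{1}{E}\sum_{i=1}^{6} O_i^2 - n \\
       &= \frac{6}{100}\left(22^2 + 12^2 + 17^2 + 13^2 + 10^2 + 26^2\right) - 100 \\
       &= \frac{6}{100}(1862) - 100 = 111.72 - 100 = 11.72
\end{aligned}
```

This shortcut only applies when all expected counts are equal, as in the "all cells in equal proportion" case.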