Section 12.1: Goodness-of-Fit Test

Objectives

By the end of this lesson, you will be able to...

perform a goodness-of-fit test


In Example 4 from Section 6.2, we assumed that the number of free throws made out of 10 attempts followed the binomial distribution. But does it really? And how do we know? In that problem, we assumed that the free throws were independent, but is that something we could check?

Consider a standard 6-sided die. If we assume a die is fair, each side should be equally likely. Of course, out of 100 tosses, the faces won't come up an equal number of times (they can't, since 1/6 of 100 is about 16.7). But how far from an equal number is acceptable?


Did you know that M&M's® Milk Chocolate Candies are supposed to come in the following percentages: 24% blue, 20% orange, 16% green, 14% yellow, 13% red, and 13% brown? (Note: These values differ from those that used to be available on the M&M's website, but have been confirmed by Scientific American.) Do they really? Could a quality control engineer test that? As with the die, how far from those expected percentages is acceptable?

These are all questions we're going to answer in this section, using something called a Goodness-of-Fit Test. Before we do that, we need a little background.

Observed vs. Expected Values

Example 1

Consider the M&M's® example above. Suppose we purchase a standard bag of Milk Chocolate M&M's, and observe the following distribution:

	Blue	Orange	Green	Yellow	Red	Brown
Observed Frequency	13	9	12	8	7	7

If we follow the percentages above, we would expect 24% of the M&M's® to be blue. Since the bag holds 56 candies in total, we would expect 24% of 56, or 13.44, which is about 13. Filling in the rest of the colors the same way (rounding to the nearest whole number) gives the following table:

	Blue	Orange	Green	Yellow	Red	Brown
Observed Frequency	13	9	12	8	7	7
Expected Frequency	13	11	9	8	7	7

What we can't yet answer is how severe these differences are: are they large enough for us to say that the distribution is different from what the company claims?

Expected Counts

In general, the expected count for each category is the number of trials of the experiment, multiplied by the probability of that particular outcome.

$$E_i = n \cdot p_i$$
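As a quick check of the formula, the short Python sketch below computes E_i = n·p_i for the bag in Example 1, using the percentages claimed above. The function name `expected_counts` is our own illustrative choice, not part of any library.

```python
# Expected counts E_i = n * p_i for the M&M's bag in Example 1.
observed = {"Blue": 13, "Orange": 9, "Green": 12, "Yellow": 8, "Red": 7, "Brown": 7}
claimed = {"Blue": 0.24, "Orange": 0.20, "Green": 0.16,
           "Yellow": 0.14, "Red": 0.13, "Brown": 0.13}


def expected_counts(n, proportions):
    """Return a dict mapping each category to n * p_i."""
    return {category: n * p for category, p in proportions.items()}


n = sum(observed.values())  # 56 candies in this bag
expected = expected_counts(n, claimed)
for color, e in expected.items():
    # Show the unrounded value next to the whole number used in the table
    print(f"{color:<7} {e:6.2f}  (about {round(e)})")
```

Rounding these values to the nearest whole number reproduces the Expected Frequency row of the table in Example 1.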

To test whether the observed values fit the stated distribution, we compare them with the expected values using the Goodness-of-Fit Test, described next.


The Goodness-of-Fit Test

The Goodness-of-Fit Test is used to test the distribution of a single variable. In essence, it compares the observed counts with the counts we would expect if the claimed distribution were true. Before we can begin, we need a new test statistic.

The Test Statistic for Goodness-of-Fit Tests

If we let O_i represent the observed count for category i and E_i the expected count, with n independent trials and k categories, then the statistic

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

approximately follows the chi-square distribution with k-1 degrees of freedom, provided that

1. all expected frequencies are greater than or equal to 1, and
2. no more than 20% of the expected frequencies are less than 5.

Note: If condition 1 or 2 fails, we can combine categories until both are satisfied.
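The statistic and the two conditions translate directly into code. The Python sketch below is a minimal illustration; `chi_square_statistic`, `conditions_met`, and `combine` are hypothetical helper names, and the counts passed to `combine` are made up purely to show a case where merging categories repairs a failed condition.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over the k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def conditions_met(expected):
    """Condition 1: every E_i >= 1. Condition 2: at most 20% of the E_i are below 5."""
    all_at_least_one = all(e >= 1 for e in expected)
    share_below_five = sum(1 for e in expected if e < 5) / len(expected)
    return all_at_least_one and share_below_five <= 0.20


def combine(observed, expected, i, j):
    """Hypothetical helper: merge category j into category i (i != j)."""
    obs, exp = list(observed), list(expected)
    obs[i] += obs[j]
    exp[i] += exp[j]
    del obs[j], exp[j]
    return obs, exp


# Made-up counts: 2 of 5 expected values (40%) are below 5, so condition 2 fails.
obs, exp = [18, 17, 9, 4, 2], [20, 15, 10, 3, 2]
print(conditions_met(exp))            # False
obs, exp = combine(obs, exp, 3, 4)    # merge the last two categories
print(obs, exp, conditions_met(exp))  # [18, 17, 9, 6] [20, 15, 10, 5] True
```

Which categories to merge is a judgment call; in practice you would combine categories that make sense together (for example, adjacent values of a count).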

Performing a Goodness-of-Fit Test

Step 1: State the null and alternative hypotheses.

H₀: The random variable follows the claimed distribution.
H₁: The random variable does not follow the claimed distribution.

Note: The test is always a right-tailed test, since larger deviations from the expected values will result in larger Χ² values.

Step 2: Decide on a level of significance, α.

Step 3: Compute the test statistic, Χ².

Step 4: Determine the P-value.

Step 5: Reject the null hypothesis if the P-value is less than the level of significance, α.

Step 6: State the conclusion.
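The steps above can be collected into one function. This Python sketch assumes integer degrees of freedom (always the case here, since df = k-1) and computes the right-tail P-value from the standard closed-form expressions for the chi-square tail area, using only the standard library. We are not claiming this is how StatCrunch computes it; the names `chi2_sf` and `goodness_of_fit` are illustrative.

```python
import math


def chi2_sf(x, df):
    """Right-tail area P(X^2 > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2x/pi) * e^(-x/2) * sum_{j=1}^{(df-1)/2} x^(j-1) / (1*3*...*(2j-1))
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def goodness_of_fit(observed, proportions, alpha=0.05):
    """Steps 2-5: returns (statistic, df, p_value, reject_h0)."""
    n, k = sum(observed), len(observed)
    expected = [n * p for p in proportions]                # E_i = n * p_i
    if any(e < 1 for e in expected) or sum(e < 5 for e in expected) > 0.2 * k:
        raise ValueError("expected counts too small; combine categories first")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))  # Step 3
    df = k - 1
    p_value = chi2_sf(stat, df)                            # Step 4: right tail only
    return stat, df, p_value, p_value < alpha              # Step 5


stat, df, p, reject = goodness_of_fit([22, 12, 17, 13, 10, 26], [1 / 6] * 6)
print(f"X^2 = {stat:.2f}, df = {df}, P-value = {p:.4f}, reject H0: {reject}")
```

Step 6, stating the conclusion in the context of the problem, is still up to you; the function only reports whether the P-value falls below α.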

Example 2

Consider the M&M's® from Example 1. Based on the observed counts in that bag, is there enough evidence at the 5% level of significance to say that the distribution is different from what the company claims?


First, we calculate the expected values, as we did in Example 1:

	Blue	Orange	Green	Yellow	Red	Brown
Observed Frequency	13	9	12	8	7	7
Expected Frequency	13	11	9	8	7	7

Notice that all expected counts are at least 1, and none are less than 5.

Step 1:
H₀: The distribution of colors follows the company's claim.
H₁: The distribution of colors does not follow the company's claim.

Step 2: α = 0.05 (given)

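Step 3:
Using the observed and expected counts in the table,

$$\chi^2 = \frac{(13-13)^2}{13} + \frac{(9-11)^2}{11} + \frac{(12-9)^2}{9} + \frac{(8-8)^2}{8} + \frac{(7-7)^2}{7} + \frac{(7-7)^2}{7} \approx 1.36$$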

Step 4: P-value = P(Χ² > 1.36, df=5) ≈ 0.9282.

Step 5: Since the P-value is much larger than α, we do not reject the null hypothesis.

Step 6: No, there is clearly not enough evidence based on this sample to say that the distribution is different from what the company claims.
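To see where the value 1.36 in Step 4 comes from, this Python sketch recomputes the test statistic from the rounded expected counts shown in the table. It also recomputes it from the unrounded expected counts (56 × p_i), which is what software working directly from the proportions would use. The statistic rises slightly, to about 1.50, but the P-value (roughly 0.91) is still far above α = 0.05, so the conclusion does not change.

```python
observed = [13, 9, 12, 8, 7, 7]
expected_rounded = [13, 11, 9, 8, 7, 7]                     # Expected Frequency row of the table
proportions = [0.24, 0.20, 0.16, 0.14, 0.13, 0.13]          # company's claimed percentages
expected_exact = [sum(observed) * p for p in proportions]   # 13.44, 11.2, 8.96, ...


def chi_square_statistic(obs, exp):
    """Sum of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


print(f"{chi_square_statistic(observed, expected_rounded):.2f}")  # 1.36
print(f"{chi_square_statistic(observed, expected_exact):.2f}")    # 1.50
```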

Chi-Square Goodness-of-Fit Test Using StatCrunch

Assuming all cells are in equal proportion

Select Stat > Goodness-of-fit > Chi-Square test
Select the column for the Observed counts.
Select All cells in equal proportion.
Click Compute.

Assuming cells follow a specific distribution

Compute the expected counts following the desired distribution and enter those values in their own column.
Select Stat > Goodness-of-fit > Chi-Square test
Select the column for the Observed counts.
Select the column for the Expected counts you previously computed.
Click Compute.

The results should appear.

More information is available in the help file through StatCrunch.
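Outside StatCrunch, the two options above amount to two ways of building the expected counts. This Python sketch mirrors them; `expected_counts` is a hypothetical helper, not a StatCrunch or library function. Passing no proportions corresponds to "All cells in equal proportion," and passing proportions corresponds to computing your own column of expected counts.

```python
def expected_counts(observed, proportions=None):
    """Build expected counts for a goodness-of-fit test.

    proportions=None -> every category gets n / k ("all cells in equal proportion").
    Otherwise        -> E_i = n * p_i for the claimed proportions (must sum to 1).
    """
    n, k = sum(observed), len(observed)
    if proportions is None:
        return [n / k] * k
    if len(proportions) != k:
        raise ValueError("need exactly one proportion per category")
    if abs(sum(proportions) - 1) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [n * p for p in proportions]


die = [22, 12, 17, 13, 10, 26]  # die rolls from Example 3: equal proportions
mms = [13, 9, 12, 8, 7, 7]      # M&M's bag from Example 1: claimed proportions
print([round(e, 1) for e in expected_counts(die)])
print([round(e, 2) for e in expected_counts(mms, [0.24, 0.20, 0.16, 0.14, 0.13, 0.13])])
```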

Example 3

Consider the standard 6-sided die we mentioned earlier this section. If we assume a die is fair, each side should be equally likely. Suppose we roll a die 100 times and observe the results shown below. Is there enough evidence at the 5% level of significance to say that the die is not fair?

	1	2	3	4	5	6
Observed Frequency	22	12	17	13	10	26


First, we calculate the expected values, as we did in Example 1. Each face has probability 1/6, so in each case E_i = 100 · (1/6) ≈ 16.7.

	1	2	3	4	5	6
Observed Frequency	22	12	17	13	10	26
Expected Frequency	16.7	16.7	16.7	16.7	16.7	16.7

Notice that all expected counts are at least 1, and none are less than 5.

Step 1:
H₀: die is fair (each number is equally likely)
H₁: die is not fair

Step 2: α = 0.05 (given)

Step 3:
Using StatCrunch (or the formula with each E_i = 100/6), we find that the test statistic is Χ² ≈ 11.72.

Step 4: P-value = P(Χ² > 11.72, df=5) ≈ 0.0388.

Step 5: Since the P-value < α, we reject the null hypothesis.

Step 6: Yes, we do have enough evidence at the 5% level of significance to say that the die is not fair.
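Example 3 can also be checked without StatCrunch. This Python sketch recomputes the statistic with each E_i = 100/6 and then the right-tail P-value with df = 5, using a standard closed-form expression for the chi-square tail area with odd degrees of freedom (built on `math.erfc`). We are not claiming this is StatCrunch's own method, and `chi2_sf_odd` is an illustrative name.

```python
import math

observed = [22, 12, 17, 13, 10, 26]
n, k = sum(observed), len(observed)
expected = [n / k] * k  # fair die: E_i = 100 * (1/6)


def chi2_sf_odd(x, df):
    """P(X^2 > x) for odd integer df, via erfc plus a finite sum (standard library only)."""
    if df % 2 == 0:
        raise ValueError("this sketch only handles odd degrees of freedom")
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)  # builds x^(j-1) / (1*3*...*(2j-1))
        total += term
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total


stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
p_value = chi2_sf_odd(stat, k - 1)
print(f"X^2 = {stat:.2f}, P-value = {p_value:.4f}")  # X^2 = 11.72, P-value = 0.0388
print("reject H0" if p_value < 0.05 else "do not reject H0")
```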