Section 8.1: Distribution of the Sample Mean
Objectives
By the end of this lesson, you will be able to...
- describe the distribution of the sample mean for samples obtained from normal populations
- describe the distribution of the sample mean for samples obtained from a population that is not normal
Sampling Distributions
Consider the following three news items.
The average price of unleaded regular fell by 1.6 cents to $3.667 a gallon on Saturday, from $3.683 a gallon, according to survey results from the motorist group AAA. (Source: CNNMoney.com)
The Census Bureau on Tuesday released the 2007 American Community Survey, the government's annual estimates of social, economic and housing characteristics for the nation. Among the highlights: 25.3 - In minutes, the average commute to work in 2007, an increase from 25.0 minutes in 2006. (Source: Chicago Tribune)
Barack Obama leads John McCain, 49% to 44%, when registered voters are asked who they would vote for if the election were held today, according to the latest Gallup Poll Daily tracking update. (Source: Gallup)
All three of these are estimates based on samples. In fact, they're probably not exactly correct, due to sampling error. From Section 1.4,
Sampling error is the error that results from using a sample to estimate information regarding a population.
The idea is this: unless we sample every single individual in the population, there will be some error in our results. Our goal in this section will be to characterize the distribution of the sample mean.
The Distribution of the Sample Mean
Let's look again at the definition of a random variable, from Section 6.1.
A random variable is a numerical measure of the outcome of a probability experiment whose value is determined by chance.
Think about the sample mean, x̄. Isn't its value determined by chance as well? Since the individuals in a sample are randomly selected, the sample mean will depend on those individuals selected, so it, too, is a random variable. The big question, then, is the distribution of x̄. In other words, what are its mean (the mean of the sample mean, μ_x̄) and its standard deviation (the standard deviation of the sample mean, σ_x̄)?
To investigate these, let's look at a particular population.
Consider the heights of the players from the starting line-up from the 2008 Men's Olympic Basketball gold medal game - Jason Kidd (76"), LeBron James (80"), Kobe Bryant (78"), Carmelo Anthony (78"), and Dwight Howard (83"). (Source: NBC Sports) The mean of the population is 79", with a standard deviation of 2.37".
First, let's consider the different samples of size 2. There are 10 such samples (5C2 = 10), shown below, along with their corresponding sample means.
sample | heights | sample mean (x̄) |
Kidd, James | 76, 80 | 78 |
Kidd, Bryant | 76, 78 | 77 |
Kidd, Anthony | 76, 78 | 77 |
Kidd, Howard | 76, 83 | 79.5 |
James, Bryant | 80, 78 | 79 |
James, Anthony | 80, 78 | 79 |
James, Howard | 80, 83 | 81.5 |
Bryant, Anthony | 78, 78 | 78 |
Bryant, Howard | 78, 83 | 80.5 |
Anthony, Howard | 78, 83 | 80.5 |
Interestingly, the mean of the sample means for samples of size 2 is 79" as well. This is actually reasonable, though, because we know that the mean of a random variable is also its expected value, and it makes perfect sense that the value we should expect from the sample mean is the same as the population mean!
The standard deviation, though, is very different. It helps to look at things visually. The image below represents all possible sample means for samples of size 1 (individuals), 2, 3, 4, and 5 (the population). Pay particular attention to the standard deviation.
The interesting things to note here are that μ_x̄ = 79, regardless of the sample size, but the standard deviation decreases as n increases. If we think about this a bit, this, too, is reasonable. The more individuals we have in our sample, the more likely we are to be close to the true mean. This brings us to our first major point.
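To check these observations numerically, here is a minimal Python sketch that lists every possible sample of each size from the five heights and summarizes the resulting sample means. The function name sample_mean_summary is just an illustrative choice, and the spread is computed as a population standard deviation (pstdev), since all possible samples are included.

```python
from itertools import combinations
from statistics import mean, pstdev

# Heights (in inches) of the five starters: Kidd, James, Bryant, Anthony, Howard.
heights = [76, 80, 78, 78, 83]

def sample_mean_summary(population, n):
    """Return (mean, standard deviation) of the sample means over
    every possible sample of size n drawn without replacement."""
    sample_means = [mean(sample) for sample in combinations(population, n)]
    return mean(sample_means), pstdev(sample_means)

for n in range(1, len(heights) + 1):
    mu_xbar, sigma_xbar = sample_mean_summary(heights, n)
    print(f"n = {n}: mean of sample means = {mu_xbar:.2f}, "
          f"standard deviation = {sigma_xbar:.3f}")
```

For every sample size the mean of the sample means comes out to 79, while the standard deviation shrinks from about 2.37 at n = 1 to 0 at n = 5, where the only possible sample is the whole population.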
The Law of Large Numbers
As n increases, the difference between x̄ and μ approaches zero.
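One way to see this behavior is with a small simulation. The sketch below is not part of the original lesson: it assumes a fair six-sided die as the population (so μ = 3.5), and the function name gap_from_mu and the seed are arbitrary choices made for illustration.

```python
import random
from statistics import mean

def gap_from_mu(sample_sizes, seed=8):
    """Roll a fair die n times for each n and return |sample mean - mu|."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    mu = 3.5                    # population mean of a fair six-sided die
    gaps = {}
    for n in sample_sizes:
        rolls = [rng.randint(1, 6) for _ in range(n)]
        gaps[n] = abs(mean(rolls) - mu)
    return gaps

for n, gap in gap_from_mu([10, 100, 1000, 100000]).items():
    print(f"n = {n:>6}: |sample mean - mu| = {gap:.4f}")
```

Because each sample is random, the gap does not have to shrink at every single step, but for large n it is very likely to be small.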
We're now ready to investigate the standard deviation of x̄ in a bit more depth.
The Central Limit Theorem
The Central Limit Theorem
Regardless of the distribution shape of the population, the sampling distribution of x̄ becomes approximately normal as the sample size n increases (conservatively n≥30).
This is very interesting! It doesn't matter if the population distribution is left-skewed, right-skewed, uniform, binomial, or almost anything else: as long as the population has a finite standard deviation, the distribution of the sample mean becomes approximately normal as the sample size increases. What an amazing result!
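The theorem can be explored with a short simulation. The following Python sketch is an illustration under stated assumptions, not the lesson's own demonstration: it uses an exponential population (strongly right-skewed, with mean 1 and standard deviation 1), and the helper names skewness and simulate_sample_means, the number of trials, and the seed are all arbitrary. Skewness near 0 is used here as a rough indicator of a symmetric, normal-like shape.

```python
import random
from statistics import mean, pstdev

def skewness(values):
    """Average cubed z-score: about 0 for a symmetric distribution."""
    m = mean(values)
    s = pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])

def simulate_sample_means(n, trials=5000, seed=1):
    """Draw `trials` samples of size n from an exponential population
    (mean 1, right-skewed) and return the list of sample means."""
    rng = random.Random(seed)
    return [mean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(trials)]

for n in (1, 5, 30):
    xbars = simulate_sample_means(n)
    print(f"n = {n:>2}: mean = {mean(xbars):.3f}, "
          f"std dev = {pstdev(xbars):.3f}, skewness = {skewness(xbars):.2f}")
```

As n grows, the printed skewness should move toward 0 while the mean of the sample means stays close to 1, which is consistent with the sampling distribution becoming more nearly normal.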
Exploring the Distribution of the Sample Mean
To do some exploring yourself, go to the Demonstrations Project from Wolfram Research, and download the Central Limit Theorem demonstration. If you haven't already, download and install the player first.
Once you have the player installed and the Central Limit Theorem demonstration downloaded, move the slider for the sample size to get a sense of its effect on the distribution shape. You can also move the new sample slider to get a different sample.
The Distribution of the Sample Mean
We can even be more specific about the distribution of x̄:
The Sampling Distribution of x̄
If a simple random sample of size n is drawn from a large population with mean μ and standard deviation σ, the sampling distribution of x̄ will have mean and standard deviation:
μ_x̄ = μ and σ_x̄ = σ/√n
where σ_x̄ is the standard error of the mean.
Key fact: If the population is normally distributed, then the sample mean will be normally distributed, regardless of the sample size.
Now let's apply this distribution to various problems.
Using the Central Limit Theorem
In order to find probabilities about a normal random variable, we first need to know its mean and standard deviation. With the results of the Central Limit Theorem, we now know the distribution of the sample mean, so let's try using that in a couple of examples.
Example 1
Let's consider again the distribution of IQs that we looked at in Example 1 in Section 7.1.
We saw in that example that tests for an individual's intelligence quotient (IQ) are designed to be normally distributed, with a mean of 100 and a standard deviation of 15.
What is the probability that a randomly selected sample of 20 individuals would have a mean IQ of more than 105?
Solution:
To answer this question, we need to find P(x̄ > 105) when n = 20. Before we can do that, we first need to find the distribution of x̄. From the distribution of the sample mean, we know μ_x̄ = μ = 100 and σ_x̄ = σ/√n = 15/√20 ≈ 3.35. Because IQs are normally distributed, x̄ is normally distributed as well.
Here's what the distribution of x̄ looks like in relation to the distribution of X.
Now that we have the standard deviation, we can find the probability. Using StatCrunch...
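If StatCrunch isn't handy, the same probability can be found with a few lines of Python. This is a sketch using the standard library's NormalDist class as a stand-in for StatCrunch; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 15, 20        # population mean, population sd, sample size

# Sampling distribution of the sample mean: normal with
# mean mu and standard deviation sigma / sqrt(n).
xbar_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# P(sample mean > 105) is the area to the right of 105.
p_above_105 = 1 - xbar_dist.cdf(105)
print(f"P(sample mean > 105) = {p_above_105:.4f}")
```

The result is about 0.068, so a sample of 20 people with a mean IQ above 105 would occur a little under 7% of the time.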
Example 2
In Example 2 in Section 7.1, we were told that weights of 1-year-old boys are approximately normally distributed, with a mean of 22.8 lbs and a standard deviation of about 2.15 lbs. (Source: About.com)
Suppose the sample mean of the 10 1-year-old boys at the Kiddie Care day care center is 22.3 lbs. Is that unusual?
Solution:
In order to determine if an event is unusual, we need to find its probability. If the probability of the event is less than 5%, we can classify it as an unusual event.
In this case, we want to find the probability of observing a sample mean of 22.3 or less. Using the distribution of the sample mean, μ_x̄ = μ = 22.8 and σ_x̄ = σ/√n = 2.15/√10 ≈ 0.68. Using StatCrunch...
So we'd observe a sample mean of 22.3 lbs or less from a sample of 10 1-year-old boys about 23% of the time, which is not very unusual at all.
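The 23% figure can also be reproduced by hand through a z-score. The sketch below standardizes the observed sample mean and looks up the left-tail area with Python's standard library; the variable names are illustrative, and this is an alternative to the StatCrunch calculation rather than a copy of it.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 22.8, 2.15, 10     # population mean (lbs), population sd (lbs), sample size
observed_mean = 22.3              # sample mean at the day care center (lbs)

standard_error = sigma / sqrt(n)              # standard deviation of the sample mean
z = (observed_mean - mu) / standard_error     # z-score of the observed sample mean

# Left-tail area under the standard normal curve.
p_at_or_below = NormalDist().cdf(z)
print(f"z = {z:.3f}, P(sample mean <= 22.3) = {p_at_or_below:.3f}")
```

The z-score is about -0.74, and the left-tail area is about 0.231, matching the 23% above.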
Here's one for you to try:
Example 3
Suppose that a particular professor's Statistics exam on probability traditionally has a mean score of 74, with a standard deviation of 11.
The professor suspects that his current crop of students is very strong. To compare, he gives them the same exam he has in the past. The sample mean of the 28 students in his current class was a 78.
Was the professor correct? Is his current class of students unusual compared to those from the past?
Like the previous example, we need to find the probability of our event in order to determine if it was unusual.
Using the distribution of the sample mean, we know that μ_x̄ = μ = 74 and σ_x̄ = σ/√n = 11/√28 ≈ 2.08. Using StatCrunch...
Since the probability of observing a class of 28 students with an average score of 78 or higher is less than 0.05, it does look like this particular class is unusual. (But in a good way!)
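Here is a hedged Python sketch of the calculation, including the 5% cutoff for "unusual" used in this section. The function name upper_tail_probability is hypothetical. The code also assumes the sample mean is approximately normal, which the problem does not state outright; with n = 28, just under the conservative n≥30 guideline, that is reasonable only if exam scores are not strongly skewed.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_probability(observed_mean, mu, sigma, n):
    """P(sample mean >= observed_mean), assuming the sample mean is
    (approximately) normal with mean mu and sd sigma / sqrt(n)."""
    xbar_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))
    return 1 - xbar_dist.cdf(observed_mean)

p = upper_tail_probability(observed_mean=78, mu=74, sigma=11, n=28)
is_unusual = p < 0.05             # the 5% cutoff for an unusual event

print(f"P(sample mean >= 78) = {p:.4f}")
print("Unusual" if is_unusual else "Not unusual")
```

The probability comes out to roughly 0.027, which is below 0.05.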