Section 7.1: Properties of the Normal Distribution
Objectives
By the end of this lesson, you will be able to...
- use the uniform probability distribution
- graph a normal curve
- state the properties of the normal curve
- explain the role of area in the normal density function
Probability Density Functions
In Chapter 6, we focused on discrete random variables: random variables that take on either a finite or a countable number of values. Continuous random variables, which can take any value in an interval (so their possible values cannot even be listed), can be a bit more complicated.
Consider the rand() function in the computer software Microsoft Excel. It returns a random number between 0 and 1. There are infinitely many possibilities, so each particular value has a probability of 0!
When we consider continuous random variables, we need to instead consider the probability "density", which might not always be the same for each value. Some ranges might be more likely, and hence the probability would be more "dense" near those values. To make this easier to understand, we need a new concept called a probability density function.
Let's look at Example 4, from Section 6.1, in which two dice were tossed and X = the sum of the two dice. The histogram below highlights P(X<6).
We can see from the histogram that P(X<6) = P(X=2) + P(X=3) + P(X=4) + P(X=5), but let's look at things a little differently. Instead of focusing on the probabilities, let's look at the area that's shaded red. The width of each rectangle is 1, so the area of each is its corresponding probability.
This leads us to another interpretation of P(X<6): we could think of it as the total area of the bars from X = 2 through X = 5. Extending that idea, we can now give a definition of a probability density function.
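The area interpretation can be checked with a short sketch. It assumes two fair six-sided dice (the setup of Example 4 is not repeated here, so that is an assumption), and the names `counts`, `bar_width`, and `shaded_area` are only illustrative.

```python
from fractions import Fraction
from itertools import product

# Tally how many of the 36 equally likely outcomes give each sum.
counts = {}
for a, b in product(range(1, 7), repeat=2):
    counts[a + b] = counts.get(a + b, 0) + 1

# Each histogram bar has width 1 and height P(X = s),
# so the area of the bar equals the probability P(X = s).
bar_width = 1
shaded_area = sum(bar_width * Fraction(counts[s], 36) for s in range(2, 6))

print(shaded_area)  # 5/18, the same as P(X=2) + P(X=3) + P(X=4) + P(X=5)
```

Adding the areas of the four bars gives 10/36, exactly the sum of the four probabilities.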
A probability density function is an equation used to compute probabilities of continuous random variables. The equation must satisfy the following two properties:
- The total area under the graph of the equation over all possible values of the random variable must equal 1.
- The height of the graph of the equation must be greater than or equal to 0 for all possible values of the random variable.
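Let's go back to the earlier example of the rand() function in Excel. Its probability density function is fairly simple: the graph has a constant height of 1 for every value between 0 and 1 and a height of 0 everywhere else, so the total area under it is 1 × 1 = 1.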
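As a rough illustration, the flat density for rand() can be checked numerically. This is a minimal sketch, not Excel itself: Python's `random.random()` stands in for rand(), and the helper names `uniform_pdf` and `area` and the interval from 0.2 to 0.5 are chosen here purely for illustration.

```python
import random

def uniform_pdf(x):
    """Density of a uniform random variable on [0, 1]: height 1 inside, 0 outside."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def area(pdf, a, b, n=10_000):
    """Approximate the area under pdf between a and b with a midpoint Riemann sum."""
    width = (b - a) / n
    return sum(pdf(a + (i + 0.5) * width) for i in range(n)) * width

total_area = area(uniform_pdf, 0.0, 1.0)   # property 1: total area should be 1
p_interval = area(uniform_pdf, 0.2, 0.5)   # area over an interval = probability

# Simulation check, with random.random() standing in for Excel's rand().
random.seed(1)
draws = [random.random() for _ in range(100_000)]
simulated = sum(0.2 < d < 0.5 for d in draws) / len(draws)

print(round(total_area, 4), round(p_interval, 4), round(simulated, 3))
```

The area over the interval is just width times height, 0.3 × 1, and the simulated proportion of draws landing in that interval should come out close to it.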
Probabilities as Areas
Now that we have connected the area under a probability density function with probabilities for the random variable, let's explore a little further.
In general, the area under a probability density function over a particular interval of values can have two interpretations:
- the proportion of the population with the characteristic
- the probability that a randomly selected individual will be within the interval
The Normal Curve
Many continuous variables follow a bell-shaped distribution (we introduced this shape back in Section 2.2), like an individual's height, the thickness of tree bark, IQs, or the amount of light emitted by a light bulb. The more formal name for this shape is a normal curve.
A continuous random variable is normally distributed or has a normal probability distribution if its relative frequency histogram has the shape of a normal curve.
In Section 3.2, we introduced the Empirical Rule, which said that almost all (99.7%) of the data would be within 3 standard deviations of the mean, if the distribution is bell-shaped.
We can extend this idea to normal curves with other means and standard deviations. If μ = 0 and σ = 1, almost all of the data should be between -3 and 3, with the center at 0. If μ = 0 and σ = 0.5, almost all of the data should be between -1.5 and 1.5.
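The 99.7% figure can be reproduced with a short sketch. It assumes Python's standard-library `statistics.NormalDist` as the normal model (the lesson itself uses StatCrunch), and `area_within` is a hypothetical helper name.

```python
from statistics import NormalDist

def area_within(mu, sigma, k):
    """Area under the normal curve within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for mu, sigma in [(0, 1), (0, 0.5)]:
    low, high = mu - 3 * sigma, mu + 3 * sigma
    share = area_within(mu, sigma, 3)
    print(f"mu={mu}, sigma={sigma}: {share:.4f} of the area lies between {low} and {high}")
```

Both curves give the same area, about 0.9973. Only the endpoints of the interval change with σ.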
Exploring the Shape of the Normal Curve
To do some exploring yourself, go to the Demonstrations Project from Wolfram Research, and download the Bell Curves demonstration. If you haven't already, download and install the player by clicking on the image to the right.
Once you have the player installed and the Bell Curves demonstration downloaded, move the sliders for the mean and standard deviations to get a sense of their effects on the shape.
So what effect did you see from moving the mean and standard deviation? You should have seen that moving the mean simply slides the shape left or right - it changes the center, not the spread. The standard deviation, on the other hand, changes the spread: a larger standard deviation makes the curve wider and flatter, while a smaller one makes it narrower and taller.
The key is area, which we mentioned earlier in this section. Since the total area under the curve must still equal 1, if we make the distribution narrower by decreasing the standard deviation, it has to get taller to keep the same area.
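For readers without the Wolfram player, the trade-off between width and height can be sketched numerically. This assumes Python's `statistics.NormalDist` for the normal density. The helper `total_area` and the three σ values are illustrative choices, not part of the lesson.

```python
from statistics import NormalDist

def total_area(dist, n=20_000):
    """Midpoint Riemann sum of the density over mean ± 8 standard deviations."""
    a = dist.mean - 8 * dist.stdev
    b = dist.mean + 8 * dist.stdev
    width = (b - a) / n
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(n)) * width

peaks = {}
for sigma in (2.0, 1.0, 0.5):
    dist = NormalDist(0, sigma)
    peaks[sigma] = dist.pdf(0)  # height of the curve at its center
    print(f"sigma={sigma}: peak height {peaks[sigma]:.4f}, total area {total_area(dist):.4f}")
```

Each time the standard deviation is halved, the peak height doubles, while the total area stays at 1.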
Fun with Plinko
Have you heard of the game Plinko from the game show The Price is Right? In this game, the contestant releases a small disc on a board covered with pegs, which direct the disc left or right. Here's a video showing a particularly successful contestant.
What's interesting is that the distribution of the Plinko chips at the bottom approximately follows a normal distribution! Here's an example of a Java applet, showing the distribution as it might develop over hundreds of Plinko chips.
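A simplified Plinko board can also be simulated directly. This is only a sketch under stated assumptions: every peg sends the chip left or right with probability 0.5, the board has no side walls, and the board size (12 rows) and chip count (2000) are made up for illustration rather than taken from the show.

```python
import random
from collections import Counter

def drop_chip(rows, rng):
    """Final slot of one chip: the number of rightward bounces out of `rows` pegs."""
    return sum(rng.random() < 0.5 for _ in range(rows))

rng = random.Random(7)    # fixed seed so the run is repeatable
rows, chips = 12, 2000    # hypothetical board size and number of chips
slots = Counter(drop_chip(rows, rng) for _ in range(chips))

# Text histogram: one '#' for every 20 chips that landed in a slot.
for slot in range(rows + 1):
    print(f"{slot:2d} {'#' * (slots[slot] // 20)}")
```

The printed bars pile up around the middle slots and thin out toward the edges, giving the familiar bell shape.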
Drawing Normal Curves Using StatCrunch
- Click on Stat > Calculators > Normal.
- Enter the mean and standard deviation (and x and the direction of the inequality, if desired), then press Compute.
- To export the image, press Snapshot and save the image to your computer. (Don't forget where you saved it!)
Depending on your word processing software, it's usually fairly straightforward to insert an image. In Microsoft Word, simply choose Insert > Picture and select the file you saved earlier.
Areas Under a Normal Curve
Let's now connect the concepts of a normal curve and the earlier idea of area under a probability density function.
Example 1
Most tests that gauge one's intelligence quotient (IQ) are designed to have a mean of 100 and a standard deviation of 15. It's also known that IQs are normally distributed. So what would the distribution look like for IQs?
There is no universal agreement on what IQ constitutes a "genius", though in 1916, psychologist Lewis M. Terman set a guideline of 140 (scaled to 136 in today's tests) for "potential genius".
Suppose the area to the right of 136 is about 0.0082. What are two interpretations of that area?
- About 0.82% of all individuals can be classified as a "potential genius" according to Dr. Terman.
- If an individual is selected at random, there is a probability of about 0.0082 that the individual is a "potential genius".
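The area quoted in Example 1 can be reproduced with a short sketch. It assumes Python's `statistics.NormalDist` in place of the StatCrunch calculator, and the variable names are illustrative.

```python
from statistics import NormalDist

# IQ model from Example 1: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

# Area to the right of 136 = 1 - (area to the left of 136).
area_right = 1 - iq.cdf(136)

print(round(area_right, 4))  # 0.0082
```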
Example 2
Weights of 1-year-old boys are approximately normally distributed, with a mean of 22.8 lbs and a standard deviation of about 2.15 lbs. (Source: About.com)
- Draw a quick sketch of the normal curve for the weights of 1-year-old boys.
- Shade the area representing the boys who are at least 20 pounds.
- The area is approximately 0.9036. Give two interpretations of this result.
- Two interpretations would be (1) approximately 90% of all 1-year-old boys weigh at least 20 pounds; and (2) the probability that a randomly selected 1-year-old boy weighs at least 20 pounds is about 0.9036.
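Both interpretations in Example 2 can be illustrated with a small sketch. It assumes Python's `statistics.NormalDist` as the model for the weights. The simulated sample of 100,000 boys and the seed are hypothetical choices made only to show the "proportion of the population" reading.

```python
from statistics import NormalDist

# Weight model from Example 2: mean 22.8 lbs, standard deviation 2.15 lbs.
weights = NormalDist(mu=22.8, sigma=2.15)

# Probability interpretation: area to the right of 20 lbs.
area_at_least_20 = 1 - weights.cdf(20)

# Proportion interpretation: share of a large simulated group weighing at least 20 lbs.
sample = weights.samples(100_000, seed=3)
proportion = sum(w >= 20 for w in sample) / len(sample)

print(round(area_at_least_20, 4), round(proportion, 3))
```

The computed area is about 0.9036, and the simulated proportion should land close to it.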
The Standard Normal Distribution
Back in Section 3.4, we introduced the idea of a z-score:
The z-score represents the number of standard deviations a data value is from the mean.
Z = (x - μ) / σ
We mentioned then that we'd need to remember the z-score later - this is that moment!
The z-score is important, because if the variable X is normally distributed, Z is as well. This brings us to an important fact:
If X is normally distributed with mean μ and standard deviation σ, then
Z = (x - μ) / σ
is normally distributed with a mean of 0 and a standard deviation of 1. We say that Z has the standard normal distribution.
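As a quick check of this fact, the sketch below standardizes a value and compares the two areas. It reuses the IQ numbers from Example 1 (mean 100, standard deviation 15, x = 136) purely as an illustration, assumes Python's `statistics.NormalDist`, and uses `z_score` as a hypothetical helper name.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

mu, sigma, x = 100, 15, 136
z = z_score(x, mu, sigma)

# Area to the left of x under the general normal curve ...
area_x = NormalDist(mu, sigma).cdf(x)
# ... and area to the left of z under the standard normal curve.
area_z = NormalDist(0, 1).cdf(z)

print(z, round(area_x, 4), round(area_z, 4))
```

The two areas agree, which is why a single standard normal curve is enough to find areas for any normal distribution.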
Exploring the Standard Normal Distribution
To do some exploring yourself, go to the Demonstrations Project from Wolfram Research, and download the Area of a Normal Distribution demonstration. If you haven't already, download and install the player by clicking on the image to the right.
Once you have the player installed and the Area of a Normal Distribution demonstration downloaded, move the sliders for the mean and standard deviation of X and the value of Z to see the relationship between areas under the general normal curve and the areas under the standard normal curve.
The idea here is that the area under the normal curve on the right is equal to the area under the standard normal curve on the left.