Section 2.2: Organizing Quantitative Data: The Popular Displays
Objectives
By the end of this section, you will be able to...
 construct bar graphs
 construct pie charts
 construct histograms for discrete and continuous data
 draw stem-and-leaf plots
 draw dot plots
 identify the shape of a distribution
 construct frequency polygons*
 construct ogives*
 draw time-series graphs
* You will not be tested on these objectives.
For a quick overview of this section, watch this short video summary:
Bar Graphs
Bar graphs are probably the most commonly used graphs, and ones you're already familiar with. I won't mention much more here, except to state a couple of key points:
 heights can be frequency or relative frequency
 bars must not touch
Using the data from our previous color example,
favorite color  frequency  relative frequency 
blue  10  10/26 ≈ 0.38 
red  3  3/26 ≈ 0.12 
orange  1  1/26 ≈ 0.04 
yellow  3  3/26 ≈ 0.12 
green  5  5/26 ≈ 0.19 
pink  3  3/26 ≈ 0.12 
purple  1  1/26 ≈ 0.04 
we could then make both frequency and relative frequency bar graphs.
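The relative frequency column is just each frequency divided by the total number of responses. Here is a minimal Python sketch of that arithmetic, using the frequencies from the table above; the names `color_counts` and `relative_frequencies` are made up for illustration and have nothing to do with StatCrunch.

```python
# Frequencies copied from the favorite color table above.
color_counts = {
    "blue": 10, "red": 3, "orange": 1, "yellow": 3,
    "green": 5, "pink": 3, "purple": 1,
}


def relative_frequencies(counts):
    """Divide each frequency by the total number of observations."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


rel_freqs = relative_frequencies(color_counts)

# Print one row per color: name, frequency, relative frequency (2 decimals).
for color, rel in rel_freqs.items():
    print(f"{color:<7} {color_counts[color]:>3}  {rel:.2f}")
```

Either column can be used for the bar heights; the shape of the graph is the same, only the vertical scale changes.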
Technology
Here's a quick overview of how to create bar graphs in StatCrunch.

Pareto Charts
A Pareto chart is a bar graph whose bars are drawn in decreasing order of frequency or relative frequency.
You see Pareto charts fairly often in the newspaper, because often the article is trying to show that one particular category is the highest or lowest. The image below, for example, is from the Chicago Tribune. You can see clearly from the graph that it's attempting to show that the local BP refinery in Whiting, Indiana is the highest-capacity refinery that is considering expansion.
If you don't remember the issue, you can read up about BP's plan to expand its refinery in this article from CBS2 Chicago.
Here's another one, using the favorite color data from the last section:
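The only extra step in a Pareto chart is the ordering. As a rough sketch (in Python, with made-up variable names, and a text picture standing in for a real graph), sorting the color frequencies in decreasing order looks like this:

```python
# Frequencies copied from the favorite color table.
color_counts = {
    "blue": 10, "red": 3, "orange": 1, "yellow": 3,
    "green": 5, "pink": 3, "purple": 1,
}

# Sort categories by frequency, largest first. Python's sort is stable,
# so tied categories keep the order they had in the original table.
pareto_order = sorted(color_counts.items(), key=lambda item: item[1], reverse=True)

# Draw a simple text version of the chart, one row per category.
for color, count in pareto_order:
    print(f"{color:<7} {'#' * count} {count}")
```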
Side-by-Side Bar Graphs
Side-by-side bar graphs are used when you want to compare two different populations. The key with side-by-side bar graphs is that you must use relative frequencies. Do you know why?
I think so. But just in case...
Look at it this way: Let's suppose we want to compare the poverty levels
for different counties in Illinois. If we used frequencies only, Cook County
would dominate, with almost 800,000, where no other county has over 50,000. On
the other hand, if we looked at relative frequency, Cook County still
has the most (15%), but other counties such as Kane are much closer, with rates
around 8%.
Source: 2007 Illinois Poverty Summit
Here's a good example of a side-by-side chart, from the Associated Press.
What's shown isn't quite a relative frequency as we've defined it. It's the number per 100,000, where ours as a percent is the number per 100. The rate per 100,000 is used here because the percents would all be less than 1% and difficult to read. Still, if frequency were used instead, the "White" category would be the largest, simply because that's the largest segment of the U.S. population.
Technology
Here's a quick overview of how to create side-by-side bar graphs in StatCrunch.

Pie Charts
Like bar graphs, pie charts are very common. You're probably already aware of these as well. I'll just include a couple of comments:
 should always include the relative frequency
 also should include labels, either directly or as a legend
Using the data from our previous color example,
favorite color  frequency  relative frequency 
blue  10  10/26 ≈ 0.38 
red  3  3/26 ≈ 0.12 
orange  1  1/26 ≈ 0.04 
yellow  3  3/26 ≈ 0.12 
green  5  5/26 ≈ 0.19 
pink  3  3/26 ≈ 0.12 
purple  1  1/26 ≈ 0.04 
we get this pie chart:
Technology
Here's a quick overview of how to create pie charts in StatCrunch.
It's worth noting that pie charts should only be used for nominal data. 
Histograms
Histograms are so important that they got their own video!
Single-Valued Histograms
To display quantitative data, we need a new type of chart, called a histogram. Histograms look similar to bar graphs, but they have some distinct differences, and for good reason.
A histogram is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same and the rectangles touch each other.
The rectangles need to touch in a histogram because we want to imply that the classes are adjacent. A favorite color of "blue" isn't really adjacent to "red", even though we might place them next to each other in a bar graph. For quantitative data like the data used in Example 1 earlier in this section, the value 2 really is next to the value 3.
Let's take a closer look at that example.
Example 4
children  frequency  relative frequency 
1  3  3/26 ≈ 0.12 
2  8  8/26 ≈ 0.31 
3  10  10/26 ≈ 0.38 
4  2  2/26 ≈ 0.08 
5  3  3/26 ≈ 0.12 
To make a histogram, we make what looks like a bar graph with a couple of key differences:
 rectangles must touch
 class labels are underneath the rectangle
Here's what they'd look like for our example data:
Technology
Here's a quick overview of how to create histograms for single-valued discrete data using StatCrunch.

Histograms for Multi-Valued and Continuous Data
Multi-valued and continuous histograms are probably where the most errors occur. There are some key differences between these and single-valued histograms. In this case, each rectangle doesn't represent a single value, but rather a range of values. Because of that, we don't label the class on the horizontal axis. Instead, we label the lower class limits at the left edge of each rectangle.
Let's demonstrate using an example:
Example 5
average commute  frequency  relative frequency 
16-17.9  1  1/15 ≈ 0.07
18-19.9  2  2/15 ≈ 0.13
20-21.9  1  1/15 ≈ 0.07
22-23.9  6  6/15 = 0.40
24-25.9  2  2/15 ≈ 0.13
26-27.9  1  1/15 ≈ 0.07
28-29.9  1  1/15 ≈ 0.07
30-31.9  1  1/15 ≈ 0.07
Here's what a frequency histogram would look like for these data:
Technology
Here's a quick overview of how to create histograms for multi-valued discrete data or continuous data in StatCrunch.

One final note about histograms: Because they show us such nice information about the distribution of a set of data, we'll be using them frequently throughout the rest of the semester. Be sure you spend plenty of time familiarizing yourself with the technology, so you're able to create histograms with ease.
Stem-and-Leaf Plots
Stem-and-leaf plots are another way to represent quantitative data. They give more detail because they show the actual data. The idea is to split each data value into two parts: a stem and a leaf. The stem is everything to the left of the rightmost digit, and the leaf is that rightmost digit. Here's an example, using the data from earlier in this section regarding exam scores from a previous Mth120 class.
Example 6
62 
87  67  58  95  94  91  69  52 
76  82  85  91  60  77  72  83  79 
63  88  79  88  70  75  87 
With these data, the stems are the first digits: 5, 6, 7, 8, and 9. The leaves are all the second digits, 0, 1, ... , 9. The full stem-and-leaf plot lists the stems down the left side, a vertical bar between, and then lists the leaves in order to the right. Something like this:
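To make the splitting step concrete, here is a minimal Python sketch that builds the plot from the 25 exam scores listed in Example 6. The function name `stem_and_leaf` is just an illustration, and the sketch assumes whole-number data where the leaf is the ones digit.

```python
from collections import defaultdict

# Exam scores copied from Example 6.
scores = [
    62, 87, 67, 58, 95, 94, 91, 69, 52,
    76, 82, 85, 91, 60, 77, 72, 83, 79,
    63, 88, 79, 88, 70, 75, 87,
]


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digits)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 87 -> stem 8, leaf 7
        plot[stem].append(leaf)
    return dict(plot)


plot = stem_and_leaf(scores)

# Stems down the left, a vertical bar, then the leaves in increasing order.
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Note that this sketch only prints stems that actually have leaves. If a stem in the middle of the range had no data, you would still want to show it as an empty row.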
It's interesting that this plot looks very similar to a histogram, only it gives us the actual data. Take a look at this animation to see the relationship:
There are some limitations to stem-and-leaf plots. In particular, we're limited to small data sets. Can you imagine the leaves if we had 1,000 test scores? Also, the range in the data needs to be fairly small.
By that, I mean if the data values range from 1-100, our stems can be 0, 10, 20, ... , 90, as they were in this example. On the other hand, if the values range from 1-10,000, the stems would have to be 0, 10, 20, ... , 9,980, 9,990. That's a lot of rows!
Technology
Here's a quick overview of how to create stem-and-leaf plots in StatCrunch.

Dot Plots
Dot plots are similar to single-valued histograms, but rather than placing rectangles above each particular value, a dot plot just places the required number of dots above each value. Looking at our example again with the number of children, the plot would look something like this:
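As a rough illustration, the Python sketch below stacks one `*` per observation above each value, using the frequencies from the number-of-children table in Example 4. The helper name `dot_plot_rows` is hypothetical, and a text picture is only a stand-in for a properly drawn plot.

```python
# Frequencies copied from the number-of-children table (Example 4).
children_counts = {1: 3, 2: 8, 3: 10, 4: 2, 5: 3}


def dot_plot_rows(counts):
    """Build the rows of a text dot plot, top row first, axis labels last."""
    values = sorted(counts)
    rows = []
    # Work from the tallest stack down to the first dot above the axis.
    for level in range(max(counts.values()), 0, -1):
        marks = ["*" if counts[value] >= level else " " for value in values]
        rows.append(" ".join(marks).rstrip())
    rows.append(" ".join(str(value) for value in values))
    return rows


rows = dot_plot_rows(children_counts)
print("\n".join(rows))
```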
Technology
Here's a quick overview of how to create dot plots in StatCrunch.

Distribution Shape
A good way to describe a distribution is its shape. In general, we describe a distribution's shape in one of four ways (though there are others):
 uniform: frequencies are evenly spread out among all values of the variable
 symmetric (bell-shaped): highest frequency is in the middle, with frequencies tailing off to the right and left
 left (negative) skewed: highest frequency is on the right, with a longer left "tail"
 right (positive) skewed: highest frequency is on the left, with a longer right "tail"
[Figures: example histograms labeled uniform, symmetric (bell-shaped), left (negative) skewed, and right (positive) skewed]
In addition to histograms, stem-and-leaf plots, and dot plots, there are some other, less common plots. We'll introduce a couple in this section. The first type, the frequency polygon, is not a type of plot that will be expected of you on exams, though you will be asked questions about them on homework.
Frequency Polygons
A frequency polygon is drawn by plotting a point above each class midpoint and connecting the points with straight lines. (Class midpoints are found by averaging successive lower class limits.)
Example 1
To illustrate the idea, let's look at the average commute data from the last section.
average commute  midpoint  frequency  relative frequency 
16-17.9  17  1  1/15 ≈ 0.07
18-19.9  19  2  2/15 ≈ 0.13
20-21.9  21  1  1/15 ≈ 0.07
22-23.9  23  6  6/15 = 0.40
24-25.9  25  2  2/15 ≈ 0.13
26-27.9  27  1  1/15 ≈ 0.07
28-29.9  29  1  1/15 ≈ 0.07
30-31.9  31  1  1/15 ≈ 0.07
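The midpoint column can be reproduced with a few lines of Python. This is only a sketch with made-up variable names. It also makes one assumption that isn't stated in the table: for the last class, the "next" lower class limit is taken to be 32, one class width past 30.

```python
# Lower class limits and frequencies copied from the table above.
lower_limits = [16, 18, 20, 22, 24, 26, 28, 30]
frequencies = [1, 2, 1, 6, 2, 1, 1, 1]

class_width = lower_limits[1] - lower_limits[0]

# Assumption: the class after the last one would start one class width
# later (at 32), which gives the last class a midpoint too.
next_limits = lower_limits[1:] + [lower_limits[-1] + class_width]

# Each midpoint is the average of a lower limit and the next lower limit.
midpoints = [(low + nxt) / 2 for low, nxt in zip(lower_limits, next_limits)]

# The frequency polygon connects these (midpoint, frequency) points in order.
polygon_points = list(zip(midpoints, frequencies))
print(polygon_points)
```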
The three images below show the relationship between the histogram and the frequency polygon.
Note: No technology section this time, since you won't be asked to do this for exams.
Ogives
Ogives are pretty funky graphs, and rarely used except in specific areas. We'll just give a quick example here, but like frequency polygons, you won't be expected to create these on an exam. (Though it may come up in homework.)
An ogive (read as "oh jive") is a graph that represents the cumulative frequency or cumulative relative frequency for the class. It is constructed by plotting points: the x-coordinates are the upper class limits and the y-coordinates are the corresponding cumulative frequencies or cumulative relative frequencies.
Example 3
To illustrate the idea, let's again use the average commute data from the last section.
average commute  relative frequency  cumulative relative frequency
16-17.9  1/15 ≈ 0.07  1/15 ≈ 0.07
18-19.9  2/15 ≈ 0.13  3/15 = 0.20
20-21.9  1/15 ≈ 0.07  4/15 ≈ 0.27
22-23.9  6/15 = 0.40  10/15 ≈ 0.67
24-25.9  2/15 ≈ 0.13  12/15 = 0.80
26-27.9  1/15 ≈ 0.07  13/15 ≈ 0.87
28-29.9  1/15 ≈ 0.07  14/15 ≈ 0.93
30-31.9  1/15 ≈ 0.07  15/15 = 1.00
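The cumulative column is a running total of the frequencies, divided by the total. Here is a minimal Python sketch of that calculation, with illustrative variable names, using the class frequencies and upper class limits from the commute data:

```python
from itertools import accumulate

# Upper class limits and frequencies from the average commute table.
upper_limits = [17.9, 19.9, 21.9, 23.9, 25.9, 27.9, 29.9, 31.9]
frequencies = [1, 2, 1, 6, 2, 1, 1, 1]

total = sum(frequencies)

# Running total of the frequencies: 1, 1+2, 1+2+1, ...
cumulative = list(accumulate(frequencies))

# Divide each running total by the overall total.
cumulative_rel = [count / total for count in cumulative]

# The ogive plots (upper class limit, cumulative relative frequency).
ogive_points = list(zip(upper_limits, cumulative_rel))
for limit, rel in ogive_points:
    print(f"{limit:>5}  {rel:.2f}")
```

The last point always lands at 1.00, since by the final class every observation has been counted.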
Note: No technology section this time, since you won't be asked to do this for exams.
Time-Series Graphs
Time-series graphs are much more common than the last couple of types we've looked at. It's common to see stock prices and daily temperature graphs in the news, and both are time-series plots.
A time-series plot is obtained by plotting the time at which a variable is measured on the horizontal axis and the corresponding value of the variable on the vertical axis.
The example above is from the Chicago Tribune and reflects the price of uranium from 2001-2006.
Example 4
Here's another example, using the daily high temperature in Elgin, IL, for the month of June, 2008.
date  daily high temperature 
6/1  80 
6/2  86 
6/3  72 
6/4  81 
6/5  89 
6/6  89 
6/7  86 
6/8  85 
6/9  73 
6/10  80 
6/11  84 
6/12  91 
6/13  82 
6/14  84 
6/15  81 
6/16  72 
6/17  77 
6/18  78 
6/19  81 
6/20  85 
6/21  82 
6/22  81 
6/23  78 
6/24  81 
6/25  80 
6/26  85 
6/27  82 
6/28  83 
6/29  75 
6/30  81 
And the time-series plot would look something like this:
Technology
Here's a quick overview of how to create a time-series plot in StatCrunch.
