# Section 2.2: Organizing Quantitative Data: The Popular Displays

## Objectives

By the end of this section, you will be able to...

1. organize quantitative data into tables
2. construct histograms for discrete and continuous data
3. draw stem-and-leaf plots
4. draw dot plots
5. identify the shape of a distribution


Like qualitative data in the last section, quantitative data can (and should) be organized into tables. We'll break this page up into two parts - discrete and continuous.

## Organizing Discrete Data into Tables

If you recall from Section 1.2,

A discrete variable is a quantitative variable that has either a finite number of possible values or a countable number of values. (Countable means that the values result from counting - 0, 1, 2, 3, ...)

Since we can list all the possible values (that's essentially what countable means), one way to make a table is just to list the values along with their corresponding frequency.

Example 1

Here are some data I collected from students in a previous Mth120 course. Each value is the number of children in the student's family (including the student).

 2 2 2 4 5 3 3 3 3 2 1 2 3 5 3 4 3 1 2 3 5 3 2 1 3 2

An easy way to compile the data would then be to make a frequency or relative frequency table as we did before.

| children | frequency | relative frequency |
| --- | --- | --- |
| 1 | 3 | 3/26 ≈ 0.12 |
| 2 | 8 | 8/26 ≈ 0.31 |
| 3 | 10 | 10/26 ≈ 0.38 |
| 4 | 2 | 2/26 ≈ 0.08 |
| 5 | 3 | 3/26 ≈ 0.12 |
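If you'd like to check a table like this without counting by hand, here's a minimal sketch in Python. It isn't part of the StatCrunch workflow used in this course, and the function name `frequency_table` is my own invention for illustration.

```python
from collections import Counter

# Number of children per family, copied from Example 1.
children = [2, 2, 2, 4, 5, 3, 3, 3, 3, 2, 1, 2, 3,
            5, 3, 4, 3, 1, 2, 3, 5, 3, 2, 1, 3, 2]

def frequency_table(data):
    """Return (value, frequency, relative frequency) rows, sorted by value."""
    counts = Counter(data)          # tallies how often each value occurs
    n = len(data)                   # total number of observations
    return [(value, counts[value], counts[value] / n)
            for value in sorted(counts)]

table = frequency_table(children)
for value, freq, rel in table:
    print(f"{value:>8} {freq:>9} {rel:>18.2f}")
```

The relative frequencies should always add up to 1 (apart from rounding), which is a handy check on any table you build.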

Sometimes, however, we have too many values to make a row for each one. In that case, we'll need to group several values together.

Example 2

A good example might be the scores on an exam, ranging from 1-100. Here are some data from a past Mth120 class.

 62 87 67 58 95 94 91 69 52 76 82 85 91 60 77 72 83 79 63 88 79 88 70 75 87

In this case, we'll have to set up intervals of numbers called classes. Each class has a lower class limit and an upper class limit, along with a class width. The class width is the difference between successive lower class limits.
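To see how classes work in practice, here's a rough Python sketch that groups the exam scores above. The starting lower class limit of 50 and the class width of 10 are assumptions chosen for illustration, and `class_frequencies` is a hypothetical helper, not a library function.

```python
from collections import Counter

# Exam scores copied from Example 2.
scores = [62, 87, 67, 58, 95, 94, 91, 69, 52, 76, 82, 85, 91,
          60, 77, 72, 83, 79, 63, 88, 79, 88, 70, 75, 87]

def class_frequencies(data, start, width):
    """Count integer values in classes of equal width, beginning at `start`."""
    # Class index k covers the values start + k*width up to start + (k+1)*width - 1.
    counts = Counter((x - start) // width for x in data)
    rows = []
    for k in range(max(counts) + 1):
        lower = start + k * width      # lower class limit
        upper = lower + width - 1      # upper class limit (integer data)
        rows.append((lower, upper, counts[k]))
    return rows

exam_classes = class_frequencies(scores, start=50, width=10)
for lower, upper, freq in exam_classes:
    print(f"{lower}-{upper}: {freq}")
```

Notice that the difference between successive lower class limits (50, 60, 70, ...) is the class width, just as in the definition above.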

To be consistent, the class width should be the same for each class. One good option might look something like this:

[Table: exam scores grouped into classes]

## Organizing Continuous Data into Tables

Organizing continuous data is similar to organizing multi-valued discrete data. We have to form classes which don't overlap. I usually try to choose a class width that's either logical (e.g. 10 points for the grades above) or that leaves me with 5-8 classes when complete.

Example 3

For this example, let's consider the average commute for each of the 50 states. The data below show the average daily commute of a random sample of 15 states.

 23.1 18.3 23.2 19.9 26.6 24.8 23.1 23.2 22.7 29.4 22.3 30.0 25.8 21.9 16.7

Source: US Census

Do you know why this is a continuous random variable and not discrete? (Hint: It's not because of the decimal.)


This is continuous because the variable we're measuring - time - can take any value in an interval, not just values we could list or count. When, say, a marketing agent measures her commute time, she actually rounds to the nearest minute. If she reports 32 minutes, it's not exactly 32 minutes, it's 32 minutes to the nearest minute. In reality, it might be 32.15323623245134... (you get the idea).

To make a frequency or relative frequency table for continuous data, we use the same strategy we'd use for multi-valued discrete data.

| average commute | frequency | relative frequency |
| --- | --- | --- |
| 16-17.9 | 1 | 1/15 ≈ 0.07 |
| 18-19.9 | 2 | 2/15 ≈ 0.13 |
| 20-21.9 | 1 | 1/15 ≈ 0.07 |
| 22-23.9 | 6 | 6/15 = 0.40 |
| 24-25.9 | 2 | 2/15 ≈ 0.13 |
| 26-27.9 | 1 | 1/15 ≈ 0.07 |
| 28-29.9 | 1 | 1/15 ≈ 0.07 |
| 30-31.9 | 1 | 1/15 ≈ 0.07 |
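Here's a small Python sketch of the same binning idea for continuous data. It treats each class as running from its lower class limit up to (but not including) the next lower class limit, which is one common way to avoid overlap. The helper name `bin_continuous` is hypothetical, and the start of 16 and width of 2 are taken from the table above.

```python
import math
from collections import Counter

# Average commute times copied from Example 3.
commutes = [23.1, 18.3, 23.2, 19.9, 26.6, 24.8, 23.1, 23.2,
            22.7, 29.4, 22.3, 30.0, 25.8, 21.9, 16.7]

def bin_continuous(data, start, width):
    """Count values in classes [lower, lower + width), beginning at `start`."""
    # floor((x - start) / width) is the index of the class that contains x.
    counts = Counter(math.floor((x - start) / width) for x in data)
    return [(start + k * width, start + (k + 1) * width, counts[k])
            for k in range(max(counts) + 1)]

commute_classes = bin_continuous(commutes, start=16, width=2)
for lower, next_lower, freq in commute_classes:
    print(f"{lower} to under {next_lower}: {freq}")
```

A value sitting exactly on a boundary, like 30.0, lands in the class that starts there rather than the one that ends there.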

Once we have these tables, we'll need to learn how to create some charts to display the information, which is what the next few pages are about.

## Technology

Here's a quick overview of how to create frequency and relative frequency tables for quantitative data in StatCrunch.

Discrete Data

1. Enter or import the data.
2. Select Stat > Tables > Frequency.
3. Select the column(s) you want to summarize and click Next.
4. Add any modifications for an "Other" category and how to order the categories, and click Calculate.

Continuous or Multi-valued Discrete Data

1. Enter or import the data.
2. Select Data > Bin Column.
3. Select the column containing the data, select "Use fixed width bins", and set the lowest class limit (Start bins at:) and class (bin) width. Click Calculate.
4. Select Stat > Tables > Frequency.
5. Select the newly created bin column and click Calculate.*

\* Note that these classes seem to overlap, but the class "0-k" does not include k.

Creating a relative frequency table from a frequency table

If you are given a frequency table and need to create a relative frequency table, use the following steps, assuming that "Frequency" is the label of the column containing the frequencies - edit as needed.

1. Click on Data > Compute > Expression.
2. Enter the text "Frequency/sum(Frequency)" in the Expression box.
3. If desired, enter a column label.
4. Click Compute.
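If you're curious what the "Frequency/sum(Frequency)" expression is doing, here's the same calculation sketched in Python rather than StatCrunch. The list below holds the frequencies from the commute table in Example 3; the variable names are just my own labels.

```python
# Frequencies from the commute table in Example 3, in class order.
frequency = [1, 2, 1, 6, 2, 1, 1, 1]

# Divide each frequency by the total number of observations.
total = sum(frequency)
relative_frequency = [f / total for f in frequency]

print([round(r, 2) for r in relative_frequency])
```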

## Single-valued Histograms

To display quantitative data, we need a new type of chart, called a histogram. Histograms look similar to bar graphs, but they have some distinct differences - and for good reason.

A histogram is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same and the rectangles touch each other.

The rectangles need to touch in a histogram because we want to imply that the classes are adjacent. In a bar graph, a favorite color of "blue" isn't really adjacent to "red", even though we might draw them side by side. For quantitative data like the data used in Example 1 earlier in this section, the value 2 really is next to the value 3.

Let's take a closer look at that example.

Example 4

| children | frequency | relative frequency |
| --- | --- | --- |
| 1 | 3 | 3/26 ≈ 0.12 |
| 2 | 8 | 8/26 ≈ 0.31 |
| 3 | 10 | 10/26 ≈ 0.38 |
| 4 | 2 | 2/26 ≈ 0.08 |
| 5 | 3 | 3/26 ≈ 0.12 |

To make a histogram, we make what looks like a bar graph with a couple of key differences:

1. rectangles must touch
2. class labels are underneath the rectangle

Here's what they'd look like for our example data:

[Figure: histograms of the number of children]

### Technology

Here's a quick overview of how to create histograms for single-valued discrete data using StatCrunch.

1. Enter or import the data.
2. Select Graphics > Histogram
3. Select the column(s) you want to summarize and click Next.
4. Set the Type, lower class limit (Start bins at:).
5. Set the class width (bin) to 1, and click Calculate.
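If you don't have StatCrunch handy, here's a rough Python sketch that prints a text version of a single-valued histogram for the number-of-children data. It draws the bars sideways (one row per value) purely to keep the code short, and `text_histogram` is a hypothetical helper name.

```python
from collections import Counter

# Number of children per family, copied from Example 1.
children = [2, 2, 2, 4, 5, 3, 3, 3, 3, 2, 1, 2, 3,
            5, 3, 4, 3, 1, 2, 3, 5, 3, 2, 1, 3, 2]

def text_histogram(data):
    """Return one line per value, with one '#' for each observation."""
    counts = Counter(data)
    lines = []
    # Walk every integer from min to max so adjacent values stay adjacent,
    # even if one of them has a frequency of 0.
    for value in range(min(data), max(data) + 1):
        lines.append(f"{value} | {'#' * counts[value]}")
    return lines

hist_lines = text_histogram(children)
print("\n".join(hist_lines))
```

Listing every value between the smallest and largest is what keeps the rows adjacent, which is the same reason the rectangles touch in a real histogram.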

## Histograms for Multi-valued and Continuous Data

Multi-valued and continuous histograms are probably where the most errors occur. There are some key differences between this and single-valued histograms. In this case, each rectangle doesn't represent a single value, but rather a range of values. Because of that, we don't label the class on the horizontal axis. Instead, we label the lower class limits at the left edge of each rectangle.

Let's demonstrate using an example:

Example 5

| average commute | frequency | relative frequency |
| --- | --- | --- |
| 16-17.9 | 1 | 1/15 ≈ 0.07 |
| 18-19.9 | 2 | 2/15 ≈ 0.13 |
| 20-21.9 | 1 | 1/15 ≈ 0.07 |
| 22-23.9 | 6 | 6/15 = 0.40 |
| 24-25.9 | 2 | 2/15 ≈ 0.13 |
| 26-27.9 | 1 | 1/15 ≈ 0.07 |
| 28-29.9 | 1 | 1/15 ≈ 0.07 |
| 30-31.9 | 1 | 1/15 ≈ 0.07 |

Here's what a frequency histogram would look like for these data:

[Figure: frequency histogram of average commute]

## Technology

Here's a quick overview of how to create histograms for multi-valued discrete data or continuous data in StatCrunch.

1. Enter or import the data.
2. Select Graphics > Histogram
3. Select the column(s) you want to summarize and click Next.
4. Set the Type, lower class limit (Start bins at:), and class width (bin), then click Calculate.

One final note about histograms: Because they show us such nice information about the distribution of a set of data, we'll be using them frequently throughout the rest of the semester. Be sure you spend plenty of time familiarizing yourself with the technology, so you're able to create histograms with ease.

## Stem-and-Leaf Plots

Stem-and-leaf plots are another way to represent quantitative data. They give more detail because they show the actual data. The idea is to split each data value into two parts - a stem and a leaf. The stem is everything to the left of the right-most digit, and the leaf is that right-most digit. Here's an example, using the exam scores from a previous Mth120 class that appeared earlier in this section.

Example 6

 62 87 67 58 95 94 91 69 52 76 82 85 91 60 77 72 83 79 63 88 79 88 70 75 87

With these data, the stems are the first digits - 5, 6, 7, 8, and 9. The leaves are the second digits, 0, 1, ... , 9. The full stem-and-leaf plot lists the stems down the left side, then a vertical bar, and then the leaves in order to the right. Something like this:

[Figure: stem-and-leaf plot of the exam scores]

It's interesting that this plot looks very similar to a histogram, only it gives us the actual data.
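The construction described above (stems down the left, sorted leaves to the right) can be sketched in a few lines of Python. This is only an illustration for non-negative whole numbers, and `stem_and_leaf` is a name I made up, not a built-in function.

```python
from collections import defaultdict

# Exam scores copied from Example 6.
scores = [62, 87, 67, 58, 95, 94, 91, 69, 52, 76, 82, 85, 91,
          60, 77, 72, 83, 79, 63, 88, 79, 88, 70, 75, 87]

def stem_and_leaf(data):
    """Return plot lines: stem = all but the last digit, leaf = the last digit."""
    leaves = defaultdict(list)
    for x in sorted(data):              # sorting first keeps each row's leaves in order
        stem, leaf = divmod(x, 10)      # e.g. 87 -> stem 8, leaf 7
        leaves[stem].append(leaf)
    lines = []
    # List every stem from smallest to largest, even one with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(digit) for digit in leaves[stem])
        lines.append(f"{stem} | {row}")
    return lines

stem_lines = stem_and_leaf(scores)
print("\n".join(stem_lines))
# 5 | 2 8
# 6 | 0 2 3 7 9
# 7 | 0 2 5 6 7 9 9
# 8 | 2 3 5 7 7 8 8
# 9 | 1 1 4 5
```

If you turn the output on its side, the rows of leaves have the same outline as a histogram of the scores with a class width of 10.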

There are some limitations to stem-and-leaf plots. In particular, we're limited to small data sets - can you imagine the leaves if we had 1,000 test scores? Also, the range in the data needs to be fairly small.

By that, I mean if the data values range from 1-100, our stems can represent 0, 10, 20, ... , 90 (the example above only needed 50 through 90). On the other hand, if the values range from 1-10,000, the stems would have to represent 0, 10, 20, ... , 9,980, 9,990. That's a lot of rows!

## Technology

Here's a quick overview of how to create stem-and-leaf plots in StatCrunch.

1. Enter or import the data.
2. Select Graphics > Stem and Leaf.
3. Select the column you wish to use and click Create Graph!

## Dot Plots

Dot plots are similar to single-valued histograms, but rather than placing rectangles above each particular value, a dot plot just places the required number of dots above each value. Looking at our example again with the number of children, the plot would look something like this:

[Figure: dot plot of the number of children]

## Technology

Here's a quick overview of how to create dot plots in StatCrunch.

1. Enter or import the data.
2. Select Graphics > Dotplot.
3. Select the column you wish to use and click Next.
4. Set any options and click Create Graph!
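For comparison with StatCrunch, here's a minimal Python sketch that stacks dots above each value for the number-of-children data. It assumes single-digit whole-number values so the columns line up, and `dot_plot` is a hypothetical helper name.

```python
from collections import Counter

# Number of children per family, copied from Example 1.
children = [2, 2, 2, 4, 5, 3, 3, 3, 3, 2, 1, 2, 3,
            5, 3, 4, 3, 1, 2, 3, 5, 3, 2, 1, 3, 2]

def dot_plot(data):
    """Return text rows with dots ('o') stacked above each value on the axis."""
    counts = Counter(data)
    values = list(range(min(data), max(data) + 1))
    rows = []
    # Build from the top row down: a column gets a dot on a given level
    # only if its frequency reaches that level.
    for level in range(max(counts.values()), 0, -1):
        rows.append(" ".join("o" if counts[v] >= level else " " for v in values))
    rows.append(" ".join(str(v) for v in values))   # axis labels
    return rows

plot_rows = dot_plot(children)
print("\n".join(plot_rows))
```

The number of dots in each column is just the frequency of that value, so the tallest column (above 3) has 10 dots.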

## Distribution Shape

A good way to describe a distribution is by its shape. In general, we describe a distribution's shape in one of four ways (though there are others):

1. uniform - frequencies are evenly spread out among all values of the variable
2. symmetric (bell-shaped) - the peak (highest frequency) is in the middle, with frequencies tailing off to the right and left
3. left (negative) skewed - the peak is on the right, with a longer left "tail"
4. right (positive) skewed - the peak is on the left, with a longer right "tail"

[Figures: uniform, symmetric (bell-shaped), left (negative) skewed, right (positive) skewed]