Section 2.1: Organizing Qualitative Data
Objectives
By the end of this section, you will be able to...
 organize qualitative data in tables
 construct bar graphs
 construct pie charts
For a quick overview of this section, feel free to watch this short video summary:
Frequency and Relative Frequency Tables
Let's suppose you give a survey concerning favorite color, and the data you collect looks something like the table below.
blue  red  blue  orange  blue  yellow  green  red  pink 
blue  green  blue  purple  blue  blue  green  yellow  pink 
blue  red  pink  green  blue  yellow  green  blue 
Clearly, we need a better way to summarize the data. The most obvious thing to do would be to make a table with the list of favorite colors and the frequency for each.
favorite color  frequency 
blue  10 
red  3 
orange  1 
yellow  3 
green  5 
pink  3 
purple  1 
Officially, we call this a frequency distribution.
A frequency distribution lists each category of data and the number of occurrences for each category.
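As a rough illustration of the tallying step, here is a minimal Python sketch that builds the frequency distribution from the raw survey responses. The variable names (`responses`, `frequency`) are just illustrative choices, and Python is only one of many tools you could use for this.

```python
from collections import Counter

# Survey responses copied from the favorite-color example above (26 in all).
responses = (
    "blue red blue orange blue yellow green red pink "
    "blue green blue purple blue blue green yellow pink "
    "blue red pink green blue yellow green blue"
).split()

# Counter tallies how many times each category occurs.
frequency = Counter(responses)

# Print one row per category, in order of first appearance.
for color, count in frequency.items():
    print(f"{color:<8}{count}")
```

The counts it produces should match the frequency table above, for example 10 for blue and 5 for green.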
Sometimes, we really want to know the frequency of a particular category in reference to the total. We can do this just by finding the total, and dividing the frequency for each category by that total.
The relative frequency is the proportion (or percent) of observations within a category and is found using the formula
relative frequency = frequency / (sum of all frequencies)
A relative frequency distribution lists each category of data together with the relative frequency of each category.
favorite color  relative frequency 
blue  10/26 ≈ 0.38 
red  3/26 ≈ 0.12 
orange  1/26 ≈ 0.04 
yellow  3/26 ≈ 0.12 
green  5/26 ≈ 0.19 
pink  3/26 ≈ 0.12 
purple  1/26 ≈ 0.04 
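The same calculation can be sketched in a few lines of Python. This is only one way to do it; the dictionary `frequency` is typed in from the table above, and the names used are illustrative.

```python
# Frequencies from the favorite-color frequency distribution above.
frequency = {
    "blue": 10, "red": 3, "orange": 1, "yellow": 3,
    "green": 5, "pink": 3, "purple": 1,
}

# The sum of all frequencies is the number of survey responses.
total = sum(frequency.values())

# relative frequency = frequency / (sum of all frequencies)
relative_frequency = {color: count / total for color, count in frequency.items()}

for color, rel in relative_frequency.items():
    print(f"{color:<8}{rel:.2f}")
```

A handy check is that the relative frequencies add up to 1 (or 100%), apart from rounding.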
Technology
Here's a quick overview of how to create frequency and relative frequency tables in StatCrunch.

Bar Graphs
Bar graphs are probably the most commonly used graphs, and ones you're already familiar with. I won't mention much more here, except to state a couple of key points:
 heights can be frequency or relative frequency
 bars must not touch
Using the data from our previous color example,
favorite color  frequency  relative frequency 
blue  10  10/26 ≈ 0.38 
red  3  3/26 ≈ 0.12 
orange  1  1/26 ≈ 0.04 
yellow  3  3/26 ≈ 0.12 
green  5  5/26 ≈ 0.19 
pink  3  3/26 ≈ 0.12 
purple  1  1/26 ≈ 0.04 
we could then make both frequency and relative frequency bar graphs.
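To see that the two graphs have the same shape, here is a small Python sketch that draws rough text-only bar graphs. It is not how StatCrunch works; the helper functions `text_bar` and `bar_graph` are hypothetical names made up for this illustration, and each bar is scaled so the tallest one is 30 characters long.

```python
# Frequencies from the favorite-color table above.
frequency = {
    "blue": 10, "red": 3, "orange": 1, "yellow": 3,
    "green": 5, "pink": 3, "purple": 1,
}
total = sum(frequency.values())
relative_frequency = {color: count / total for color, count in frequency.items()}


def text_bar(value, largest, width=30):
    """Return a bar of '#' characters whose length is proportional to value."""
    return "#" * round(width * value / largest)


def bar_graph(values):
    """Build one 'label bar' line per category, scaled to the largest value."""
    largest = max(values.values())
    return [f"{label:<8}{text_bar(value, largest)}" for label, value in values.items()]


frequency_graph = bar_graph(frequency)
relative_graph = bar_graph(relative_frequency)

# Each category gets its own line, so the bars never touch.
print("\n".join(frequency_graph))
```

Because every relative frequency is just the frequency divided by the same total, `frequency_graph` and `relative_graph` come out identical. Only the scale on the vertical axis would differ.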
Technology
Here's a quick overview of how to create bar graphs in StatCrunch.

Pareto Charts
A Pareto chart is a bar graph whose bars are drawn in decreasing order of frequency or relative frequency.
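The only extra step compared with an ordinary bar graph is the sorting. A minimal Python sketch of that ordering, using the favorite color frequencies from earlier in this section (the name `pareto_order` is just illustrative):

```python
# Frequencies from the favorite-color table earlier in this section.
frequency = {
    "blue": 10, "red": 3, "orange": 1, "yellow": 3,
    "green": 5, "pink": 3, "purple": 1,
}

# Sort the categories from largest to smallest frequency.
# Python's sort is stable, so tied categories keep their original order.
pareto_order = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

for color, count in pareto_order:
    print(f"{color:<8}{'#' * count} {count}")
```

Sorting by relative frequency would give exactly the same order, since each relative frequency is the frequency divided by the same total.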
You see Pareto charts fairly often in the newspaper, because the article is frequently trying to show that one particular category is the highest or lowest. The image below, for example, is from the Chicago Tribune. You can see clearly from the graph that it's attempting to show that the local BP refinery in Whiting, Indiana is the highest-capacity refinery that is considering expansion.
If you don't remember the issue, you can read up about BP's plan to expand its refinery in this article from CBS2 Chicago.
Here's another one, using the favorite color data from the last section:
Side-by-Side Bar Graphs
Side-by-side bar graphs are used when you want to compare two different populations. The key with side-by-side bar graphs is that you must use relative frequencies. Do you know why?
I think so. But just in case...
Look at it this way: Let's suppose we want to compare the poverty levels
for different counties in Illinois. If we used frequencies only, Cook County
would dominate, with almost 800,000, where no other county has over 50,000. On
the other hand, if we looked at relative frequencies, Cook County still
has the most (15%), but other counties such as Kane are much closer, with rates
around 8%.
Source: 2007 Illinois Poverty Summit
Here's a good example of a side-by-side chart, from the Associated Press.
What's shown isn't quite a relative frequency as we've defined it: it's the number per 100,000, whereas ours, as a percent, is the number per 100. The rate per 100,000 is used here because the percents would all be less than 1% and difficult to read. Still, if frequencies were used instead, the "White" category would be the largest, simply because that's the largest segment of the U.S. population.
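To make the difference between a raw count and a rate concrete, here is a short Python sketch. The two groups and all of their numbers are hypothetical, invented purely for illustration; they are not the Associated Press data or the Illinois poverty data. The function name `rate_per` is also just an illustrative choice.

```python
def rate_per(count, population, per=100_000):
    """Return the number of cases per `per` members of the population."""
    return count * per / population


# HYPOTHETICAL (count, population) pairs, made up for this example only.
groups = {
    "Group A": (900, 2_000_000),
    "Group B": (300, 250_000),
}

for name, (count, population) in groups.items():
    percent = rate_per(count, population, per=100)
    rate = rate_per(count, population)
    print(f"{name}: count = {count}, percent = {percent:.3f}%, per 100,000 = {rate:.0f}")
```

In this made-up example, Group A has three times as many cases as Group B, but Group B's rate is much higher once the size of each population is taken into account. Notice also that the percents are tiny decimals, while the rates per 100,000 are easy to read.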
Technology
Here's a quick overview of how to create side-by-side bar graphs in StatCrunch.

Pie Charts
Like bar graphs, pie charts are very common. You're probably already aware of these as well. I'll just include a couple of comments:
 should always include the relative frequency
 also should include labels, either directly or as a legend
Using the data from our previous color example,
favorite color  frequency  relative frequency 
blue  10  10/26 ≈ 0.38 
red  3  3/26 ≈ 0.12 
orange  1  1/26 ≈ 0.04 
yellow  3  3/26 ≈ 0.12 
green  5  5/26 ≈ 0.19 
pink  3  3/26 ≈ 0.12 
purple  1  1/26 ≈ 0.04 
we get this pie chart:
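Each slice of a pie chart takes up the same share of the full 360° circle as its category's relative frequency. Here is a minimal Python sketch of that arithmetic, using the table above (the names `angles` and `total` are illustrative, and software such as StatCrunch does this step for you):

```python
# Frequencies from the favorite-color table above.
frequency = {
    "blue": 10, "red": 3, "orange": 1, "yellow": 3,
    "green": 5, "pink": 3, "purple": 1,
}
total = sum(frequency.values())

# Each slice's central angle is its relative frequency times 360 degrees.
angles = {color: 360 * count / total for color, count in frequency.items()}

for color, angle in angles.items():
    share = frequency[color] / total
    print(f"{color:<8}{share:6.1%}  {angle:6.1f} degrees")
```

The angles add up to 360 degrees, just as the relative frequencies add up to 1.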
Technology
Here's a quick overview of how to create pie charts in StatCrunch.
