Section 3.1: Measures of Central Tendency

Objectives

By the end of this lesson, you will be able to...

determine the arithmetic mean of a variable from raw data
determine the median of a variable from raw data
explain what it means for a statistic to be resistant
determine the mode of a variable from raw data
use the mean and median to help identify the shape of a distribution

It's often very helpful to get a sense of what a "typical" individual might be in a population. This is what we mean when we say we're looking at measures of "center" or "central tendency".

Before we get into specifics, we need to clarify whether we're talking about a typical individual from a population or from a sample.

A parameter is a descriptive measure of a population.
A statistic is a descriptive measure of a sample.

Arithmetic Mean

You already know the arithmetic mean, though maybe not by name. It's more commonly referred to as the "average". It's calculated by finding the sum of the values and dividing by the number of observations. As mentioned above, we'll have two different means: one for the population and one for a sample.

The population arithmetic mean, μ (pronounced "mew"), is computed using all the individuals in the population. The sample arithmetic mean, x̄ (pronounced "x-bar"), is computed using sample data.

population arithmetic mean: $\displaystyle{\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x_i}{N} }$

sample arithmetic mean: $\displaystyle{\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n} }$

This is pretty formulaic, but the concept should be relatively familiar.
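If it helps to see the formula as code, here is a minimal Python sketch. The function name `arithmetic_mean` and the data values are made up for illustration; the point is only that μ and x̄ use the same arithmetic and differ in which data you feed in.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    values = list(values)
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Hypothetical data, for illustration only.
population = [4, 8, 15, 16, 23, 42]   # every individual, so N = 6
sample = [8, 16, 42]                  # a subset of the population, so n = 3

mu = arithmetic_mean(population)      # population mean (a parameter)
x_bar = arithmetic_mean(sample)       # sample mean (a statistic)

print(mu)
print(x_bar)
```

Notice that the sample mean does not have to equal the population mean; it depends on which individuals end up in the sample.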

To give another explanation, I'm going to reference one of my favorite web sites, BetterExplained. The author, Kalid Azad, presents topics in a non-traditional way, and I feel it's much more accessible and easier to understand than traditional texts. Here's what Kalid writes about the arithmetic mean:

The Arithmetic Mean

The arithmetic mean is the most common type of average:

$\displaystyle{average = \frac{sum}{number} }$

Let’s say you weigh 150 lbs, and are in an elevator with a 100lb kid and 350lb walrus. What’s the average weight?

The real question is “If you replaced this merry group with 3 identical people and want the same load in the elevator, what should each clone weigh?”

In this case, we’d swap in three people weighing 200 lbs each [(150 + 100 + 350)/3], and nobody would be the wiser.

Pros:

It works well for lists that are simply combined (added) together.
Easy to calculate: just add and divide.
It’s intuitive — it’s the number “in the middle”, pulled up by large values and brought down by smaller ones.

Cons:

The average can be skewed by outliers — it doesn’t deal well with wildly varying samples. The average of 100, 200 and -300 is 0, which is misleading.

The arithmetic mean works great 80% of the time; many quantities are added together. Unfortunately, there’s always those 20% of situations where the average doesn’t quite fit.

Source: BetterExplained, Kalid Azad
Article: How to Analyze Data Using the Average
Used with permission.
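The elevator idea quoted above can be checked with a few lines of Python. This is just a sketch of the arithmetic in the quote; the variable names are my own.

```python
# Weights from the quoted elevator example (lbs): you, the kid, the walrus.
weights = [150, 100, 350]

average = sum(weights) / len(weights)

# Replace everyone with identical "clones" that each weigh the average.
clones = [average] * len(weights)

print(average)                      # weight of each clone
print(sum(clones) == sum(weights))  # the elevator carries the same load

# The "misleading" case from the quote: large values that cancel out.
cancelling = [100, 200, -300]
print(sum(cancelling) / len(cancelling))
```

The second printed line is the whole idea of the arithmetic mean: swapping every value for the mean leaves the total unchanged.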

So, let's try an example.

Example 1

Suppose we record the exam scores from a sample of six students from a class of 30 (see table below).

student	exam score
Joseph	62
Alicia	83
Kendra	77
Cheryl	92
Adrian	89
Brian	75

Find the sample mean, along with its appropriate symbol.

Since this is a sample, the appropriate symbol is x̄, with a value of:

$\displaystyle{\bar{x} = \frac{62 + 83 + 77 + 92 + 89 + 75}{6} = \frac{478}{6} \approx 79.7 }$

You may notice that I rounded the mean to the tenths place.

When necessary, round the mean to one more digit than the original data. For example, if the data are whole numbers, round the mean to the tenths place (as in the previous example). If the data are already given to the tenths place, round to the hundredths place.
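Here is one way the rounding rule might be automated in Python, using the exam scores from Example 1. The helper names `decimal_places` and `rounded_mean` are my own, and the sketch assumes the data are written with the precision you intend (Python drops trailing zeros, so 1.50 is treated as 1.5).

```python
from decimal import Decimal
from statistics import mean


def decimal_places(value):
    """Count the digits after the decimal point in the value as written."""
    exponent = Decimal(str(value)).as_tuple().exponent
    return max(0, -exponent)


def rounded_mean(data):
    """Round the mean to one more decimal place than the data have."""
    places = max(decimal_places(x) for x in data) + 1
    return round(mean(data), places)


scores = [62, 83, 77, 92, 89, 75]   # exam scores from Example 1
print(rounded_mean(scores))          # whole-number data, so tenths place
```

For data recorded to the tenths place, the same function rounds to the hundredths place.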

You might also consider watching this video regarding rounding (in Quicktime or iPod format).

Example 2

One point that should be emphasized again is the effect of outliers on the arithmetic mean. Because it adds all the values together, the arithmetic mean can be skewed by extremely large or extremely small values.

A helpful way to illustrate this is to think of the mean as the center of gravity - like the balance point. Suppose we consider the ages of the six Jackson cousins, Hudson, Abella, Amelia, Jillian, Katelyn, and Jessica. The figure below represents their ages and the corresponding sample mean. (Sample, in this case, because this isn't all of the Jackson cousins.)

[Figure: the mean as a center of mass]

If we replace Jessica with her father, who is 34 years old, we get something like this:

[Figure: the sample mean with an outlier]

You can see the effect of including the dad very clearly here: a mean of 16 years does not really represent the "middle" value.

Technology

Here's a quick overview of the steps for finding the arithmetic mean in StatCrunch.

Select Stat > Summary Stat > Columns.
Select the variable you want to summarize (e.g., "Heights")--leave everything else as is for now.
Click "Next".
Deselect any statistics that you do not want calculated.
Click "Calculate" and another window with these numbers calculated will pop up.

You can also visit the video page for links to see videos in either Quicktime or iPod format.

Median

As we mentioned at the end of the previous page, we need another measure of center when the data include outliers. The most common choice is called the median.

The median of a variable is the value that lies in the middle of the data when arranged in ascending order. That is, half the data are below the median and half the data are above the median. We use M to represent the median.

Like the previous topic, I really appreciate how Kalid Azad explained the median on his web site, BetterExplained. Here's what he wrote:

Median

The median is “the item in the middle”. But doesn’t the average (arithmetic mean) imply the same thing? What gives?

Humor me for a second: what’s the “middle” of these numbers?

1, 2, 3, 4, 100

Well, 3 is the middle of the list. And although the average (22) is somewhere in the “middle”, 22 doesn’t really represent the distribution. We’re more likely to get a number closer to 3 than to 22. The average has been pulled up by 100, an outlier.

The median solves this problem by taking the number in the middle of a sorted list. If there’s two middle numbers (even number of items), just take their average. Outliers like 100 only tug the median along one item in the sorted list, instead of making a drastic change: the median of 1 2 3 4 is 2.5.

Pros:

Handles outliers well — often the most accurate representation of a group
Splits data into two groups, each with the same number of items

Cons:

Can be harder to calculate: you need to sort the list first
Not as well-known; when you say “median”, people may think you mean “average”

Some jokes run along the lines of “Half of all drivers are below average. Scary, isn’t it?”. But really, in your head, you know they should be saying “half of all drivers are below median”.

Figures like housing prices and incomes are often given in terms of the median, since we want an idea of the middle of the pack. Bill Gates earning a few billion extra one year might bump up the average income, but it isn’t relevant to how a regular person’s wage changed. We aren’t interested in “adding” incomes or house prices together — we just want to find the middle one.

Again, the type of average to use depends on how the data is used.

Source: BetterExplained, Kalid Azad
Article: How to Analyze Data Using the Average
Used with permission.
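The numbers in the quote above are easy to verify with Python's built-in `statistics` module. This is only a quick check of the quoted lists, not part of the original article.

```python
from statistics import mean, median

with_outlier = [1, 2, 3, 4, 100]   # list from the quote
without_outlier = [1, 2, 3, 4]     # same list with the outlier removed

print(mean(with_outlier))      # the mean is pulled up to 22 by the 100
print(median(with_outlier))    # the median stays at the middle value, 3
print(median(without_outlier)) # two middle values (2 and 3) are averaged
```

Dropping the outlier moves the median only from 3 to 2.5, while the mean would fall from 22 to 2.5.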

Example 3

Let's again consider the exam scores from a sample of six students from a class of 30 (see table below).

student	exam score
Joseph	62
Alicia	83
Kendra	77
Cheryl	92
Adrian	89
Brian	75

Find the sample median.

In order to find the median, we first need to write the values in order: 62, 75, 77, 83, 89, 92. Since we have six observations, there is no exact "middle" one, so we have to average the middle two.

The median is then (77 + 83)/2 = 80.
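The same steps (sort, then take the middle value or average the middle two) can be sketched in Python. The function name `median_of` is my own; in practice `statistics.median` from the standard library does the same job.

```python
def median_of(data):
    """Sort the data, then return the middle value.

    With an even number of observations there is no single middle value,
    so the two middle values are averaged.
    """
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


scores = [62, 83, 77, 92, 89, 75]   # exam scores from the table above
M = median_of(scores)

print(sorted(scores))
print(M)
```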

Example 4

To illustrate how the median deals with outliers, let's again consider the ages of the six Jackson cousins. The figure below represents their ages and the corresponding sample median.

[Figure: the sample median]

If we replace Jessica with her father, who is 34 years old, we get something like this:

[Figure: the median with an outlier]

You can immediately see the benefit of using the median - it is not affected by the age of Jessica's father.
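This is what it means for a statistic to be resistant: extreme values have little or no effect on it. Here is a small Python sketch of the idea. The cousins' actual ages appear only in the figures, so the ages below are hypothetical, chosen so that the mean with the father comes out to 16; only the father's age of 34 comes from the example.

```python
from statistics import mean, median

# Hypothetical ages (not the real data); the last value stands in for Jessica.
cousins = [8, 11, 13, 14, 16, 17]

# Replace Jessica with her father, who is 34.
with_father = cousins[:-1] + [34]

print(mean(cousins), median(cousins))
print(mean(with_father), median(with_father))
```

With these made-up ages, the mean jumps when the father is included, while the median is exactly the same in both lines.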

Technology

Here's a quick overview of the steps for finding the median in StatCrunch.

Select Stat > Summary Stat > Columns.
Select the variable you want to summarize (e.g., "Heights")--leave everything else as is for now.
Click "Next".
Deselect any statistics that you do not want calculated.
Click "Calculate" and another window with these numbers calculated will pop up.

You can also visit the video page for links to see videos in either Quicktime or iPod format.

Mode

Often, we just want to know what "most" people think on an issue. We may not call it that, but what we're really looking for is the mode.

The mode of a variable is the most frequent observation of the variable.

Look at any poll from the Pew Research Center. Any time an article discusses the "most common" or "most popular" choice, it's talking about the mode.

As with the previous two measures of central tendency, I like Kalid Azad's explanation of the mode on his web site, BetterExplained. Here's what he wrote:

Mode

The mode sounds strange, but it just means take a vote. And sometimes a vote, not a calculation, is the best way to get a representative sample of what people want.

Let’s say you’re throwing a party and need to pick a day (1 is Monday and 7 is Sunday). The “best” day would be the option that satisfies the most people: an average may not make sense. (“Bob likes Friday and Alice likes Sunday? Saturday it is!”).

Similarly, colors, movie preferences and much more can be measured with numbers. But again, the ideal choice may be the mode, not the average: the “average” color or “average” movie could be… unsatisfactory (Rambo meets Pride and Prejudice).

Pros:

Works well for exclusive voting situations (this choice or that one; no compromise)
Gives a choice that the most people wanted (whereas the average can give a choice that nobody wanted).
Simple to understand

Cons:

Requires more effort to compute (have to tally up the votes)
“Winner takes all” — there’s no middle path

The term “mode” isn’t that common, but now you know what button to look for when playing around with your favorite statistics program.

Source: BetterExplained, Kalid Azad
Article: How to Analyze Data Using the Average
Used with permission.
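The "take a vote" idea from the quote can be sketched in Python by tallying responses. The votes below are hypothetical; only the day coding (1 is Monday, 7 is Sunday) comes from the quote.

```python
from collections import Counter
from statistics import mean

# Hypothetical party-day votes, coded 1 (Monday) through 7 (Sunday).
votes = [5, 7, 5, 6, 5, 7, 2, 5]

tally = Counter(votes)                 # how many votes each day received
day, count = tally.most_common(1)[0]   # the most frequent day and its count

print(day, count)     # the mode and how many people chose it
print(mean(votes))    # the "average day", which nobody actually voted for
```

If two days tie for the most votes, `most_common(1)` reports only one of them; `statistics.multimode` returns every tied value instead.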

Technology

To see how to find the mode using technology, open the appropriate video from the list below. These videos include all measures of center included in this section, plus other descriptive statistics.

Visit the video page for links to see videos in either Quicktime or iPod format.

Using the Mean and Median to Identify the Distribution Shape

In Section 2.2, we talked about different ways to describe the distribution shape. With these new measures of center, we can now use the mean and median to get an idea of the distribution shape as well.

[Figure panels: left-skewed; right-skewed; symmetric (bell-shaped)]
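As a rule of thumb, a mean noticeably larger than the median suggests a right-skewed distribution, a mean noticeably smaller than the median suggests a left-skewed one, and a mean close to the median suggests rough symmetry. Here is a rough Python sketch of that comparison. The function name `suggest_shape` and the 5% tolerance are my own assumptions, not a standard cutoff, and the result is only a hint; you should still look at a graph of the data.

```python
from statistics import mean, median


def suggest_shape(data, tolerance=0.05):
    """Compare the mean and median to guess the distribution shape.

    The tolerance (a fraction of the data's range) is an arbitrary
    assumption used to decide when the two are "close enough".
    """
    m = mean(data)
    med = median(data)
    spread = max(data) - min(data)
    if spread == 0 or abs(m - med) <= tolerance * spread:
        return "roughly symmetric"
    return "right-skewed" if m > med else "left-skewed"


print(suggest_shape([1, 2, 3, 4, 100]))     # mean 22 is above median 3
print(suggest_shape([1, 97, 98, 99, 100]))  # mean 79 is below median 98
print(suggest_shape([1, 2, 3, 4, 5]))       # mean and median are both 3
```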