Section 1.2: Observational Studies versus Designed Experiments

Objectives

By the end of this lesson, you will be able to...

distinguish between an observational study and a designed experiment
identify possible lurking variables
explain the various types of observational studies

To begin, we're going to discuss some of the ways to collect data. In general, there are a few standard approaches:

census
existing sources
survey sampling
designed experiments

Most of us associate the word census with the U.S. Census, but it actually has a broader definition. Here's a typical definition:

A census is a list of all individuals in a population along with certain characteristics of each individual.

The nice part about a census is that it gives us all the information we want. Of course, it's usually impossible to get - imagine trying to interview every single ECC student. That'd be over 10,000 interviews!
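To get a feel for why a sample is usually an acceptable stand-in for a census, here is a minimal sketch. Everything in it is made up for illustration: the population of 10,000 student ages, the sample size of 200, and the variable names are assumptions, not real ECC data.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: the ages of 10,000 made-up students.
population = [random.randint(18, 60) for _ in range(10_000)]

# A census uses every individual in the population.
census_mean = statistics.mean(population)

# A survey uses only a random sample; here, 200 of the 10,000 students.
sample = random.sample(population, 200)
sample_mean = statistics.mean(sample)

print(f"census mean age: {census_mean:.2f} (10,000 records)")
print(f"sample mean age: {sample_mean:.2f} (200 records)")
```

The two averages should land close to each other, even though the sample needed only a small fraction of the interviews.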

So if we can't get a census, what do we do? A great source of data is other studies that have already been completed. If you're trying to answer a particular question, look to see if someone else has already collected data about that population. The moral of the story is this: Don't collect data that have already been collected!

Observational Studies versus Designed Experiments

Now to one of the main objectives for this section. Two other very common sources of data are observational studies and designed experiments. We're going to take some time here to describe them and distinguish between them - you'll be expected to be able to do the same in homework and on your first exam.

The easiest examples of observational studies are surveys. No attempt is made to influence anything - just ask questions and record the responses. By definition,

An observational study measures the characteristics of a population by studying individuals in a sample, but does not attempt to manipulate or influence the variables of interest.

For a good example, try visiting the Pew Research Center. Just click on any article and you'll see an example of an observational study. They just sample a particular group and ask them questions.

In contrast, designed experiments explicitly do attempt to influence results. They try to determine what effect a particular treatment has on an outcome.

A designed experiment applies a treatment to individuals (referred to as experimental units or subjects) and attempts to isolate the effects of the treatment on a response variable.

For a nice example of a designed experiment, check out this article from National Public Radio about the effect of exercise on fitness.

So let's look at a couple examples.

Example 1

Visit this link from Science Daily, from July 8th, 2008. It talks about the relationship between Post-Traumatic Stress Disorder (PTSD) and heart disease. After reading the article carefully, try to decide whether it was an observational study or a designed experiment.

What was it?

This was a tricky one. It was actually an observational study. The key is that the researchers didn't force the veterans to have PTSD; they simply observed the rate of heart disease for those veterans who had PTSD and the rate for those who did not.

Example 2

Visit this link from the Gallup Organization, from June 17th, 2008. It looks at what Americans' top concerns were at that point. Read carefully and think about how the data were collected. Do you think this was an observational study or a designed experiment? Why?

Think carefully about which you think it was, and just as important - why? When you're ready, click the link below.

What was it?

If you were thinking that this was an observational study, you were right! The key here is that the individuals sampled were just asked what was important to them. The study didn't try to impose certain conditions on people for a set amount of time and see if those conditions affected their responses.

Example 3

This last example is regarding the "low-carb" Atkins diet, and how it compares with other diets. Read through this summary of a report in the New England Journal of Medicine and see if you can figure out whether it's an observational study or a designed experiment.

What was it?

As expected, this was a designed experiment, but do you know why? The key here is they forced individuals to maintain a certain diet, and then compared the participants' health at the end.

Probably the biggest difference between observational studies and designed experiments is the issue of association versus causation. Since observational studies don't control any variables, the results can only be associations. Because variables are controlled in a designed experiment, we can have conclusions of causation.

Look back over the three examples linked above and see if all three reported their results correctly. You'll often find articles in newspapers or online claiming one variable caused a certain response in another, when really all they had was an association from doing an observational study.

The discussion of the differences between observational studies and designed experiments may bring up an interesting question - why are we worried so much about the difference?

We already mentioned the key point above, but it bears repeating here:

Observational studies only allow us to claim association, not causation.

The primary reason behind this is something called a lurking variable (sometimes also termed a confounding factor, among other similar terms).

A lurking variable is a variable that affects both of the variables of interest, but is either not known or is not acknowledged.

Consider the following example, from The Washington Post:

Example 4

Coffee may have health benefits and may not pose health risks for many people

By Carolyn Butler Tuesday, December 22, 2009

Of all the relationships in my life, by far the most on-again, off-again has been with coffee: From that initial, tentative dalliance in college to a serious commitment during my first real reporting job to breaking up altogether when I got pregnant, only to fail miserably at quitting my daily latte the second time I was expecting. More recently the relationship has turned into full-blown obsession and, ironically, I often fall asleep at night dreaming of the delicious, satisfying cup of joe that awaits, come morning.

[...] Rest assured: Not only has current research shown that moderate coffee consumption isn't likely to hurt you, it may actually have significant health benefits. "Coffee is generally associated with a less health-conscious lifestyle -- people who don't sleep much, drink coffee, smoke, drink alcohol," explains Rob van Dam, an assistant professor in the departments of nutrition and epidemiology at the Harvard School of Public Health. He points out that early studies failed to account for such issues and thus found a link between drinking coffee and such conditions as heart disease and cancer, a link that has contributed to java's lingering bad rep. "But as more studies have been conducted -- larger and better studies that controlled for healthy lifestyle issues -- the totality of efforts suggests that coffee is a good beverage choice."

[...]

Source: Washington Post

What is this article telling us? If you look at Professor van Dam's comments, you can see that he is describing a lurking variable: lifestyle. In past studies, this variable wasn't accounted for. Researchers in the past saw the relationship between coffee and heart disease, and came to the conclusion that the coffee was causing the heart disease.

But since those were only observational studies, the researchers could only claim an association. In that example, the lifestyle choices of individuals were affecting both their coffee use and other risks leading to heart disease. So "lifestyle" would be an example of a lurking variable.
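The coffee story can be imitated with a small simulation. This is only a sketch under invented assumptions: the probabilities, the 100,000 simulated people, and the function names are all hypothetical, and heart disease is set up to depend only on lifestyle, never on coffee. The last comparison also assumes coffee is assigned by a coin flip, which is one way a designed experiment could break the link between lifestyle and coffee.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

N = 100_000  # hypothetical number of simulated people


def simulate_person(coffee_by_coin_flip):
    """Return (unhealthy, drinks_coffee, heart_disease) for one made-up person."""
    unhealthy = random.random() < 0.5
    if coffee_by_coin_flip:
        # Designed experiment: the researcher assigns coffee at random.
        drinks_coffee = random.random() < 0.5
    else:
        # Observational study: lifestyle influences who drinks coffee.
        drinks_coffee = random.random() < (0.8 if unhealthy else 0.3)
    # Heart disease depends ONLY on lifestyle; coffee has no effect here.
    heart_disease = random.random() < (0.30 if unhealthy else 0.10)
    return unhealthy, drinks_coffee, heart_disease


def disease_rate(people, coffee, lifestyle=None):
    """Share with heart disease among people matching the given filters."""
    group = [p for p in people
             if p[1] == coffee and (lifestyle is None or p[0] == lifestyle)]
    return sum(p[2] for p in group) / len(group)


def rate_gap(people, lifestyle=None):
    """Disease rate of coffee drinkers minus that of non-drinkers."""
    return (disease_rate(people, True, lifestyle)
            - disease_rate(people, False, lifestyle))


observed = [simulate_person(False) for _ in range(N)]
assigned = [simulate_person(True) for _ in range(N)]

gap_overall = rate_gap(observed)                    # lifestyle ignored
gap_unhealthy = rate_gap(observed, lifestyle=True)  # unhealthy group only
gap_healthy = rate_gap(observed, lifestyle=False)   # healthy group only
gap_experiment = rate_gap(assigned)                 # coffee assigned at random

print(f"observational, lifestyle ignored: {gap_overall:+.3f}")
print(f"observational, unhealthy only:    {gap_unhealthy:+.3f}")
print(f"observational, healthy only:      {gap_healthy:+.3f}")
print(f"coin-flip assignment:             {gap_experiment:+.3f}")
```

When lifestyle is ignored, coffee drinkers show a clearly higher disease rate, even though coffee does nothing in this simulation. Once we compare people with the same lifestyle, or assign coffee at random, the gap should shrink to roughly zero.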

For more on lurking variables, check out this link from The Math Forum and this one from The Psychology Wiki. Both give further examples and illustrations.

Even with all the problems of lurking variables, there are still many good reasons to do an observational study. For one, a designed experiment may be impractical or even unethical (imagine a designed experiment regarding the risks of smoking). Observational studies also tend to cost much less than designed experiments, and it's often possible to obtain a much larger data set than you would with a designed experiment. Still, it's always important to remember the difference in what we can claim as a result of observational studies versus designed experiments.

Types of Observational Studies

There are three major types of observational studies, and they're listed in your text: cross-sectional studies, case-control studies, and cohort studies.

Cross-sectional Studies

This first type of observational study involves collecting data about individuals at a certain point in time. A researcher concerned about the effect of working with asbestos might compare the cancer rate of those who work with asbestos versus those who do not.

Cross-sectional studies are cheap and easy to do, but they don't give very strong results. In our quick example, we can't be sure that those working with asbestos who don't report cancer won't eventually develop it. This type of study only gives a bit of the picture, so it is rarely used by itself. Researchers tend to use a cross-sectional study to first determine if there might be a link, and then later do another study (like one of the following) to further investigate.

Case-control Studies

Case-control studies are frequently used in the medical community to compare individuals with a particular characteristic (this group is the case) with individuals who do not have that characteristic (this group is the control). Researchers attempt to select homogeneous groups, so that on average, all other characteristics of the individuals will be similar, with only the characteristic in question differing.

One of the most famous examples of this type of study is the early research on the link between smoking and lung cancer in the United Kingdom by Richard Doll and A. Bradford Hill. In the 1950s, almost 80% of adults in the UK were smokers, and the connection between smoking and lung cancer had not yet been established. Doll and Hill interviewed about 700 lung cancer patients to try to determine a possible cause.

This type of study is retrospective, because it asks the individuals to look back and describe their habits (regarding smoking, in this case). There are clear weaknesses in a study like this, because it expects individuals to not only have an accurate memory, but also to respond honestly. (Think about a study concerning drug use and cognitive impairment.) Not only that, we discussed previously that such a study may prove association, but it cannot prove causation.

Cohort Studies

A cohort describes a group of individuals, and so a cohort study is one in which a group of individuals is selected to participate in a study. The group is then observed over a period of time to determine if particular characteristics affect a response variable.

Based on their earlier research, Doll and Hill began one of the largest cohort studies in 1951. The study was again regarding the link between smoking and lung cancer. The study began with 34,439 male British doctors, and followed them for over 50 years. Doll and Hill first reported findings in 1954 in the British Medical Journal, and then continued to report their findings periodically afterward. Their last report was in 2004, again published in the British Medical Journal. This last report reflected on 50 years of observational data from the cohort.

This last type of study is called prospective, because it begins with the group and then collects data over time. Cohort studies are definitely the most powerful of the observational studies, particularly with the quantity and quality of data in a study like the previous one.
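To see why following a group over time tells us more than a single snapshot, here is a rough sketch. The yearly risks, the group sizes, the 30-year follow-up, and the year-5 snapshot are all invented for illustration and are not taken from any real study.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

YEARS_FOLLOWED = 30   # hypothetical length of the cohort study
SNAPSHOT_YEAR = 5     # hypothetical year of a one-time survey


def year_of_diagnosis(exposed):
    """Return the year a made-up worker is diagnosed, or None if never."""
    yearly_risk = 0.010 if exposed else 0.002  # invented risks, not real data
    for year in range(1, YEARS_FOLLOWED + 1):
        if random.random() < yearly_risk:
            return year
    return None


# 5,000 exposed and 5,000 unexposed hypothetical workers.
cohort = [(exposed, year_of_diagnosis(exposed))
          for exposed in [True] * 5000 + [False] * 5000]


def share_diagnosed(exposed, by_year):
    """Share of a group diagnosed on or before the given year."""
    group = [dx for is_exposed, dx in cohort if is_exposed == exposed]
    return sum(dx is not None and dx <= by_year for dx in group) / len(group)


for label, year in [("snapshot at year 5", SNAPSHOT_YEAR),
                    ("follow-up to year 30", YEARS_FOLLOWED)]:
    print(f"{label}: exposed {share_diagnosed(True, year):.3f}, "
          f"unexposed {share_diagnosed(False, year):.3f}")
```

The snapshot misses everyone who is diagnosed after year 5, which is the weakness of a cross-sectional study described earlier. The long follow-up catches those later cases, which is where a cohort study gets its strength.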

Let's look at some examples.

Example 4

A recent article in the BBC News Health section described a study concerning dementia and "mid-life ills". According to the article, researchers followed more than 11,000 people over a period of 12-14 years. They found that smoking, diabetes, and high blood pressure were all factors in the onset of dementia.

What type of observational study was this? Cross-sectional, case-control, or cohort?

What was it?

Because the researchers tracked the 11,000 participants over a period of years, this is a cohort study.

Example 5

In 1993, the National Institute of Environmental Health Sciences funded a study in Iowa regarding the possible relationship between radon levels and the incidence of cancer. The study gathered information from 413 participants who had developed lung cancer and compared those results with 614 participants who did not have lung cancer.

What type of study was this?

What was it?

This study was retrospective - gathering information about the group of interest (those with cancer) and comparing them with a control group (those without cancer). This is an example of a case-control study.

Though this may seem similar to a cross-sectional study, it differs in that the individuals are "matched" (with cancer vs. without cancer) and the individuals are expected to look back in time and describe their time spent in the home to determine their radon exposure.

Example 6

In 2004, researchers published an article in the New England Journal of Medicine regarding the relationship between combat stress and the mental health of soldiers. The study collected information from soldiers in four combat infantry units either before their deployment to Iraq or three to four months after their return from combat duty.

What type of study was this?

What was it?

Since this was simply a survey given over a short period of time to try to examine the effect of combat duty, this was a cross-sectional study. Unlike the previous example, it did not ask the participants to delve into their history, nor did it explicitly "match" soldiers with a particular characteristic.