
Private: Learning Math: Data Analysis, Statistics, and Probability

Designing Experiments Part C: Comparative Experimental Studies (65 minutes)

In This Part: Measuring Short-Term Recall

Now we’ll move on to an investigation of comparative experimental studies. We’ll begin with a problem related to human memory and, in particular, short-term recall. See Note 4 below.

1. Ask a Question
Is it easier to remember strings of characters that spell familiar words than to remember arbitrary strings of characters?

Which list do you think would be easier for most people to remember — words like those in List A, or character strings (non-words) like those in List B?

2. Collect Appropriate Data
In order to explore this question, you’ll need appropriate data — and to get this data, you’ll need a method of measurement. See Note 5 below.


Problem C1
Describe the methods you might use to measure a person’s ability to recall words and to recall non-words. Your description should be as specific as possible and include enough detail so that other people could follow your instructions well enough to perform the measurement themselves. Keep in mind that you want to use the results to make a fair comparison of the two sets of data you collect.


Problem C2
Consider your answer to Problem C1. Are there any ways in which the measurement process you described is biased? If so, think about ways you might try to remove this bias.

 Remember, bias can come in many forms, including biases of the people conducting a measurement and biases of the people whose ability to recall words is being measured.


In This Part: An Experiment
Here is one way to do the measurement.

Two lists are used:
1. List A contains 20 words of three characters each.
2. List B contains 20 non-words of three characters each.

To measure a person’s ability to remember words (or non-words), each subject follows these steps:
1. Study the list for two minutes.
2. Pause for 20 seconds.
3. Take one minute to write down all the words or non-words recalled.

In this version of the experiment, we will show the list of non-words (List B) before the list of words. The subject’s final score is the number of words (or non-words) he or she was able to correctly recall.

To perform a non-interactive version of this activity, you’ll need a subject and a timekeeper.
• Show the subject the list of non-words (List B). He or she has two minutes to study the list.
• When two minutes have elapsed, take the list of non-words away from the subject and wait for 20 seconds.
• The subject then has one minute to recall and write down non-words from the list.
• Finally, count the number of correct non-words the subject recalled.

Repeat the same sequence for words (List A).
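
The scoring step above can be sketched in code. This is only an illustration: the function name, the sample strings, and the choices to ignore capitalization and to count a repeated answer once are assumptions, not part of the original procedure.

```python
def recall_score(study_list, written_items):
    """Count how many distinct written items appear on the study list.

    Assumptions (not from the lesson): matching ignores case and
    surrounding spaces, and a repeated answer counts only once.
    """
    targets = {item.strip().lower() for item in study_list}
    recalled = {item.strip().lower() for item in written_items}
    return len(targets & recalled)


# Hypothetical three-character non-words, invented for illustration only.
list_b = ["zep", "tuv", "mib", "qal"]
written = ["ZEP", "mib", "mib", "dax"]

score = recall_score(list_b, written)
print(score)  # 2: "zep" and "mib" match; "dax" is not on the list
```

Whatever scoring rule you choose, apply it in exactly the same way to both lists so that the comparison stays fair.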


Problem C3
Think about the experiment design you described in Problem C1. Does the process you just went through (studying non-words, then writing non-words; studying words, then writing words) present any additional biases? How might you resolve any biases in your experiment design?

If you gave someone the same task twice, would you expect them to perform better, worse, or equally well the second time? If you did not answer “equally well,” then that presents a source of bias for this experiment.


In This Part: Experimental Design
You would not want to make conclusions about memory by examining only one person. Therefore, you should use more than one subject in this experiment.

Let’s assume that you will use 16 subjects in your experiment. You will need to make some decisions about how to measure short-term recall in your 16 subjects.

Recall the original question: “Is it easier to remember strings of characters that spell familiar words than to remember arbitrary strings of characters?”

As stated, the question is perhaps not as specific as it should be. For example, we have not clarified the population of people we are studying. Age may have an impact on a person’s ability to memorize. Are we interested in adults only?

If you believe that age makes a difference in a person’s recall ability, then perhaps you need to refine the question.

Below are five different ways that you might assign the 16 subjects to groups and collect your measurements. These are referred to as designs. You should consider only one design at a time; do not move on to Design 2 until you have answered all four questions for Design 1, and so on. Your focus should be on the methods of data collection for each design.


Problem C4
For each design, answer the following questions:
a. What are the strengths of the design?
b. What are the weaknesses of the design?
c. Do you have concerns about bias that are not addressed by the design?
d. Do you have suggestions for improving the design?

Design 1
• Divide the 16 subjects into two groups of eight. Ask for volunteers to form the first group.
• When the two groups have been formed, measure each of the eight subjects in Group 1 using List A; measure each of the subjects in Group 2 using List B.
• Compare the results (the eight measurements) for Group 1 with the results for Group 2.

Design 2
• Measure each of the 16 subjects using List A. Then measure each of the 16 subjects using List B.
• Compare the results (all 16 measurements) for List A with the results for List B.

Design 3
• Divide the 16 subjects into two groups, and assign eight subjects to each group.
• When the two groups have been formed, measure each of the eight subjects in Group 1 using List A; measure each of the subjects in Group 2 using List B.
• Compare the results (the eight measurements) for Group 1 with the results for Group 2.

Design 4
• Divide the 16 subjects into two groups of eight. Assign eight subjects to each group.
• When the two groups have been formed, measure each of the eight subjects in Group 1 using List A. Then measure each of the subjects in Group 1 using List B.
• Measure each of the subjects in Group 2 using List B. Then measure each of the subjects in Group 2 using List A.
• Compare the results (all 16 measurements) for Group 1 with the results (all 16 measurements) for Group 2.

Design 5
• Divide the 16 subjects into two groups of eight. Randomly assign eight subjects to each group.
• When the two groups have been formed, measure each of the eight subjects in Group 1 using List A. Then measure each of the subjects in Group 1 using List B.
• Measure each of the subjects in Group 2 using List B. Then measure each of the subjects in Group 2 using List A.
• Compare the results (all 16 measurements) for Group 1 with the results (all 16 measurements) for Group 2.


Problem C5
Of the five designs in Problem C4, which do you think does the best job of eliminating potential sources of bias? Does this design eliminate all potential sources of bias, or are there further possible improvements?

Video Segment
In this video segment, participants compare Designs 4 and 5. Watch this segment after completing Problem C5.

Why do participants decide that Design 5 is better than Design 4?


You may say that of the five designs, the best uses random assignment to divide the subjects into two groups. Half of the subjects see List A first and half see List B first, and each subject ultimately sees both lists. When measurements are paired in this way (i.e., each person reads both lists), we are able to compare two different measurements for each subject. Without pairing, we must compare measurements for different people, and the differences in the people themselves may affect the difference in the measurements. Pairing will prove especially useful when we analyze our data. See Note 6 below.


Problem C6
a. Describe a method you might use to randomly assign the 16 subjects to two groups of eight subjects each.
b. Why would it be preferable to have each subject read both lists, instead of having eight subjects read List A and eight subjects read List B?

You need to make sure that the selection process is random and that the people conducting the experiment have no influence on how the subjects are divided into groups. Regarding the second question, what would happen if the eight subjects randomly chosen for List B were unusually smart?

Video Segment
In this video segment, researchers at the Brigham and Women’s Hospital discuss the design of a study on the effects of aspirin. Watch this segment after completing Problem C6.

How was the study on aspirin designed? What characteristics of the study’s design are most important in eliminating bias?


In This Part: Analysis of the Experiment

3. Analyze the Data
Sixteen subjects participated in the memory experiment using Design 5. Eight subjects were randomly assigned to each group. Each subject in Group 1 was first measured using List A. Then each subject in Group 1 was measured using List B. The order was reversed for Group 2: The subjects were first measured using List B, then List A. Here are the measurements from the experiment:

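[Table: Number Recalled Correctly, listing each subject's List A score, List B score, and the difference (Score A – Score B)]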

Because every subject was measured using both List A and List B, it is of interest to look at the differences in the subjects’ scores (Score A – Score B), which are shown in the last column.


Problem C7
a. Determine the Five-Number Summary for the 16 scores using List A.
b. Determine the Five-Number Summary for the 16 scores using List B.
c. Using the same scale, sketch the two box plots for List A and List B side by side.
d. Based on the summaries and box plots, how do scores using List A compare with scores using List B?


The comparison of the two box plots indicates that the scores of subjects using List A tend to be higher than the scores of subjects using List B; therefore, subjects recalled words more readily than they recalled non-words. The difference in the two medians (10 – 5 = 5) indicates that people can typically recall five more words than non-words. However, there is somewhat more variation in the scores from List A (words) than from List B (non-words); the range for List A is 12 compared to 8 for List B, and the IQR for List A is 3 compared to 2 for List B.

Video Segment
In this video segment, Professor Kader and participants display the results of the study on short-term recall on box plots and discuss what they see. Watch this segment after completing Problem C7.

Why are box plots helpful in studying the results of comparative experiments?


This comparative analysis does not take into account the advantage you gain from pairing each subject’s scores from both lists and then examining the difference between the two scores (Score A – Score B). A positive difference occurs when the List A score is higher than the List B score; that is, the difference is positive when memory recall is better for words than for non-words. See Note 7 below.


Problem C8
Examine the column of differences in the Number Recalled Correctly table:
a. How many of the differences are positive? How many are negative?
b. What does this suggest about memory recall of words versus non-words?
c. Determine the Five-Number Summary of the 16 differences.
d. Sketch the box plot of the differences.
e. Based on the Five-Number Summary and box plot of the differences in scores between the two lists, how do scores using List A compare with scores using List B?

Notes

Note 4
The activity in Part C provides an opportunity to consider the full statistical problem-solving process, with the primary focus on collecting the data. Useful data depend on an appropriate measurement/collection design, and you will have the opportunity to devise your own method of measurement. The activity asks you to consider several design options, and, in doing so, you will investigate such important ideas as randomization, blocking (pairing), and the effects of order.

After completing the memory experiment in Part C, consider conducting your own memory experiment to determine whether your results confirm the results in our example. One option is to use the Interactive Activity in Session 1, Part C, and reexamine the distance perception phenomenon. A paired design similar to the one used for the memory experiment could be used to compare subjects’ length perception of line segments with arrows to their length perception of line segments without arrows.

Fathom Software, used by the onscreen participants, is helpful in creating graphical representations of data. You can use Fathom Software to complete Problems C7-C8, as well as Problem H1. For more information, go to the Key Curriculum Press Web site at http://www.keypress.com/fathom/.

Note 5
Measurement is the most important part of the statistics problem-solving process. Poor measurement will certainly produce poor conclusions! Most introductory statistics books or courses do not put a major emphasis on measurement. In this course, we encourage you to take some time to focus on measurement; this activity is a good place to do so.

Note 6
Groups or individuals working alone should discuss or reflect on the important ideas and issues of experimental design:
• Using volunteers for subjects or personally selecting subjects might bias results.
• The order in which tasks are performed may affect the outcomes of the tasks.
• Pairing in data collection can reduce variation in measurements. In this case, each person is paired with him- or herself. This lets us remove from any observed difference the variation in recall that comes from individual differences among subjects.
• Random selection or assignment is intended to remove bias. Also, randomly assigning subjects to two groups is intended to average out their differences so that the two groups are more likely to be similar.

Note 7
This is an informal analysis of results. Formal probability-based inferences were not considered. However, the issues of generalizing any results to a larger population need to be raised. What is the population? How general are the results? A more advanced analysis is required to make an inference about any larger population that our results might represent.

Solutions

Problem C1
Answers will vary, but here is one possible method of collection:
1. Each subject is given a specific amount of time to study List A — the list of words.
2. After a timed delay, each subject is given a specific amount of time to write down all the words the subject can recall from the list.
3. The subject is scored based on the number of words correctly recalled.

Repeat this process using List B — the non-words.

Because you want to use the results to make a comparison, you want the lists to be comparable. The two lists should contain the same number of character strings, and the character strings should be the same size. One way to do this is for both lists to contain “words” that are all the same length (say, four letters each); another is for both lists to contain “words” that are respectively the same lengths (say, the first “word” on each list is six letters, the next “words” are three letters, and so on).

Problem C2
Yes, the process may be biased in the way the “non-words” are created — the person creating non-words may deliberately choose letter combinations that are exceptionally difficult to remember. Also, someone asked to perform the task twice, with List A and then List B, may perform better with List B simply because he or she has practiced the task by using List A first.

Problem C3
Answers will vary. One source of bias is that if people are all asked to try List A first, then List B, there may be bias introduced since the participants are performing the same task a second time.

Problem C4

Design 1
a. One strength is that an equal number of participants form each group.
b. A major weakness is the selection of the groups; volunteers may be particularly eager to test their memories. Another weakness is that no participant takes both tests, so there is no possibility of directly comparing the results of the two lists for the same participant.
c. The selection of the participants is a large source of bias and is not addressed by the design.
d. A better design would randomly assign participants to each group and would take measurements for each list for all 16 participants.

Design 2
a. An equal number of participants uses each list, and all participants take both tests, allowing for more direct comparison.
b. A major weakness is that all participants use List A first. By using List B second, they may perform better (or worse) simply due to their prior experience from the first test, and not due to actual differences in the tests themselves.
c. A potential source of bias is the possibility that List B becomes easier or harder as a result of List A being used first.
d. A better design would randomly select half the participants to use List A first and half to use List B first.

Design 3
a. Each group has an equal number of participants, and participants do not determine their own groups.
b. A major weakness is the selection of the groups. The person conducting the study may deliberately place certain types of people in a group, either to deliberately skew the results or due to unconscious bias. Another weakness is that no participant takes both tests, so there is no possibility of directly comparing the results of the two lists for the same participant.
c. The fact that the person conducting the study selects the groups is an unaddressed source of bias.
d. A better design would randomly assign participants to each group and would take measurements for each list for all 16 participants.

Design 4
a. All participants use both lists, and an equal number of participants uses each list.
b. The major weakness is the selection of the groups. The person conducting the study may deliberately place certain types of people in a group, either to deliberately skew the results or because of unconscious bias.
c. The fact that the person conducting the study selects the groups is an unaddressed source of bias.
d. A better design would randomly assign participants to each group.

Design 5
a. Groups are randomly assigned; all participants use both lists; and an equal number of participants uses each list first.
b. There do not appear to be any major weaknesses.
c. There are no major sources of bias.
d. A better design might include more participants to increase confidence in the findings.

Problem C5
Design 5 does the best job of removing bias. There are still small possible sources of bias, the most apparent of which is the method of creating the two lists. A specific way of randomly generating List B would be useful, as might a specific method of finding the words used for List A.

Problem C6
a. There are many ways to randomly assign the 16 subjects to groups. One way is to place the 16 names on equal-size slips of paper, then draw eight of these slips from a hat. Another is to start with a list of last names of the 16 participants arranged in alphabetical order, then flip a coin for each individual. If the coin lands heads, the participant is assigned to Group 1, and if the coin lands tails, the participant is assigned to Group 2. This continues until eight subjects are assigned to a group; the remaining subjects are then assigned to the other group.
b. If different groups read each list, variation in the data may come from randomly picking only the best people to read one list. If each person reads both lists, this potential source of variation and bias is removed completely.
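
The slips-of-paper method in part (a) can also be carried out by computer. The sketch below is one possible approach, not the course's own procedure: the subject labels, the function name, and the seed value are made up for illustration. Shuffling the names and splitting the shuffled list in half plays the same role as drawing eight slips from a hat.

```python
import random


def assign_groups(subjects, seed=None):
    """Shuffle the subjects and split them into two equal-size groups.

    The optional seed (an assumption for this sketch) only makes the
    assignment repeatable so that it can be checked later.
    """
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy, so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical labels standing in for the 16 subjects' names.
subjects = [f"S{i:02d}" for i in range(1, 17)]

# Under Design 5, Group 1 would see List A first and Group 2 would see List B first.
group_1, group_2 = assign_groups(subjects, seed=2001)
print("Group 1:", sorted(group_1))
print("Group 2:", sorted(group_2))
```

Because the computer does the shuffling, the person running the experiment has no say in who lands in which group.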

Problem C7
a. and b.

[Table: Five-Number Summaries for List A and List B]

c. Here are the box plots:

[Figure: side-by-side box plots for List A and List B]

d. Note that all measures of location (Min, Q1, Med, Q3, and Max) are higher for List A (Words) than for List B (Non-Words). The median for List A is twice the median for List B. In other words, people typically remembered twice as many “words” from List A as from List B. However, there is more variation in the number recalled correctly for List A than for List B. The range for List A is 12 (from 3 to 15), while the range for List B is 8 (from 1 to 9). The interquartile ranges for the two lists (3 and 2) are roughly equal.

One telling statistic is that the median of List A (10) is higher than the maximum of List B (9). This means that at least half of the subjects scored higher on List A than anyone scored on List B.
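
For readers who want to check a Five-Number Summary by computer, here is a minimal sketch. It assumes the quartiles are found as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd); other software may use a slightly different quartile rule. The scores shown are hypothetical and are not the data from the experiment.

```python
from statistics import median


def five_number_summary(values):
    """Return (Min, Q1, Med, Q3, Max) for a list of at least two numbers.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data; the middle value is excluded when n is odd.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical scores, invented for illustration only.
scores = [9, 4, 12, 7, 13, 6, 10, 8]

minimum, q1, med, q3, maximum = five_number_summary(scores)
print(minimum, q1, med, q3, maximum)  # 4 6.5 8.5 11.0 13
print("Range:", maximum - minimum)    # Range: 9
print("IQR:", q3 - q1)                # IQR: 4.5
```

Running the function once on the List A scores and once on the List B scores gives the two summaries needed for side-by-side box plots.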

Problem C8
a. Fifteen of the 16 differences are positive. Only one is negative.
b. Since only one person had better recall using the list of non-words, and 15 others had better recall using the list of words, this suggests that words are considerably easier to recall than non-words.
c. Here is the Five-Number Summary of the differences:

[Table: Five-Number Summary of the differences]

d. Here is the box plot of the differences:

[Figure: box plot of the differences]

e. These results indicate that people are better at recalling words than non-words. Note that the entire interquartile range (the “box”) lies above zero, which indicates that all of the middle 50% of participants performed better with the list of words.
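
The paired comparison in this problem can be sketched in a few lines of code. The scores below are hypothetical and are not the data from the experiment; the function name is also just an illustration. Each position in the two lists is assumed to belong to the same subject.

```python
def paired_differences(scores_a, scores_b):
    """Return Score A - Score B for each subject, in subject order."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Each subject needs one score for each list.")
    return [a - b for a, b in zip(scores_a, scores_b)]


# Hypothetical paired scores for six subjects, invented for illustration only.
scores_a = [10, 12, 7, 9, 11, 5]  # words (List A)
scores_b = [6, 7, 8, 4, 5, 5]     # non-words (List B)

diffs = paired_differences(scores_a, scores_b)
positive = sum(d > 0 for d in diffs)
negative = sum(d < 0 for d in diffs)
ties = sum(d == 0 for d in diffs)

print(diffs)                     # [4, 5, -1, 5, 6, 0]
print(positive, negative, ties)  # 4 1 1
```

A positive difference means that the subject recalled more words than non-words, so counting the signs is a quick first look before summarizing the differences with a box plot.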

Credits

Produced by WGBH Educational Foundation. 2001.
  • Closed Captioning
  • ISBN: 1-57680-481-X