During my brief tenure as a high school teacher, one common piece of advice I got from supportive colleagues was to “make your tests teaching tools.” “That’s often the only time you’ve really got your students’ attention,” they said, “so don’t neglect the opportunity to teach them something.”
What they meant is that you shouldn’t use misleading or false information in tests as a “trick” to check whether students have grasped the material, because the test might be the only thing students remember from a unit.
But there’s another reason testing is important for learning. For decades, researchers have known that people often learn more while taking a test than during traditional “learning.” For example, if students study 20 spelling words for a test but are *actually* tested on only 10 of them, in many situations they’ll remember those 10 words better than the others.
If I quiz Jim on his Spanish vocabulary words every day, he does better on tests than if he studies on his own. This may say more about the quality of his solo study time than about a testing effect, but it still illustrates how much testing can aid learning.
But how exactly does the testing effect work?
One hypothesis, supported by several studies, holds that we learn from the specific cues in the practice test, so we do better when the final test uses similar cues. If the practice test is multiple choice, for instance, we’ll do better on a multiple-choice final than on a fill-in-the-blank final.
But one experiment in a 1989 study by John Glover found a different result. When the practice test was a free-retrieval test (participants were asked to recall as many ideas as possible from a reading passage), participants did better on the final no matter what type of final test they were given.
Recently, Shana Carpenter and Edward DeLosh took a more systematic look at this phenomenon. They asked psychology students to memorize sets of 8 words, studying each word on its own for 3 seconds. After being distracted by a brief math problem, the students either took one of three kinds of practice test or were given a chance to study the words again. After 12 such rounds, they were tested on the entire set of 96 words. Here are the results:
As in Glover’s study, in the free recall test students simply listed as many words as they could on a blank sheet of paper. In the cued recall test, they saw the first letter of each word and had to fill in the rest. In the recognition test, they had to circle the 8 correct words from a list of 16. As you can see, no type of practice test, including recognition, led to a significant improvement on the recognition final. For both the free recall and cued recall finals, however, the free recall practice test produced the best results (although on the free recall final it was no better than the control group). Though the results aren’t crystal clear, they certainly don’t support the notion that taking a similar sort of practice test always leads to better results on the final exam.
Instead, Carpenter and DeLosh speculate that more elaborate retrieval processes during a practice test lead to better results on a final test. In other words, the more a test-taker relies on her own wits to generate the answers on a practice test, the better she’ll do on the final. To test this idea, they designed a new experiment. As before, students memorized sets of 8 words, but this time the practice test gave them 1-, 2-, 3-, or 4-letter cues. (A one-letter cue for “cognition” would be “C _ _ _ _ _ _ _ _”, while a four-letter cue would be “C O G N _ _ _ _ _”.) On the final, all the students simply wrote down as many of the words as they could remember. Here are the results:
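To make the cue manipulation concrete, here is a minimal Python sketch that builds cues in the display format used in the “cognition” example above. The function name `make_cue` is hypothetical, and the sketch only reproduces that format; it is not the materials or software the researchers actually used.

```python
def make_cue(word: str, n_letters: int) -> str:
    """Hypothetical helper: show the first n_letters of word (uppercased)
    and replace each remaining letter with an underscore, space-separated."""
    if not 1 <= n_letters <= len(word):
        raise ValueError("n_letters must be between 1 and len(word)")
    shown = [ch.upper() for ch in word[:n_letters]]  # letters the student sees
    hidden = ["_"] * (len(word) - n_letters)          # letters to be retrieved
    return " ".join(shown + hidden)


# Print the four cue lengths used in the practice test for one word.
for n in (1, 2, 3, 4):
    print(n, make_cue("cognition", n))
```

Fewer shown letters means more underscores, that is, more of the word the student must generate from memory, which is the sense in which a shorter cue demands more elaborate retrieval.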
The effect was statistically significant: the fewer letters in the cue, the better the score on the final test. Carpenter and DeLosh argue that these results support the notion that more elaborate retrieval processes during practice tests lead to better results on final tests. Practice tests need not duplicate the format of the final test; instead, they should require as much effort as possible from the test-taker. If the goal is long-term retention, final tests should also use a free-recall format rather than, say, asking students to choose from a list of possible answers.
There are some limitations to this study. Each experiment took only about an hour, so the results might not hold over the long term. That said, the Glover study on which this new work builds took place over four days, which gives Carpenter and DeLosh’s conclusions some support over a longer time span.
These results also gel nicely with the anecdotes I’ve heard from teachers. And they make it clear that I’ll probably be Jim’s Spanish quizmaster for a long time to come.
Carpenter, S. K., & DeLosh, E. L. (2006). Impoverished cue support enhances subsequent retention: Support for the elaborative retrieval explanation of the testing effect. *Memory & Cognition, 34*(2), 268-276.