The Best Way to Study: Practice Tests

By purepedantry on February 21, 2008.

I remember when I was studying for Step I of the medical Boards. Step I is the first of three very large tests that you have to take to become a doctor. This first test comprises everything you learn in the first two years of medical school, and it can in theory include the pathology and physiology of anything that can go wrong with the human body. Most people take at least 6 weeks of continuous time to study for it.

Suffice it to say, it is a lot to learn.

Medical students use numerous techniques to study for the Boards. There are the readers, who attempt to reread every one of the 50 or so textbooks you have to buy in medical school. There are the group studiers, who talk through subjects or practice questions in small groups. (I never found either of those strategies particularly productive.) However, by far the most common strategy was the use of repeated practice tests. The majority of students purchased a huge database of practice questions -- at least 1,500 and probably more like 2,500 -- and used these to study. If you got a question wrong you would go look up why, but mostly it was to give you a sense of what was most likely to be on the test.

I now find out that this was exactly what I should have done to study most productively. Karpicke and Roediger, publishing in the journal Science, show that the best way to study involves repeated retrieval in the form of testing. Hindsight is so sweet.

Karpicke and Roediger challenge the common assumption in education that repeated encoding of material through studying makes you more likely to retain it. Let me just define a few terms here. When you learn a new fact, you encode it into memory. Generally, after encoding there is a process of making that memory more permanent called consolidation. Finally, when you need that fact during the test, you retrieve it from memory. If you successfully retrieve it, you are said to have retained it.

People tend to assume that more encoding leads to better performance. If you read the book over and over again until you can recite it in your sleep, you will do better on the test. The actual retrieval of the encoded memory is considered irrelevant to later performance.

Karpicke and Roediger summarize this common belief:

The standard assumption in nearly all research is that learning occurs while people study and encode material. Therefore, additional study should increase learning. Retrieving information on a test, however, is sometimes considered a relatively neutral event that measures the learning that occurred during study but does not by itself produce learning. Over the years, researchers have occasionally argued that learning can occur during testing. However, the assumptions that repeated studying promotes learning and that testing represents a neutral event that merely measures learning still permeate contemporary memory research as well as contemporary educational practice, where tests are also considered purely as assessments of knowledge. (Emphasis mine. Citations removed.)

To test this assumption, the authors teach student test subjects multiple foreign word pairings -- sort of like when you are trying to memorize your Spanish vocab -- over the course of a week using four different studying regimens. Here are the four groups:

ST -- In the control group, the students studied the word pairings and then were tested on them. Then they studied them again and were tested on them again. This was repeated over and over again. All the word pairings were maintained in each study period and the testing period throughout the week.
S_NT -- In the first experimental group, the students studied and were tested on all of the word pairings at the beginning. However, whenever they got an answer correct on the test that pairing was dropped from all further study periods. It was maintained in all further testing periods.
ST_N -- In the second experimental group, the opposite was done. Whenever a student got a word pairing correct, that pairing was dropped from all further tests but kept in all further study periods.
S_NT_N -- In the third experimental group, whenever a student got a word pairing correct during any test period, that pairing was removed from all subsequent study and testing sessions.

The ST_N group is meant to show the effects of repeated studying on performance, whereas the S_NT group is meant to show the effects of repeated testing on performance. All the groups were tested on all the word pairings at the end of the week to see how many they remembered.
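To make the four regimens concrete, here is a minimal Python sketch of which pairings get studied and tested in each study/test cycle. This is my own illustration, not the authors' code: the names `Condition` and `schedule`, the placeholder pair labels, and the recall sets fed into it are all assumptions made up for the example, not data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    name: str
    drop_from_study: bool  # recalled pairs leave later study periods
    drop_from_test: bool   # recalled pairs leave later test periods


# The four groups described above.
CONDITIONS = [
    Condition("ST", drop_from_study=False, drop_from_test=False),
    Condition("S_NT", drop_from_study=True, drop_from_test=False),
    Condition("ST_N", drop_from_study=False, drop_from_test=True),
    Condition("S_NT_N", drop_from_study=True, drop_from_test=True),
]


def schedule(condition, pairs, recalled_per_test):
    """Return a list of (studied, tested) lists, one per cycle.

    recalled_per_test[i] is the (hypothetical) set of pairs the
    student answers correctly on test i.
    """
    recalled = set()
    cycles = []
    for correct in recalled_per_test:
        studied = [p for p in pairs
                   if not (condition.drop_from_study and p in recalled)]
        tested = [p for p in pairs
                  if not (condition.drop_from_test and p in recalled)]
        cycles.append((studied, tested))
        # Only pairs that actually appeared on this test can be newly recalled.
        recalled |= set(correct) & set(tested)
    return cycles


# Toy run: three placeholder pairs, two cycles, invented recall pattern.
toy_pairs = ["p1", "p2", "p3"]
toy_recalls = [{"p1"}, {"p2"}]
for cond in CONDITIONS:
    for i, (studied, tested) in enumerate(schedule(cond, toy_pairs, toy_recalls), 1):
        print(cond.name, "cycle", i, "study:", studied, "test:", tested)
```

The only thing that differs between groups is the two boolean flags, which is the point of the design: S_NT keeps testing a recalled pair while withholding further study, and ST_N does the reverse.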

The results are shown in Figure 2 of the paper.

These results clearly show that the key aspect in high performance is repeated testing of the material, not repeated studying of the material. To rephrase that sentence in neuroscience speak, repeated retrieval of the memory improves performance, not repeated encoding.

However, the distressing part is that no one seems to have picked up on this. For example, the researchers asked the members of each group how well they thought they would do on the final exam. (They asked at the beginning of the week.) All of them said that they would get about half right.

Practice testing is not the way most people study. At least when I was in college, I would self-test to see if I knew the material, but if I didn't I would just read the book some more. The testing was not part of the studying; it was not intended to make me remember. It was just to assess whether I needed to read more.

The authors summarize this concern:

Indeed, questionnaires asking students to report on the strategies they use to study for exams in education also indicate that practicing recall (or self-testing) is a seldom-used strategy. If students do test themselves while studying, they likely do it to assess what they have or have not learned, rather than to enhance their long-term retention by practicing retrieval. In fact, the conventional wisdom shared among students and educators is that if information can be recalled from memory, it has been learned and can be dropped from further practice, so students can focus their effort on other material. Research on students' use of self-testing as a learning strategy shows that students do tend to drop facts from further practice once they can recall them. However, the present research shows that the conventional wisdom existing in education and expressed in many study guides is wrong. Even after items can be recalled from memory, eliminating those items from repeated retrieval practice greatly reduces long-term retention. Repeated retrieval induced through testing (and not repeated encoding during additional study) produces large positive effects on long-term retention. (Emphasis mine.)

So, word to the wise, self-test throughout your studying. You will do much better on the final exam.

Karpicke, J.D., Roediger, H.L. (2008). The Critical Importance of Retrieval for Learning. Science, 319(5865), 966-968. DOI: 10.1126/science.1152408


I'm not surprised by the result, but I wonder if the type of knowledge they chose to assess inflated their results. In my experience, two things -- organic chemistry structures/biochemical pathways and Chinese character writing -- are completely impossible to learn without extreme amounts of retrieval. Foreign word pairings seem like they would go along those lines.
That said, some other subjects (like comparative geography/philosophy/history) I learned best by thinking about the material as I was learning it, integrating it into a framework with other knowledge.
I think they are testing memorization as one form of learning, but I wonder if this would be as true of other forms.

Having done a bit of Chinese character writing myself, I am inclined to agree with you. I think that this demonstrates the best way to study for exams that require large amounts of memorization. It may also apply to things like problem sets, where the way to solve the problem is something that you rehearse.

More conceptual exams probably require you to mull over the subject for longer. So I don't know if this research would generalize to them. Then again, if you think about it, rehearsing an answer for an essay test or forming a conceptual framework about the material probably requires repeated encoding and retrieval. This research suggests that in that process it is the retrieval which is most critical.

We wrote up a different study supporting this result a while back:

Here.

Although this information will certainly be useful to students slogging through school (and perhaps teachers struggling to ensure students pass NCLB high stakes tests), the study may encourage learning behaviors that are ultimately counter-productive.

What's important in the "real world" is transference: can a student apply what's learned in the classroom to day-to-day problems? Such problems aren't typically presented in the form of test questions - they're messier, and perhaps full of ambiguities. The comment on "conceptual" issues approaches this question, but it's broader than that. If we're teaching specific (and narrow) ways to recognize, organize, and retrieve information, we may be limiting the ability of students to actually use the information in other contexts (which is what building expertise is all about).

I find this to be true in my own experience. But unfortunately, not all tests have practice tests available.

This has been well understood by effective pedagogues since Socrates.

I think that a lot of teaching (and studying!) efforts are hampered by their invariably frustrating attempts to foster that mysterious force known as "knowledge", rather than simply shaping the sorts of behaviors from which that knowledge is typically inferred, at least to the extent that "knowing" something is simply being able to produce an appropriate response or answer under a variety of (often novel) situations. To this end, we might in the course of our studying silently arrange situations that would elicit a given sort of answer - "asking ourselves questions" - or we might have a sheet of paper do the same thing for us. The advantage to having a prepared sheet of practice questions is that the content can be determined by someone who is familiar with what material is most important. When reading a long passage to ourselves, we may not be fortunate enough to happen upon the right sorts of questions to arrange for ourselves, and so responses appropriate to those questions never get their chance, at least not until the moment of the real test at which point they may simply be weak and barely available.

What surprised me most here is that anyone could think that producing or retrieving an answer for a test question is neutral with regard to learning! No matter who it is arranged by, the student or the professor, a question sets an occasion upon which a given response can be reinforced and thus strengthened for future occasions. Throwing behavior out into a critical environment is the best way to prune out irrelevant or inappropriate (i.e., wrong) responses and tease out appropriate ones. We rarely repeat things we've been told are unambiguously wrong, yet we frequently repeat things that have a strong positive effect (even if we are merely repeating them to ourselves that night).

i know this post is old, but any answer to my question would be great.

tests may increase retention, but is testing sustainable in the long-term? is there a motivational factor that would prevent someone from practicing tests every day?

i've studied expert practice and the practice has to maintain motivation. competitive practice for example, is completely unsustainable. whether tests fall under the category of competitive practice, i don't know.
