How to Read a Scientific Paper

My course this term is on time and timekeeping, but is also intended as a general "research methods" class. This was conceived by people in the humanities, where the idea of generic research methods makes a lot more sense than in the sciences (where there's a lot more specialization by subfield), but I'm going to try to give as general an overview of how to approach scientific research as possible in a course with no prerequisites. The following is sort of a rough sketch of a lecture for next week, on how to approach the scientific literature, so comments and suggestions are welcome. This is intended to be in somewhat the same spirit as Timothy Burke's excellent How to Read in College. This will also be slanted toward the physics side of things, because that's the part of academic science I know best.

So, you find yourself in a situation where you need to read a scientific paper for some reason, and you want to know the most efficient way to do this. You could, of course, start at the beginning, and read straight through to the end, but that takes a long time, and might require reading a lot of irrelevant material. With a little basic knowledge of how scientific papers work, though, you can do this much more efficiently. The following steps are intended as a guide to make your reading more effective.

STEP ZERO: Know what you want. Before doing anything else, make sure you know what it is you hope to get out of this paper, because that will dramatically change how you read it. Are you looking for a specific number to plug into a calculation of your own? A sense of a broad research field? The details of a measurement technique? A way to poke holes in a result that disagrees with your pet theory? Those different types of information show up in different places, and that shapes and directs your reading. Make sure you know what sort of thing you're looking for.

STEP ONE: Know the structure. The base structure of a scientific paper is more or less the format we require for labs in our introductory classes: There's an Abstract, an Introduction, a Procedure, the Results, and a Conclusion. The problem is, about the only place you are guaranteed to find that structure clearly laid out and rigidly followed is in a lab report written for an introductory class. Real scientific papers, particularly those written for more prestigious journals (Science, Nature, Physical Review Letters), will often mix sections together, or repeat bits of the structure several times over the course of the paper. Knowing the base structure gives you an idea where to look, and how to orient yourself when you dive into the middle of a text.

The Abstract of a scientific paper is a one-paragraph (usually) summary of the main points of the article. You can think of this as sort of like the "Attention conservation notice" Cosma Shalizi puts at the start of his longer posts: it's there so you know what to expect the article to contain, and reading it should tell you whether you want to read the paper or not.

The Introduction is generally toward the start of the paper, as you would expect, and exists to put the work in a larger context. This is where you'll find a discussion of the historical antecedents of the current research, and the easiest way to identify an Introduction section is by the density of citations. If you scan over a paragraph and see a ton of citations (footnotes, endnotes, parenthetical author-date-page citations, whatever), you're probably looking at an Introduction-type section.

The Procedure is the section where they explain what they did and how they did it. In some longer and more highly formatted papers, this will be clearly broken out into a single section, but it's very common for this type of section to appear multiple times in a single paper, once for each type of measurement or theoretical prediction made in the paper. The nominal idea of this section is to provide enough information that somebody reading it could reproduce what you did in enough detail to check the results. Most real Procedures fall short of this ideal, either by pushing details off into the references (Procedure sections are the second most citation-heavy sections, usually because they're using a citation to get out of repeating a lengthy explanation. Red-flag phrases for this are "Following the method of Ref. 19..." and "As explained in..."), or by omitting stuff entirely, whether unconsciously (there's a lot of tacit knowledge, especially in experimental science, that is so ingrained that it never occurs to the authors that the reader might not know it) or unethically.

The Results are, obviously, what they found when they did what they did. This is the other type of section that gets mixed in with other stuff, generally the Procedure. Lots of papers will contain multiple related measurements, and it's not at all unusual for these to follow a format that mingles Procedure and Results, so you'll get something like Abstract-Introduction-Procedure1-Results1-Procedure2-Results2-... Results sections often include tables and graphs, so these are a dead giveaway when trying to identify the appropriate section.

The Conclusion is generally at the end, though you will sometimes get sort of sub-conclusions at the ends of individual measurement sections. Conclusion-type sections are where you discuss sources of error, the determination of uncertainties (which may include some mini procedure-and-results measurements), and the possible implications. Conclusion sections are the third area with lots of citations, often repeating citations from the Introduction, but sometimes bringing in entirely new papers that are supported or contradicted by the current results. This is often where you find proposals of new measurements, as a way of staking a claim to a field. You will often run into citations in the Introduction of a new paper that are referring back to a single sentence in the Conclusion of an older paper, whose author proposed something along the general lines of the new measurement.

STEP TWO: Know the types of paper. There are nearly as many styles of scientific writing as there are PIs generating papers, but you can crudely divide papers into a couple of categories. The most obvious division is between Theory and Experiment, but within those broad groups there are some different classes of papers.

One very important class of scientific paper is the Review Article. As the name suggests, this is a paper that reviews a field as a whole. It generally will not contain new results, but will summarize the important results of many other papers in a given field. A well-written Review Article is a fantastic way to get a sense of a new field: what the interesting issues are, who the major players are, etc. You can identify Review Articles by length (they're often 50-60 pages long), number of references (often running into the hundreds of citations of other articles), and structure (they tend to be more orderly than other types, and often include a kind of mini-table of contents at the start). The Abstract and/or Introduction will often explicitly identify the article as a review, as well. Some journals publish almost exclusively Review Articles: Reviews of Modern Physics is the obvious example, but anything with Advances or Comments in the title is a good bet for a review.

I've ResearchBlogged a few review articles, such as this one on the fine-structure constant and this one on quantum information in Rydberg atoms.

Another important type is the Proof-of-Principle Measurement. This category of paper is identified by phrases like "We report the first..." or "novel technique for..." These are, as those phrases suggest, cool effects being demonstrated for the first time. They tend not to be great measurements in terms of precision-- 10% uncertainties are pretty common-- and often won't include much about uncertainty at all. These papers will often have a lot of detail about the measurement technique used.

A lot of the papers I've blogged about here are Proof-of-Principle Measurements, such as this one on atoms in optical lattices, or this one on weak measurements. These tend to be a lot of fun, as such things go: lots of gee-whiz, gosh-wow stuff, not a lot of gory statistical detail.

The other big category of new measurements is the Technical Advance, where somebody takes the technique from a Proof-of-Principle Measurement and refines it a little to make a better measurement of something. These two can be a little difficult to separate from each other, because they're generally not measuring exactly the same thing that was measured to prove the principle, but you can usually figure it out from the Introduction and Procedure-- a Technical Advance will have less procedural detail, and more citations of earlier stuff. Technical Advances also include more discussion of uncertainties and how to reduce them.

The final category of new measurements is actually a small fraction of published papers, but an important one: the Precision Measurement. In some sense, this is a subset of the Technical Advance, but it's one that measures exactly the same thing as a previous measurement, with an explicit goal of making the uncertainty as small as humanly possible. These can be identified by the length of the Conclusion section-- a real Precision Measurement paper will go on at great length about identifying and eliminating systematic errors, and comparing the results of their measurement to the results of the same measurement made by other groups.

The electron EDM measurement I ResearchBlogged last year is a good example of a Precision Measurement, and the FTL neutrino paper is in many ways more of a Precision Measurement than a Proof-of-Principle.

These category descriptions have a very experimental slant, because that's my background, but similar types seem to exist within the theoretical subset of papers. That is, there are theory papers where somebody introduces a new method and uses it to make a rather rough prediction of something, papers where somebody refines an existing technique to get improved agreement with experiment, and papers in which the point is to get the agreement nearly perfect.

The steps to this point are all preliminary background knowledge. Once you've got these ideas, you can start to consider approaches to a specific article.

STEP THREE: Read the Abstract. Most of the time, the Abstract will tell you what sort of paper you're dealing with. Combine that knowledge with your goal from Step Zero, and take the appropriate action.

If you're after a specific numerical value-- the latest number for a fundamental constant, or a property of a specific material-- the information you need may very well be in the Abstract itself. A paper that measures a specific quantity will generally give the measured value, with uncertainty, in the Abstract. You still need to look at the Results and Conclusions to find the caveats and uncertainties, but to get an input for a quick calculation, you can just take the number from the Abstract.

If you're looking to understand a general field, and the Abstract makes clear that what you have is a Review Article, then you're all set. Dive into the Introduction and start reading. If the Abstract tells you that this is a Technical Advance or a Precision Measurement, you need to look through the Introduction to try to find something that is more review-like, or even a Proof-of-Principle Measurement. Find the appropriate citations, and go back to the beginning.

If you need to understand a particular technique, say because you started with a Precision Measurement and followed a citation to the paper you're now looking at, you want the Abstract to indicate that this is either a Technical Advance or a Proof-of-Principle, at which point you look for the Procedure section. If the Procedure you've got doesn't give the information you need, look for references to earlier papers that might provide more detail.
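For the programmers in the class, the three cases above amount to a small lookup on your goal and the type of paper. Here's a rough sketch in Python. The names (Goal, PaperType, next_action) are made up for illustration, and the fallback branches for combinations I didn't spell out above (say, a Proof-of-Principle paper when you want a field overview) are my guess at the sensible default rather than a hard rule.

```python
from enum import Enum


class Goal(Enum):
    """What you decided you want in Step Zero (illustrative names)."""
    NUMBER = "a specific numerical value"
    FIELD = "an overview of a field"
    TECHNIQUE = "the details of a technique"


class PaperType(Enum):
    """The paper categories from Step Two (illustrative names)."""
    REVIEW = "Review Article"
    PROOF_OF_PRINCIPLE = "Proof-of-Principle Measurement"
    TECHNICAL_ADVANCE = "Technical Advance"
    PRECISION = "Precision Measurement"


def next_action(goal: Goal, paper_type: PaperType) -> str:
    """Suggest what to do after reading the Abstract."""
    if goal is Goal.NUMBER:
        # The measured value is usually quoted in the Abstract itself.
        return ("Take the value from the Abstract; check the Results and "
                "Conclusion for caveats and uncertainties.")
    if goal is Goal.FIELD:
        if paper_type is PaperType.REVIEW:
            return "Dive into the Introduction and start reading."
        # Assumption: any non-review paper gets the same treatment here.
        return ("Look through the Introduction for citations to a review "
                "or a Proof-of-Principle paper, and start there.")
    if goal is Goal.TECHNIQUE:
        if paper_type in (PaperType.TECHNICAL_ADVANCE,
                          PaperType.PROOF_OF_PRINCIPLE):
            return ("Look for the Procedure section; if it lacks detail, "
                    "follow its references to earlier papers.")
        # Assumption: otherwise, chase citations to a more suitable paper.
        return ("Follow the citations back to a Technical Advance or "
                "Proof-of-Principle paper.")
    raise ValueError(f"unknown goal: {goal!r}")


if __name__ == "__main__":
    print(next_action(Goal.FIELD, PaperType.PRECISION))
```

Nobody needs a program to read a paper, of course; the point is just that the decision really is this mechanical once you know your goal and the type of paper in front of you.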

STEP FOUR: A picture is worth a thousand wossname. If you need to delve into the guts of a paper (beyond the Abstract and Introduction), most modern scientific papers will include figures for most of the important steps. The Procedure will often contain a schematic of the apparatus or the physical situation being considered. The Results will usually include graphs, tables, or pretty pictures of the results. These generally have descriptive captions, which will often include sufficient information to explain the basic idea (in journals like Science and Nature with strict page limits, the captions are set in smaller type, so you can pack a lot of detail in there that would take up too much space in the paper proper). If the caption doesn't tell you enough, skim the text near the figure looking for paragraphs that refer to the figure in question, and read the description in the text. You can continue to work backwards as needed to get the information you need-- if the graph is plotting a quantity identified only as some squiggly Greek letter, look for the first equation in which that squiggle appears, and read the surrounding text for the definition.

(This is harder for older papers, when image reproduction technology wasn't as good-- some really old papers won't contain any figures at all, in which case you have no choice but to read the whole thing.)

Don't feel bad about skipping sections that don't matter to you. If all you're really after is the historical origin of some technique, you don't need anything past the Introduction and maybe the Procedure. If all you want is the best current value of something, you don't care about the Introduction. And so on. Get what you came to the paper for, and get on with what you're really interested in.

There's nothing wrong with reading whole articles start to finish, of course, and it can actually be fun to learn some of the details of Technical Advances. But the critical thing is to get the information that you need from the paper, and these steps give you a way to start doing that as efficiently as possible.

STEP FOUR: A picture is worth a thousand wossname.

What's a wossname?

By anonymous (not verified) on 10 Jan 2012 #permalink

It's a test to see if you've read any Discworld.

This seems good advice for students, and something that does not occur to many people.

I once wrote a piece about how to assess the worth of a paper, with examples of incoherent procedures, conclusions not justified by the data, wrong statistics, faulty referencing, etc (and some good examples, too). It was difficult to know where to send it, but it was accepted by a regional journal directed at my target audience (I was working in a third world country at the time). Unfortunately, it belatedly produced one more issue then faded out of sight before printing my paper. Perhaps I should find a suitable journal, update the article and resubmit it.

By Richard Simons (not verified) on 10 Jan 2012 #permalink

Don't forget to look for supplementary material if you are interested in the methods in detail. This is often provided in a separate file. The format of Nature, Science, and lots of other "newsy" journals requires the entire research to be summarized in around four pages. Yikes. It is near impossible to publish enough detail in four pages to make the research anything like reproducible. Supporting material is also a good place to find additional analyses that support the main conclusions of the article -- often those that were requested by reviewers!

"STEP ZERO: Know what you want."
Your explanation seems to rule out honest curiosity. There is a difference between self-centric motives ("know your interests") and subject-centric issues ("know the context").

I've also noticed that the old writer's rule "know your audience" can be applied to the writers themselves. A quick start to finding out the writer's intended audience is to see the list of references. If you know the field, you can recognize the camps. This is not so important in hard sciences, but in humanities it is good to know which choir is being preached to.

By Lassi Hippeläinen (not verified) on 10 Jan 2012 #permalink

I would add this: If you find you have to read the whole paper, you may run into entire paragraphs that you don't understand. Don't worry; just plow ahead, and plan to come back to the tricky bits. Papers these days are really dense and often it's necessary, even for pros, to read the paper iteratively. I don't mean read the paper over and over, but to come back to certain sections in the light of what you've learned further down. Most papers whose contents I actually understand had to be read in this way. It's unsettling for students to read paragraph 5 when they haven't understood paragraph 4, but it's good to get used to this approach. You're not reading a novel or the newspaper; it's more like taking an exam or doing a crossword: just because you can't do question 4 doesn't mean you shouldn't try question 5.

Interestingly, yesterday was the start of classes at UIC, and the most useful class is likely to be, "Research methods in CS," or, essentially, "What you will wish your advisor had told you, five years from now, but we know he won't so we'll try to tell you now."

I applaud the existence of that class in so many ways: I'm a smart guy, with a breadth of experience that will probably intimidate my classmates, but to be frankly honest about it, I don't know dick about doing actual research. Having someone sit down and organize some advice, practices, exercises, etc is going to be highly useful.

By John Novak (not verified) on 11 Jan 2012 #permalink

I would add "Step negative one: don't be intimidated."

You don't have to know much of anything about the subject at hand to get something out of a paper -- after all, you're reading the paper to learn, right? When I shifted out of research into writing and education, I discovered that knowing how to read a paper in my former sub-field of physics meant that I knew how to read an archaeology paper, an ecology paper, a sociology paper, or even a medical paper (though the acronyms run rampant there).

The basic skill is widely applicable, and I'm glad folks are trying to teach this to undergrads now. I wish someone had taught me!

Good article.
Decide what you want before you read the paper.
Also there are good gauges as to how good a paper is, which are easy to work out if you have expertise in the field.
Does the author use big words to baffle with bullshXXt?
If so, throw up the alarm button. You probably have one of the usual suspects, like a graduate spinning rubbish to impress lay people and grants committees, or alternatively someone presenting a hypothesis that has no real basis.
References cited is also often a good way to sort fact from fiction even before the paper is read - or at least an indicator what is likely. When you see an author omitting key papers and citing what is known to be rubbish, have the alarm button set. We see these indicators all the time in the biological sciences.
We do reptile education and wildlife shows see http://www.snakebusters.com.au in Melbourne Australia and at the end of each school incursion we leave behind a disk with hundreds of scientific papers by various authors for the kids to read. Ours are selected on the basis of relevance, ease to read and the like and are often the first such papers school-age people have read.
My view is that kids should be exposed to more of this stuff sooner so that they can critically look in depth at more things instead of the two second grabs on most webpages, which is what most grow up with these days.
All the best
Snakeman

@ Richard Simons

I would LOVE to get my hands on such an article.