Finding scientific papers for free, part II: comparing methods

This is the second part of a three-part series on finding free scientific papers. You can read the first part here: Part I: A day in the life of an English physician

Today, we do an experiment with PubMed and PubMed Central to determine the best way to search for free articles.

The biggest problem that our doctor friend from part I faced wasn't that he couldn't find the information he wanted. His problem was that he found too much information, and most of what he found, he couldn't get at. He wasn't happy about following several links only to find that he couldn't read the articles.

I thought that he might be happier, and certainly save some time, by limiting his search to the articles that are free and skipping the others. The problem is that there are a few different ways to do this. So, I decided to compare them and determine which method would give the largest number of free articles and the most recent publication dates.

Here is the method to my madness: I searched either PubMed or PubMed Central through the National Center for Biotechnology Information (NCBI) with the term "cancer." Then, I either limited the search (using Limits, shown as "PubMed Limits" in my graphs below) or I used the Display menu to filter the results (shown as "PubMed PMC links" in the graphs). Last, I sorted the results by date to see which search method gave the newest publications.
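For readers who would rather script these searches than click through the web pages, here is a minimal sketch of how two of them might be expressed as query URLs. It assumes NCBI's E-utilities ESearch service and the `free full text[sb]` search filter, neither of which I used in the experiment above; the helper name `esearch_url` is made up for illustration. The code only builds the URLs and does not contact NCBI.

```python
from urllib.parse import urlencode

# Assumption: the NCBI E-utilities ESearch endpoint (not used in the original experiment).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="pubmed", filters=(), retmax=20):
    """Build an ESearch URL for `term`, AND-ing in any extra search filters.

    `esearch_url` is a hypothetical helper for this sketch, not part of any library.
    """
    query = " AND ".join([term, *filters])
    params = {"db": db, "term": query, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


# PubMed, limited to citations with free full text.
free_url = esearch_url("cancer", filters=["free full text[sb]"])

# PubMed Central searched directly.
pmc_url = esearch_url("cancer", db="pmc")

print(free_url)
print(pmc_url)
```

Pasting either URL into a browser should return an XML summary with a count and a list of record IDs, which is enough to repeat the comparison with a different search term.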

The results:
It's not surprising that our physician friend from the National Health Service was frustrated when he searched PubMed and then tried to look at the articles. In my experiment, only 11 percent of the articles that I found were freely available, and that was when I used the best method.

[Figure: fraction.png]

[Figure: graph2.png]

So what does this all mean?

The first column shows all the cancer citations that I found with PubMed. The next three columns show the results from three methods I use for finding free articles. The least successful method was to use the PMC links filter. When I took my PubMed results and filtered by PMC (PubMed Central) Links, I only found a fraction of the articles that were really available, and the most recent article was from April 28, 2007. This is because filtering is limited to the first 10,000 items for this database.

If I searched PubMed Central directly, I obtained more articles, but the most recent articles were from May. My PubMed search, on the other hand, had located articles from June. (Yes, June hasn't happened yet, but some articles can be found before they're officially published.)

Overall, I found the best method, with the most results, was to use PubMed and limit the articles to those with free full text. While it's certainly true that 219,985 articles are way too many for me to read, at least I know I'm getting the most recent articles, and that I will be able to view the articles that I choose to investigate.

Plus, one of the tabs limits the results to review articles. (Review articles, for those who don't already know this, summarize the results of other articles.) With 14,819 reviews, I can limit the search criteria even more, have a much better chance of finding what I want and I don't have to waste my time trying to look at articles that I can't access.
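To put those numbers in proportion, here is a small back-of-the-envelope calculation. It uses only the figures quoted in this post (219,985 free articles, 14,819 free reviews, and roughly 11 percent of all results being free). The implied total is an estimate derived from the rounded 11 percent figure, not a count read off the PubMed results page.

```python
# Figures quoted in the post.
free_articles = 219_985   # PubMed results limited to free full text
free_reviews = 14_819     # of those, review articles
free_fraction = 0.11      # "only 11 percent" of results were free (rounded)

# Share of the free articles that are reviews.
review_share = free_reviews / free_articles

# Rough size of the unrestricted "cancer" search implied by the 11 percent figure.
# This is an estimate only, because 11 percent is itself a rounded number.
implied_total = free_articles / free_fraction

print(f"Reviews make up {review_share:.1%} of the free articles")
print(f"Implied size of the full result set: about {implied_total:,.0f} citations")
```

In other words, limiting to reviews cuts the free list to about one article in fifteen, which is a far more manageable starting point.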

Read the whole series:

  • part I A day in the life of an English physician,
  • part II Comparing different methods,
  • part III My new favorite method,
  • part IV One last experiment

Copyright Geospiza, Inc.

Good topic! I retired from a big pharma company a few years ago and started a little consulting business and have also continued to do a lot of writing (reviews in peer-reviewed journals). Getting access to free articles online is a big problem. I can often appeal to my clients or the journal editors (for whom I am writing) to get the articles I need - but, trying to stay abreast of my field, I am often frustrated in my searching. I find that Google Scholar is pretty useful - in that they include most of the available copies of the articles online - and sometimes that includes actual copies which may appear on authors' websites. Since, in writing reviews, I can't limit myself to freely accessible articles but must cover "everything," I find that requesting reprints directly from authors is useful - and often a good way to reconnect with old (make that erstwhile) colleagues. Everyone I've asked has been very generous and pleasant (as I try to be when I get such requests). But one can't do this for every interesting article. I am looking forward to your next installment of this series - your favorite method.

Thanks Lynn!

I've focused on PubMed and PubMed Central in these posts, but your ideas are wonderful for finding specific articles and locating articles from alternative sources.

How about it, readers? Does anyone else want to share their suggestions?

There are two methods I know that both have a rather better success rate than this, and both are methods I've frequently spotted in the wild.

The first one: search in Google (or, preferentially, Google Scholar) for the authors' homepages. More often than not, papers will be freely downloadable from there. And if one isn't - and you really need the paper - you can email the author, who almost always will be more than happy to send a PDF. And if the first author doesn't have the paper, try the second author, third...

Second one: leech off your university account. Very often student accounts are not terminated for many years after graduation, and this goes especially for graduate students. Log in through your account and use the online resources that your university has. And if your university doesn't have a subscription to a particular journal, one of your colleagues' is sure to have it. I've seen people long out of grad school (like 15 years out) still use their old university resources without problems. This one is not strictly allowed, of course, but so popular that any account not including it would be deficient.

This is a great series! Having left the lab, I no longer have all the wonderful access that I used to have and run into this problem all the time. I look forward to the rest of the series (especially the secret favorite method!).

Thanks Janne,

I definitely have different strategies if I'm looking for a specific paper vs. just looking at most recent pubs or trying to see what's been done in a field.

The suggestion about looking for authors' sites is a good one. I do that, too, and I've put PDFs of some of my papers on our website at Geospiza (after getting permission from the publisher, of course).

Sometimes, I've also found papers on-line that have been posted by instructors as assignments.

Most journals I've looked at actually specifically allow you to make either the paper itself or the "preprint analogue" (i.e., the same content as printed but not the actual final PDF generated by the journal) available on your personal web pages. And if a journal that disallows it really started to crack down on people offering their own papers on their sites or making them available via email, that is a journal that could quickly find itself without contributors.

For looking at what's done in a specific subfield, the reality is that any one specific paper is only rarely critical. Whatever result or development you're looking for, chances are there's several papers - review papers as well as papers expanding or clarifying the original results - that could be used for reference or learning. And for any significant results, in practice the authors themselves will have several papers (journal paper, a conference proceeding, a review paper of their own) dealing with it. And in those rare cases when a paper truly is singular, it's likely so well known and widespread you need look no further than your colleague one desk over to get a copy.

I just wish that more authors would stick their publications up on their lab websites, and to hell with the journals' policies on this. I've always done this with my papers and I don't actually think there's ever really going to be a crackdown on this from publishers.

I have actually recovered an entire published book (that was a compendium of papers) from the author's website, and it saved me $160 in the process.

This online PubMed tutorial appeared in one of my RSS feeds at the same time as a number of posts pointing to this series. I thought it might be of interest/use to readers here.

Dogged persistence gets you about 90% of what you want. As stated above, Google Scholar gets you quite a bit, particularly if you use the "cited by", "All # versions", "Related articles", and "web search" options to the hilt. Sometimes Google Scholar gets you the title but not the PDF, but then searching regular Google, Yahoo, or clusty.com does. Keep going back to Google Scholar as you find titles and authors with other search engines.

Janne's right about that one paper just out of reach, you can usually get the same information published in a previous paper, conference proceedings or by a coauthor, but it's damn frustrating.

www.scirus.com/ is good. Do the same thing there: bark up EVERY tree.

Look up the author's web page, his/her department's pages, grad students' pages and papers, and the papers and web pages of co-authors. Search last name, initials / initials last name / full name / last name only.

Try as many search keywords and keyword combinations as you can think of, and pick up more by looking at the abstracts and keywords from related articles.

The Smithsonian/NASA Astrophysics Data System at: http://adsabs.harvard.edu/index.html has a surprising number of biology titles.

Go to relevant journals; they almost always let you have the abstracts and often they'll have a few articles for free.

Your local public library will be hooked up to WorldCat and probably a few other useful sources. They will also probably use interlibrary loan.

If you can get to ISI or SCOPUS through work or a university library, you can cover a lot of ground in a short time.

If you have access to a subscription to an abstracting service such as Silver Platter or Cambridge Scientific Abstracts, exhaust them. Also, if you can set up a Z39.50 client connection with bibliographic software such as EndNote or ProCite, do so - you can search faster and organize the results more easily.

Use bibliographic software while you search, not later. Use a naming convention for the PDFs that you download, before you lose track. I usually use "principal author_secondary author space yearyear"; if there's overlap, you can add a, b, c to the title.
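A naming convention like this is easy to automate. The sketch below is one possible reading of it, assuming "yearyear" means a four-digit year and that the letter for overlapping names goes right after the year; the function name `pdf_name` and the sample authors are made up for illustration.

```python
def pdf_name(first_author, second_author, year, suffix=""):
    """Build a file name of the form 'First_Second 2007a.pdf'.

    `suffix` is an optional letter (a, b, c...) used when two papers
    would otherwise get the same name.
    """
    return f"{first_author}_{second_author} {year}{suffix}.pdf"


# Hypothetical examples; the author names are placeholders.
first = pdf_name("Smith", "Jones", 2007)
second = pdf_name("Smith", "Jones", 2007, suffix="b")

print(first)
print(second)
```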

KEEP AT IT. Worry the search like a dog worries a bone.