Using the fact that sometimes scientists look at the pictures first

By cpikas on April 16, 2010.

I was happy to see that the authors published this article in PLoS ONE. I had been following their work a while ago but lost track (plus, when asked, the last author implied that they had moved on to new projects). So here's the citation, and then I'll summarize and comment.

Divoli, A., Wooldridge, M., & Hearst, M. (2010). Full Text and Figure Display Improves Bioscience Literature Search. PLoS ONE, 5(4). DOI: 10.1371/journal.pone.0009619

The authors created a prototype information system that used Lucene to index the metadata, the full text, and the figure and table captions of open access biomedical articles. The interface gives you one search box plus radio buttons to choose what to search: full text and abstracts, figure captions, or tables. In the first, the results look like the standard metadata and abstract, with keyword-in-context excerpts and extracted images. For figure captions, you can have either a grid of figures or a list. For tables, you get a citation, the table caption, and the table. The article spends a good deal of time discussing design decisions, which amounts to a tutorial for creating your own.
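To make the "one search box, pick what to search" idea concrete, here is a minimal sketch in plain Python. It is not the authors' Lucene setup: the mode names, field names, and weights are all made up for illustration, and the scoring is a simple weighted term count rather than anything Lucene does.

```python
from collections import Counter

# Hypothetical mapping from search mode (the radio button) to the fields
# searched and their weights. The paper's actual Lucene fields and weights
# are not given in this post.
MODE_FIELDS = {
    "fulltext": {"title": 3.0, "abstract": 2.0, "body": 1.0},
    "figures": {"figure_caption": 1.0},
    "tables": {"table_caption": 1.0},
}


def score(doc, query, mode):
    """Sum weighted query-term counts over the fields used by this mode."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in MODE_FIELDS[mode].items():
        counts = Counter(doc.get(field, "").lower().split())
        total += weight * sum(counts[t] for t in terms)
    return total


def search(docs, query, mode):
    """Return (score, doc id) pairs for matching documents, best first."""
    scored = [(score(d, query, mode), d["id"]) for d in docs]
    return sorted((s for s in scored if s[0] > 0), reverse=True)
```

The same query then returns different articles depending on the mode, because a term that appears only in a figure caption counts for nothing in the full text mode and vice versa.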

To build the prototype, they got the XML from PubMed Central and pulled out authors, images, captions, abstracts... They made different sizes of the images for quick retrieval later. They then indexed different fields with different weights depending on what you select to search. They then recruited a group of biologists (n=20, although the number isn't really important for qualitative studies) and ran them through a study. Each participant provided a query and looked at the results in each view, thinking aloud about their reactions and steps. They were then asked a few questions about each interface.
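The "pull the captions out of the XML" step might look something like the sketch below. It assumes the article XML uses the NLM/JATS-style `fig`, `label`, `caption`, and `graphic` elements that PubMed Central articles are tagged with; the function name and the output dictionary keys are my own, not anything from the paper.

```python
import xml.etree.ElementTree as ET

# Attribute name for xlink:href in ElementTree's expanded namespace form.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def extract_figures(xml_text):
    """Collect label, caption text, and image reference for each <fig>.

    Assumes JATS-style markup; extract_figures and the dictionary keys
    are hypothetical names chosen for this illustration.
    """
    root = ET.fromstring(xml_text)
    figures = []
    for fig in root.iter("fig"):
        caption = fig.find("caption")
        graphic = fig.find("graphic")
        if caption is not None:
            # Flatten inline markup (italic, sub, ...) and tidy whitespace.
            caption_text = " ".join("".join(caption.itertext()).split())
        else:
            caption_text = ""
        figures.append({
            "label": (fig.findtext("label") or "").strip(),
            "caption": caption_text,
            "image": graphic.get(XLINK_HREF) if graphic is not None else None,
        })
    return figures
```

Each dictionary could then be indexed as its own small document, which is roughly what you need for a caption-only search that still links back to the article.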

The majority of the participants would choose to use this type of interface for at least some of their searching. It seems like they got the full text search but were not quite as sure about the table search: some thought it would be useful for getting right to the results, but several didn't think they would use it.

Now for some commentary...

I was somewhat critical in my earlier post on this work, but I really think this is promising stuff. The authors point out that this is very dependent on access to the full text and also won't be universally useful. There are plenty of search situations in which the images wouldn't be used, but they should be an option. Since that earlier post, CSA has added "deep indexing" to more of their files. It's not the same as their dedicated Illustrata product, which is more like Biotext.

Publishers have the full text, so some of them are also making the images and tables available outside of the article. For example, both ACS and RSC have added images to their RSS feeds. ScienceDirect has a tables and images tab on their articles - which is nice for scanning to see if the article is relevant. PLoS ONE lets you look through a list of the tables and images and download a PowerPoint slide or a high-quality image.

Springer Images also lets you search the tables and captions to get pictures. It also indexes the context of the reference to the image in the text. You also get a link to the article and excerpts like on Google Books. My colleague at work pointed out that it is useful for finding phase diagrams.

But more than all of that, there's been a lot of talk recently about disaggregating the journal article or even doing away with the whole and just using the pieces. If so, maybe this is an intermediate step.

Christina, do you know how much of this figure/table information can also be found via Google search?

Good question. Certainly Google could index all of the content Biotext covers (as can anyone), but precisely what Google indexes and when is not provided. This is a sore point with librarians. Clearly CSA and Springer index things that are covered with a standard copyright, but both provide data to Google. One would assume that they don't "give away the store" to Google, but I don't know what that boundary is. I also do not know how the Google team weights metadata from various parts of an article. Do they take the XML feed from PMC and treat the abstract differently from the full text?

A lot of what this team (and the Tenopir & Sandusky team) did was to understand *how* to make the images and tables available. It would be interesting to compare Google image search with the Biotext image search - for a search that was very clearly within the biomed domain. The authors also point out that snapshots of the whole page in search results typically aren't that useful - does Google pull out images in Scholar? I haven't seen it if they do.
