Bio Databases 2015

By finchtalk on January 30, 2015.

Something interesting happened in 2014. The total number of databases that Nucleic Acids Research (NAR) tracks dropped by three databases!

What happened? Did people quit making databases? No. This year, the "dead" databases (links no longer valid) outnumber the new ones. To celebrate Digital World Biology's release of Molecule World, I'll discuss some of the new structure databases below. But first, the numbers.

As summarized in the database issue's introduction, Galperin, Rigden, and Fernández-Suárez tell us this year's issue has 172 papers. 56 of those describe new databases, 98 provide updates, and 17 are updates of databases that have been published elsewhere. Together the 56+17+1 other make 74 new entries in the NAR online Molecular Biology Database Collection. Removing 77 obsolete databases made this year's growth -3.
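If you want to check that bookkeeping yourself, here is a minimal sketch of the arithmetic. The counts are the ones quoted above from the introduction; the variable names are my own.

```python
# Counts quoted from the 2015 NAR database issue introduction.
new_databases = 56         # papers describing new databases
published_elsewhere = 17   # updates of databases first published elsewhere
other = 1                  # the one additional entry
removed_obsolete = 77      # dead databases dropped from the collection

# New entries added to the online collection this year.
new_entries = new_databases + published_elsewhere + other

# Net change in the size of the collection.
net_growth = new_entries - removed_obsolete

print(f"new entries: {new_entries}")
print(f"net growth: {net_growth}")
```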

The despair of riches

The introduction paints an exciting picture of database development. We have updates of existing resources and new resources that can be used to advance multiple kinds of research. I share the view that new knowledge is created from new kinds of databases and extensions of existing databases, and I am always excited to peruse the NAR database issue. But you do not need to peel the onion very far before you begin to cry.

The challenge any user, other than a virtuoso data-miner of the resource at hand, encounters when trying to quickly assess the value of a resource is finding out whether it is alive, is classified correctly, and lets you do things with the data other than browse information within the database's web site. A common tool to help someone evaluate a database's usefulness is an example of the data and a simple demonstration showing how the resource can be used. A short description of the resource's value on the web site also helps.

Structure of Nosiheptide bound to the large ribosomal subunit

Structure PDB: 2ZJP - Thiopeptide antibiotic Nosiheptide bound to the large ribosomal subunit. The RNA and protein residues in the 23S rRNA and protein L11 are highlighted by residue coloring and ball & stick rendering. Nosiheptide is highlighted with element coloring and space fill rendering. The L11 backbone is shown in magenta and the structurally relevant portion of the 23S rRNA is shown in element coloring. This image was created in Molecule World.

As noted above, this year I wanted to find some data from a database, other than the RCSB Protein Data Bank (aka PDB) or the NCBI's MMDB (Molecular Modeling DataBase), that I could download and visualize in Molecule World. I excluded PDB and MMDB because I wanted to try something new.

What did I learn?

I followed the link from the introduction to the NAR Molecular Biology Database collection. In this collection, databases can be accessed alphabetically, by category, or by other mechanisms. This is a browse and click experience. Unlike the databases it collects, the issue doesn't allow you to search the collection. Since I wanted to get some structures and look at them in Molecule World, I started with the Structure Database collection.

Indeed, there are many databases in this collection. Structure databases are categorized as Small Molecule, Carbohydrate, Nucleic Acid Structure, and Protein. At the top, there is the Bard (Bioassay reference database from the NIH Molecular Libraries program) database. It contains structures for 39MM chemicals (perhaps a subset of the 50MM chemicals in PubChem?). The Small Molecule, Carbohydrate, Nucleic Acid, and Protein groups hold 24, 12, 22, and 116 databases, respectively. Altogether, counting Bard, that's 175 databases, or more than 10% of the entire collection.

Let's go digging and see if we can find something cool.

The search for interesting structures led me to structures with nucleic acids, proteins, and complexes. One of the databases that caught my eye was SCOR (structural classification of RNA). Unfortunately, the URL (http://scor.lbl.gov) takes you to a page you're not allowed to access. Maybe it's secret work that's been published for people to not use? How did the reviewers access this?

Another possibly cool one would be Quadbase (G-quadruplex motifs in promoters). Its URL (http://quadbase.igib.res.in) would not load a page. So far I'm 0 out of 2 just by clicking titles that look neat. Next, I tried NICR (Non-canonical interactions in RNA, http://prion.bchs.uh.edu/bp_type/). This isn't non-canonical, it's a loop. Every link takes you back to the original page with no obvious way to get to the database. We're 0 for 3. Did we strike out? Finding broken database links was NOT our goal.
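Clicking links one at a time gets old fast. Here is a rough sketch of how the same "is it alive?" check could be scripted with Python's standard library. The function names and the status labels are my own inventions for illustration, and this is not how NAR or anyone else actually audits the collection.

```python
import http.client
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 403 or 404.
        return err.code
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or a garbled reply.
        return None


def classify(status):
    """Turn a status code (or None) into a rough, hypothetical label."""
    if status is None:
        return "no response"
    if 200 <= status < 400:
        return "responds"
    if status in (401, 403):
        return "access denied"
    return "error"


def check_links(urls, fetch=fetch_status):
    """Map each URL to a label. Pass a different fetch function to test offline."""
    return {url: classify(fetch(url)) for url in urls}
```

Calling check_links with the three URLs above would sort them into these buckets, but keep the limits in mind. Some servers reject HEAD requests, and a page that loads fine can still be a dead end: a loop like NICR, or a polite "no longer available" notice, both count as "responds" here. A script only tells you where to look first.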

Structure PDB:1K0Z. Restriction enzyme PvuII showing that it is a dimer of two identical protein chains. The protein backbones are shown with rainbow (amino to carboxyl, red to blue) coloring. The Pr atoms are shown as balls. Within each chain, the interacting residues are colored by element.

MetalPDB (http://metalweb.cerm.unifi.it), categorized under Nucleic Acids, is, as the title suggests, a database of metal-binding sites in biological macromolecules. Although we found this listed in the Nucleic Acids category, the structures are mostly proteins.

A great feature in this database is a very cool search tool. It's a periodic table with a radio button under each metal. Metals without corresponding PDB structures have white symbols.

The white symbols provide a quick way to tell which metals are found in structures and which are not (29 out of 84). Our own data suggest this is a little on the low side, but that's another story.

I selected Pr (praseodymium) because, why not. 32 structures were returned, with the first in the list being PDB:1K0Z. This protein is the restriction enzyme PvuII from Proteus vulgaris.

But here's where the experience moves from great to just OK. You can only work with structures within the website. Downloads? Make a small collection for a class? Sorry. You can't get there from here. Luckily, if I have a PDB ID, I know how to use it. With a quick search inside of Molecule World, I can get a structure from either the MMDB or PDB databases and make a fun picture. OK, this database issue gets 1 point for mission accomplished (I found data, yeah!), -0.1 for being misclassified under nucleic acids, and -0.3 for not making the data easy to pull out. They get 0.6. That makes our cumulative score for today's adventure 0.6 out of 4.
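For anyone without Molecule World handy, a PDB ID is also enough to pull the coordinate file straight from the source. The sketch below assumes the RCSB file service keeps serving entries at https://files.rcsb.org/download/<ID>.pdb; that URL pattern and both function names are assumptions on my part, and this is not what Molecule World does internally.

```python
import os
import urllib.request

# Assumed URL pattern for the RCSB file download service.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.pdb"


def pdb_url(pdb_id):
    """Build the download URL for a classic four-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not (pdb_id.isascii() and pdb_id.isalnum()):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id)


def download_pdb(pdb_id, dest_dir="."):
    """Download one structure file into dest_dir and return the local path."""
    url = pdb_url(pdb_id)
    path = os.path.join(dest_dir, f"{pdb_id.strip().upper()}.pdb")
    urllib.request.urlretrieve(url, path)
    return path
```

With this, download_pdb("1K0Z") would save the PvuII structure to the current directory, and looping over a list of IDs is one way to build that small collection for a class.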

What's in the group of 116 databases under proteins? A lot of specialized things. My first try was 3D-Genomics. The link, http://www.sbg.bio.ic.ac.uk/~3dgenomics/, returns "The 3DGenomics server is no longer available." Again, seriously? It was then that I looked closely at the title of the NAR Molecular Biology Database Collection (category list page). It says "2014 NAR Database Summary Paper Category List." Did I get lost in the bowels of the NAR webpages? Going back to the top and rechecking the first link takes you to the top of the collection, where the title says "2014 NAR Database Summary Paper Alphabetic List." That's right, I got to the 2014 list from the 2015 introduction.

Maybe it was random, bad luck chance that led me to stumble on 4 of the 77 obsolete databases in my first 5 attempts to find cool things other people are doing. Or maybe you really can't judge a book by its cover.

Further Reading

http://finchtalk.blogspot.com/2011/01/databases-of-databases.html

http://finchtalk.blogspot.com/2012/01/bio-databases-2012.html

http://finchtalk.blogspot.com/2013/01/bio-databases-2013.html

http://scienceblogs.com/digitalbio/2014/01/09/bio-databases-2014


I love it

Thank you very much for your comments. While checking the database list in Nov. 2014, we noticed that SCOR was dead, but its authors promised to bring it back to life in the near future, so it was left in the list. We were not aware of the problems with the other three databases. We'll contact their authors to confirm whether they are still maintained. The problem is that these databases were last featured in the 2002, 2004 and 2008 NAR Database Issues. We ask the authors to promise to maintain their databases for 5 years after the initial publication; after that it is largely up to them.
Please send your comments and reports of obsolete databases in the NAR database list to xose.m.fernandez@gmail.com and nardatabase@gmail.com

Thanks for the update and clarification, Michael. I will send a note to the emails as you suggest. I always enjoy the database issue and seeing what people are doing. As noted, my investigation was not systematic. I picked the DBs by their names and my interests for this year's post.
