Bio::Blogs 4

By sporte on October 1, 2006.

Welcome to the fourth edition of Bio::Blogs!

This is the carnival where we explore topics at the intersection of computing, biology, and sometimes a bit of human behavior.

In this edition, we consider issues with annotation, agonize over standards, explore the question of whether or not it's possible to tame those wild and wooly computational biologists and make them laugh their way into writing programs that other people can use, give the Perl fans something to do while waiting for that program to run, and much, much more.

Today, we'll begin on the biology side of the spectrum and work our way over into computing.

Do try this in the classroom, or better yet, at home!
Chris Cotsapas submitted a truly excellent post from Nick Matzke at the Panda's Thumb that describes a wonderful activity and data set that students can use to look at human evolution by graphing the differences between fossils. In Fun with Hominin Cranial Capacity Datasets (Excel), Matzke talks about hunting through the wilds of PubMed to uncover the elusive records of brain size in fossils. I've pasted a smaller version of his graph here, since Matzke gives permission for educational use (you are learning something, right?), but do check out the original graph, or better yet, graph it yourself, since Matzke gives a link to the original data sets.
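If you do graph the data yourself, a trend line is one simple way to summarize it. The sketch below is a minimal Python example using only the standard library, and it rests on assumptions: that you have exported Matzke's spreadsheet to a CSV file, and that the columns are named `age_mya` and `capacity_cc`. Those names are hypothetical placeholders, not the headers in his file. The code fits an ordinary least-squares line rather than drawing a chart.

```python
import csv
import sys
from typing import List, Tuple


def load_points(path: str, x_col: str = "age_mya",
                y_col: str = "capacity_cc") -> Tuple[List[float], List[float]]:
    """Read two numeric columns from a CSV file, skipping rows with a blank in either.

    The default column names are hypothetical; change them to match the
    headers in your own export of the spreadsheet.
    """
    xs: List[float] = []
    ys: List[float] = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            x = (row.get(x_col) or "").strip()
            y = (row.get(y_col) or "").strip()
            if x and y:
                xs.append(float(x))
                ys.append(float(y))
    return xs, ys


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python trend.py your_export.csv  (file names are hypothetical)
    xs, ys = load_points(sys.argv[1])
    slope, intercept = fit_line(xs, ys)
    print(f"{len(xs)} fossils; slope = {slope:.1f} cc per unit of age, "
          f"intercept = {intercept:.1f} cc")
```

A straight line is a deliberate simplification. Brain size need not have changed linearly over time, so plot the points too before reading much into the slope.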

Pedro Beltrao, from Public Rambling and one of the fathers of this carnival, shares some interesting ideas about the evolution of transcription factors and the DNA sites where they bind in Evolution of transcription networks. As Mary-Claire King and Allan Wilson proposed long ago, a little change in regulation can go a long, long way. Looking at the data, Pedro observes that single-base changes in the sites where transcription factors bind are far more common than changes in the transcription factors themselves. This makes sense: if you changed the specificity of a transcription factor (the protein), the effect would be global, since binding sites for that same factor are dispersed throughout the genome. Changing a single base in a single copy of a binding site, located near a single gene, would be predicted to have a subtler, and probably less detrimental, effect on phenotype, so I think we would expect changes in individual binding sites to occur more often.
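The global-versus-local argument can be made concrete with a toy model. The Python sketch below treats binding as an exact match to a short motif; the motif and the "genome" are invented for illustration, and real transcription factors tolerate degenerate sites, so this is a caricature rather than a model of any actual factor. Mutating one copy of the site removes a single binding location, while changing the factor's preferred motif affects every site at once.

```python
from typing import List


def find_sites(genome: str, motif: str) -> List[int]:
    """Start positions of every exact (possibly overlapping) match to motif."""
    positions: List[int] = []
    start = genome.find(motif)
    while start != -1:
        positions.append(start)
        start = genome.find(motif, start + 1)
    return positions


def mutate(genome: str, position: int, base: str) -> str:
    """Return a copy of genome with a single base substituted at position."""
    return genome[:position] + base + genome[position + 1:]


if __name__ == "__main__":
    motif = "TGACTCA"  # hypothetical binding motif for the toy factor
    # Toy "genome": three copies of the site separated by filler sequence.
    genome = "AAAA" + motif + "CCCC" + motif + "GGGG" + motif + "TTTT"
    print("sites for the original factor:", len(find_sites(genome, motif)))

    # Cis change: one base substitution inside the first copy of the site.
    first = find_sites(genome, motif)[0]
    mutant_genome = mutate(genome, first + 4, "G")
    print("after mutating one site:", len(find_sites(mutant_genome, motif)))

    # Trans change: the factor now prefers a different motif; genome unchanged.
    new_motif = "TGACGCA"
    print("after changing the factor:", len(find_sites(genome, new_motif)))
```

Run as a script, it reports 3 sites for the original factor, 2 after mutating one site, and 0 after changing the factor: the single-base change costs one site, while the change in the protein costs all of them.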

As long as we're considering DNA sequences, let's take a look at Neil Saunders' issues with databases and genome annotations. In an entertaining and thought-provoking article, Genome annotation: who's responsible?, Neil struggles through the Sargasso Sea environmental sequence data and other sections of GenBank in search of DNA sequences for 23S ribosomal RNAs and a protein sequence for monomethylamine methyltransferase. These journeys lead to a lament about the lack of community-wide standards in genome annotation, and some suggestions for improvements. To me, they also emphasize the importance of being able to retrieve and work with the sequence data itself. I don't think the semantic web is going to solve Neil's sort of problem.

Neil's analysis reminds me of an instructive paper by Michael Galperin and Eugene Koonin on systematic errors in genome annotation; it's worth reading, even though it's almost ten years old. I heard Galperin give a really funny talk on genome annotation bloopers, in which he described the curious biological puzzles that can arise from gene annotation. In one case, a gene name was truncated from "phage head protein" to "head protein." When we biologists see the word "head," we picture the anatomical structure that contains a brain and sits on top of an animal body, certainly not part of a brainless virus.

You're right, Neil, we do have a long way to go.

Chris Cotsapas, from Fourth Floor Studio, adds to the call for annotation standards in his post Phenotype: the new standards war?. Although I would disagree with Chris's comment that the HapMap project has collected the vast majority of common genetic variations (I think they've only gotten 1%, but I have to check on this), I do agree that developing standards for describing phenotypes is hard and sometimes contentious. Standards will certainly benefit the community, but getting there will involve many arguments about minutiae and will no doubt be fraught with pain.

Since we've been discussing human behavior and its impact on the problem of annotation standards, it seems like a good time to look at infrastructure. Let's face it, IT is expensive. Between 1985 and 2002, UPS spent over $17 billion on information technology. They continue to spend over $1 billion per year, or 11% of their budget, so that you can track your package. I suspect that if we were to add up all the person-hours that biologists (like Neil and me) spend sitting at our computers searching for information, we'd find that scientific endeavors spend much, much more (in terms of time) and probably get much, much less.

In Scientific Software and in her paper in PLOS, "Scientific Software Development is not an Oxymoron," smeutaw writes about minimizing the pain by luring computational biologists into adopting ideas from software engineering, and maybe even taking classes in Software Carpentry. This blog post hit home since I work at a company that creates scientific software, so we encounter these issues on a daily basis. smeutaw does a nice job of describing some of the reasons behind Neil's frustrations, i.e. the lack of funding for activities like upgrading infrastructure, porting code to new systems, or keeping up with security concerns.

And speaking of infrastructure, we have a post from mndoci, Utility computing, web applications and computational science. In this post, he discusses a product called AppLogic, which may help SaaS programmers and "lower the barrier to scientific developers." Hope you're getting settled in and enjoying Capitol Hill, D.S.!

Last, we have one more treat from Neil Saunders; this time it's for Perl developers. Have you ever started running a program only to wonder if there's time to get a cup of coffee before it finishes? Wouldn't it be helpful to have a little program that would calculate the size of the computing problem and estimate how long it will take? In this post, Neil shares a Perl script for making a progress bar with Term::ProgressBar that will certainly help you watch as time goes by.
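Neil's script is in Perl and uses Term::ProgressBar; the sketch below is not that script and does not mimic the module's interface. It is a minimal Python illustration of the underlying idea, assuming each work item takes roughly the same time: measure the average time per finished item, multiply by the number of items left, and redraw a text bar on a single line. All function names here are hypothetical.

```python
import sys
import time
from typing import Callable, Optional, Sequence


def estimate_remaining(done: int, total: int, elapsed: float) -> Optional[float]:
    """Seconds left, assuming unfinished items take as long as finished ones did."""
    if done <= 0:
        return None  # no rate information yet
    seconds_per_item = elapsed / done
    return seconds_per_item * (total - done)


def render_bar(done: int, total: int, width: int = 40) -> str:
    """Text bar such as [#####-----] showing the fraction of items finished."""
    filled = int(width * done / total) if total else width
    return "[" + "#" * filled + "-" * (width - filled) + "]"


def run_with_progress(items: Sequence, work: Callable[[object], None]) -> None:
    """Apply work() to each item, redrawing the bar and time estimate on stderr."""
    total = len(items)
    start = time.monotonic()
    for done, item in enumerate(items, start=1):
        work(item)
        eta = estimate_remaining(done, total, time.monotonic() - start)
        eta_text = "?" if eta is None else f"{eta:5.1f}s left"
        # "\r" moves the cursor back to the start of the line so the bar redraws in place.
        sys.stderr.write(f"\r{render_bar(done, total)} {done}/{total} {eta_text}")
        sys.stderr.flush()
    sys.stderr.write("\n")


if __name__ == "__main__":
    # Stand-in workload: each "item" just sleeps for 50 ms.
    run_with_progress(range(20), lambda _: time.sleep(0.05))
```

The estimate is only as good as the equal-cost assumption. If later items are much larger than early ones (longer sequences, say), the countdown will be optimistic, so you may still want to check before getting that second cup of coffee.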

Until next time, "Here's looking at you, kid!"

Next month's Bio::Blogs will be hosted by Chris Cotsapas at the Fourth Floor Studio.

technorati tags: Bio::blogs, digital biology, bioinformatics, brain size, genome annotation, transcription factors


Although I would disagree with Chris's comment that the HapMap project has collected the vast majority of common genetic variations (I think they've only gotten 1%, but I have to check on this)

It's certainly more than 1% (in phase II), but it's difficult to know, especially for regions of the genome without deep resequencing. In this paper, 5 out of 22 common SNPs found in resequencing were in the HapMap. How representative that figure is would be tough to predict.

Could you be thinking of the Encode project for the 1% figure?

I agree that it's hard to know; I'd also point out that recent work on how common structural variations (insertion/deletion/duplication events) really are is redefining (again!) the way we think of variation. However, it's probably a good bet that many common (i.e., >5% frequency) SNP variants have been identified: ~3-4M in the HapMap, depending on the filtering criteria. It remains to be seen whether the assumption that SNPs make up the bulk of polymorphisms is true.

The main limitation, of course, is that the number of samples used is small (270 total), and drawn from three geographic locations, which may lead to ascertainment bias.
