Genomics and DNA Barcoding

Identifying and cataloging biological diversity is challenging. One way to go about IDing all the life forms is to sequence a known region of the genome in all those species. This is known as DNA barcoding. An article in PNAS reports on the DNA sequence of a gene found useful for DNA barcoding in plants. In a review of the paper, the following table is presented:

                          DNA Barcoding     Genomics
Number of species         All (or most)     One (or few)
Number of gene regions    One (or few)      All (or most)

The gist: DNA barcoding results in the sequencing of a single gene in a bunch of species, while genome sequencing gives us the sequence of an entire genome in a single species. This may be true now, but for how long? The dropping price of sequencing will allow us to get information from many genomic regions in many species. These won't be high quality whole genome sequences, but the age of doing DNA barcoding with a single gene won't last for long.

What Venter's doing is a kind of DNA barcoding, I think. But, rather than sequencing a specific gene from all the species, he's just sequencing random parts of their genomes. Also, he's doing it without first identifying the organisms he's sampling. That's not a bad thing, but I'm not sure if it makes it under the small tent of DNA barcoding; perhaps it's only big-tent barcoding.

Good point. I definitely see the logic, and agree to an extent, but it may put practicing scientists in a bit of a bind. I run into this with my own work in squid systematics (where I use the standard animal barcoding gene -- cytochrome oxidase subunit I -- quite regularly). I could use my pathetic PCR-based approach to sequence a few mitochondrial genes and a few nuclear genes for a bunch of squid right now (well, if I can get some funding for it)...or I can just chill for fifteen years until I have a Genome-o-Matic down the hall to pump out squid genomes and a quantum computer on my desk to quickly analyze all those data. Of course, if my colleagues and I just fart around for fifteen years, we'll never get new computers or a Genome-o-Matic because we will be (rightfully) seen by the scientific community as a bunch of unproductive hosers.

The logic could extend to lots of different technology-dependent areas of science. Why send probes to Europa now to look for life when we can wait twenty years and have faster engines that will get us there cheaper in half the time?

I guess the answer is that if we start barcoding (or going to Europa, or PCR-ing squid genes) now -- and note I am ignoring the vigorous debate about whether or not DNA barcoding is even a good idea to start with -- we can get a good enough set of answers to some questions without sequencing full genomes or waiting for warp drive. I can probably nail down species relationships quite robustly in my group of cephalopods with just a few kb of genetic data that I could generate now, even though it will sure be nice to have complete genomes from representatives of all the species later. We can ask questions that require complete squid genomes for answers when that technology becomes cheap and widely available.

Physioprof, RPM,

What Venter is doing is called metagenomics (or more accurately, whole-genome metagenomics). At least that's what we call it, and I do this for a living.

It's funny that to ecologists, barcoding means one thing (the DNA equivalent of a Latin binomial), while to people in genomics, barcoding is the process where you add a DNA tag (the 'barcode') so you can simultaneously sequence multiple DNA samples (the DNA tag is attached before the samples are mixed for sequencing).
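To make the genomics sense of "barcoding" concrete, here is a minimal sketch of sorting pooled reads back into their samples by tag. The tag sequences, sample names, and reads are made up for illustration, and the function name `demultiplex` is hypothetical. It assumes the tag sits at the start of each read and matches exactly; real pipelines typically also tolerate sequencing errors in the tag.

```python
from collections import defaultdict

def demultiplex(reads, tag_to_sample, tag_len):
    """Group pooled reads by the DNA tag at the start of each read."""
    by_sample = defaultdict(list)
    for read in reads:
        # Split each read into its tag and the sequence of interest.
        tag, insert = read[:tag_len], read[tag_len:]
        # Reads whose tag is not recognized are set aside, not guessed at.
        sample = tag_to_sample.get(tag, "unassigned")
        by_sample[sample].append(insert)
    return dict(by_sample)

# Hypothetical tags and reads, invented for this example.
tags = {"ACGT": "sample_A", "TGCA": "sample_B"}
pooled = ["ACGTGGGTTT", "TGCACCCAAA", "ACGTTTTGGG", "GGGGAAAACC"]
result = demultiplex(pooled, tags, tag_len=4)
```

Here `result` maps each sample name to the reads that carried its tag, with the tag itself stripped off.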