How is a species like a soup can?

That is not a riddle, or rather it's not meant to be, but it's a question worth asking about the barcoding project.

Wired has a nicely written piece about the rationale and program of giving species DNA barcodes and using the gene chosen as the barcode to identify the number of species out there in the world [Hat tip Agricultural Biodiversity]. In it, the founder of barcoding, Paul Hebert, recalls how he came up with the idea:

He says he came up with the idea for the machine in a grocery store. Walking down an aisle of packaged goods in 1998, he indulged in a moment of awe: Here, in a short row of numerals, was the entire retail universe, billions of individual products, identifiable by a tiny machine-readable barcode. If it works for cans of food, Hebert thought, why not for bugs? Why not for everything?

Barcoding, which I have criticised and discussed here before, treats species as things that have some invariant property (in this case, a segment of the COI gene) that maps one-to-one onto the entities. As Brent Mishler, head of the Berkeley herbarium, says:

We're not accusing Hebert of being a creationist, just of acting like one.

Why? Because creationists treat species as having invariant properties. Biology, especially evolutionary genetics, suggests otherwise: while it may be true that most members of a species tend to share most genes, if a gene can vary and still work, then in a large enough population it will vary, and it may also have nonfunctional duplicates that skew the results. In short, barcoding may work for many species, but it won't work for them all.
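To make the worry concrete, here is a minimal Python sketch contrasting two ways of matching a specimen against a reference library. The species names and 10-site sequences are invented stand-ins (real COI barcode fragments are about 650 bases long), and the 0.2 cut-off is scaled to these toy sequences purely for illustration; it is not a threshold that any barcoding database is claimed to use. The exact-match lookup embodies the one-to-one, invariant-property assumption, so a single harmless substitution leaves the specimen unidentified; the distance-threshold lookup tolerates variation, but only by introducing a cut-off that itself has to be chosen and defended.

```python
from typing import Optional

# Hypothetical reference library: invented species names mapped to
# invented, pre-aligned 10-site stand-ins for COI barcode fragments.
REFERENCE = {
    "Species A": "ACGTACGTAC",
    "Species B": "ACGTTCGAAC",
}


def p_distance(a: str, b: str) -> float:
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def exact_lookup(query: str) -> Optional[str]:
    """One-to-one model: identify only if identical to a reference barcode."""
    for name, seq in REFERENCE.items():
        if seq == query:
            return name
    return None


def threshold_lookup(query: str, max_dist: float = 0.2) -> Optional[str]:
    """Nearest reference, accepted only within max_dist (illustrative cut-off)."""
    name, dist = min(
        ((n, p_distance(query, s)) for n, s in REFERENCE.items()),
        key=lambda pair: pair[1],
    )
    return name if dist <= max_dist else None


# A specimen of "Species A" carrying one variant site.
variant = "ACGTACGTAT"
print(exact_lookup(variant))      # None: the invariant-barcode model fails
print(threshold_lookup(variant))  # Species A: tolerated, at the price of a cut-off
```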

What will result, given the delay in describing and naming species (there are some ten million animal species known, which may be as few as a quarter of all animal species alive) and in checking whether the barcodes really do map onto actual species, is that the barcodes will become the species for large numbers of animals. That is, if we diagnose them by their barcodes, then that is what a species is: something that has an assayable barcode.

This puts the diagnostic cart before the taxonomic horse, so to speak. It makes the results of the epistemology the matter of the ontology. It's a common slide in systematics, but it remains a problem nonetheless.

The grail here is automatic species identification, a taxonomic tricorder. Kip Will, another taxonomist I met briefly in Phoenix last May, is trying to check the results of barcode identification, but, as the article says:

As hard as Will works to debunk Hebert's claims, [lepidopterist Dan] Janzen works harder to register barcodes. He is trying, through sheer accumulation of insects, to impose the automatic animal identifying machine upon the world.

And therein lies another difficulty. The creationist Duane Gish is known for galloping through a dozen fallacies in his presentations, each of which takes thirty minutes to debunk; he is so well known for it that the tactic is called the Gish Gallop. The technical name for this is the Fallacy of Many Questions. Here we have a version of this fallacy, only it's not done disingenuously, as Gish does it, but strategically. By forcing so many cases on the taxonomic world, it guarantees that taxonomists, who are pathetically funded and in need of any resource they can get, will employ these results before they are tested. So will ecologists, conservationists, government agencies, and industry. The gallop ensures that we will always be many steps behind the testing that Will wants to do, and which should be done.

In some ways, cladistic software already acts as a "black box" through which researchers pass data without understanding it, producing trees of unclear worth. This is due to a lack of training, and of experts able to employ these techniques. Scientists will continue to use them anyway, because their professional publications and standards require it. Here we have another black box that will add to the confusion. This is not how knowledge is arrived at, I fear.

Anyway, the Wired article, by Gary Wolf, is a good introduction, and much more sympathetic to barcoding than I am, so read it if the subject is in your field of interest.


Better idea: rapid full genome sequencing and an online database where organism sequences can be uploaded along with images of the specimen, capture information, etc. If the barcode region changes, is it a new species? Full genome comparisons would use that variability to establish phylogeny and would be intrinsically less prone to error.

I face similar problems when dealing with parasite strains and types. Most identification today is based on molecular barcoding of specific genes or intergenic regions, so you often split strains/types after finding a new marker in a new region of the genome; or not, since sometimes new markers only help to further establish these strains/types. The problems arise from, as you said, "putting the diagnostic cart before the taxonomic horse": you often have to identify parasite samples from patients using a few so-called "reliable" markers and assign each to one of the pre-established types in your analysis. This completely ignores the variability of the parasite, besides creating fictitious biological entities without any actual meaning whatsoever. As a result, you end up with thousands of studies asking whether strain Y survives slightly better than strain X at room temperature, producing all kinds of crazy correlations that often turn into ping-pong analyses saying "yes, it does" and "no, it doesn't", all because it was ignored that strain Y is actually a homogeneous population and the markers chosen were insufficient to show it.
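The forced-typing problem described here can be sketched in a few lines of Python. Everything below is hypothetical: three invented reference types, each reduced to a single short concatenated marker string standing in for a handful of typing markers, and an arbitrary one-mismatch tolerance. A closed-set classifier always reports one of the pre-established types, however poorly the sample fits; an open-set variant can at least answer "no match" and flag the sample for closer study.

```python
from typing import Optional

# Hypothetical pre-established types, each represented by one invented,
# concatenated marker string of equal length.
TYPES = {
    "Type I": "AAGGCCTT",
    "Type II": "AAGGCTTT",
    "Type III": "TTGGCCAA",
}


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length marker strings."""
    if len(a) != len(b):
        raise ValueError("marker strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closed_set_type(marker: str) -> str:
    """Always returns one of the known types: the nearest one, however distant."""
    return min(TYPES, key=lambda t: mismatches(marker, TYPES[t]))


def open_set_type(marker: str, max_mismatches: int = 1) -> Optional[str]:
    """Returns the nearest type only if it is close enough, otherwise None."""
    best = closed_set_type(marker)
    return best if mismatches(marker, TYPES[best]) <= max_mismatches else None


novel = "CAGGATTG"  # differs from every known type at 3 or more positions
print(closed_set_type(novel))  # Type II: forced into an existing category
print(open_set_type(novel))    # None: flagged as matching no known type
```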

This is so wrong, so wrong!

Exactly: you can't just pick out ONE marker for a species; it must be a comparative study of all genomic markers. I may have a SNP or a duplication in a non-coding region (in fact, I do, somewhere in the lactase promoter), or perhaps even in a coding region, but I'm still human (much to my own dismay). This is why, when I explain this to people, I always stress that whole genomes are important for taxonomy. Genome sequencing technology is getting to the point where we CAN rapidly analyze a selected animal in a few days; we just need better ways to share this information. I also think the problem has something to do with our own idea of "species": since we know that divergence occurs along a gradient, we cannot say exactly when two populations have become "new species" until well after they have been reproductively isolated. Personally, I think the term "species" as defined is still rather ambiguous. I'm interested in hearing other opinions on the matter, though.

How is using the DNA sequence of a (small set of) genome segments to determine species any worse than using number of anthers/trichome shape/larval segmentation pattern? For some genera, there are some pretty esoteric distinctions between species. To get the species conclusively, you have to catch the creature at the correct, possibly brief, developmental stage, and even then there may be an experience-related judgment call. (What does it mean that the tips of species A's antennae are "more rounded" than species B's? How rounded do they have to be to be "more rounded"?) All of these features rarely, if ever, speak directly to the ability of two organisms to successfully reproduce with each other - they're all proxies, just like the DNA sequence is.

Traditional taxonomists "bugger up" identification all the time. It's almost an annual event when some genus or another gets a good housecleaning and dozens of named species are lumped in with another. It works the other way, too - we just had an instance where what was thought to be a single species of earthworm now looks to really be not two, but four separate species. (http://www.sciencedaily.com/releases/2008/10/081010081652.htm)

Sure, DNA barcoding will likely make mistakes. The question is, will it be any worse than traditional taxonomy?

Back when morphology was about all we had, we did not use the complete morphology of one species for comparison with another. Rather we looked at characters which seemed to have some diagnostic utility. Sometimes we were mistaken, sometimes not. There is still a good bit of morphological species description going on. Are we better off with barcoding than we were with just morphology?

By Jim Thomerson on 22 Oct 2008

If one does a species description correctly, then one must do it as if it were the only species on the planet, documenting both inter- and intra-populational variation. Discussing diagnostic characters is aptly left for the discussion section. Describing species is often poorly done, and barcoding will make it worse.

I also agree with John about tree-building programs - lots of crap going in and coming out.

By michael fugate on 22 Oct 2008

RM@4

"How is using the DNA sequence of a (small set of) genome segments to determine species any worse than using number of anthers/trichome shape/larval segmentation pattern?"

You have a means of comparison which can look at the underlying causes for these distinctions, and given the background of intrapopulational variation of the DNA, you can make an accurate assessment of the variation's causes. Using segmentation patterns and such does not give an accurate description of all variation within a reproductively isolated population. DNA analysis techniques can, and can even tell you when it is reproductively isolated from related populations.

"All of these features rarely, if ever, speak directly to the ability of two organisms to successfully reproduce with each other - they're all proxies, just like the DNA sequence is."

Except DNA sequences can tell you if the populations ARE reproducing and exchanging alleles. If they are not, you can perform hybridization experiments to see if they are capable of producing fertile offspring.

"Traditional taxonomists "bugger up" identification all the time. It's almost an annual event when some genus or another gets a good housecleaning and dozens of named species are lumped in with another."

Usually the housecleaning is based upon DNA evidence...

"It works the other way, too - we just had an instance where what was thought to be a single species of earthworm now looks to really be not two, but four separate species. (http://www.sciencedaily.com/releases/2008/10/081010081652.htm)"

Yeah... point being? DNA evidence, again...

"Sure, DNA barcoding will likely make mistakes. The question is, will it be any worse than traditional taxonomy?"

DNA barcoding is unnecessary! That's my point: if we focus on the new techniques and methods, there is no need to implement some additional means of identification, which will only give us additional confounding evidence. Get complete sequences of a representative portion of a population, compare them to other populations, done; no need to look at morphology, behavior, etc.

It makes the results of the epistemology the matter of the ontology.

That was a doozy of a sentence! I couldn't let it go by unremarked. It seems to my layman's eye that a good deal of this issue arises because neither "solution" (i.e., taxonomy vs. genetic coding) is working from a known reference of objective fact. It is like when the SI units were defined: some external reference frame had to be used (the size of the earth, the period of an atomic oscillation, etc.). In the problem of speciation, there is no "objective frame" to refer to (at least, I have not heard anyone suggest one). Thus, one cannot claim that either of the two competing systems (taxonomy or bar coding) is objectively better than the other.

The problems will arise when one attempts to use one system as interchangeable or even relatable to the other. The example from Gabriel in comment #2 is a good example: the papers using the bar code are only a problem when one attempts to read them as suggesting conclusions about a taxonomically defined "species" -- or, to put it another way, when people reading the paper think they are reading about a "taxonomic species", and draw conclusions therefrom, when they are really reading about a "bar coded species".

Bar coding isn't going to go away, and the ability to define the "species" of a given sample faster, using a method accessible to non-specialists, is both necessary and coming. It seems to me that taxonomists would be better served by trying to establish as accurate a correlation as possible between the taxonomic system and the bar coded system (and to identify where the correlation cannot be made).

(BTW, in case anyone did not see it, the NYT ran an August 22nd article about a high school "science experiment" involving bar coding.)

By automandc on 22 Oct 2008
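One way to pursue the correlation suggested above is to cross-tabulate expert identifications against barcode clusters and list the two kinds of mismatch. This Python sketch is only an illustration under assumptions: the specimen names and cluster labels are invented, and the barcode clustering is taken as already done. A cluster containing several named species suggests that the barcode lumps them; a named species spread over several clusters suggests that the barcode splits it, or that the taxonomy needs revisiting.

```python
from collections import defaultdict


def concordance(specimens):
    """Compare taxonomic names with barcode clusters.

    specimens: iterable of (taxonomic_name, barcode_cluster) pairs.
    Returns (lumped, split): clusters holding more than one name, and
    names whose specimens fall into more than one cluster.
    """
    names_per_cluster = defaultdict(set)
    clusters_per_name = defaultdict(set)
    for name, cluster in specimens:
        names_per_cluster[cluster].add(name)
        clusters_per_name[name].add(cluster)
    lumped = {c: sorted(ns) for c, ns in names_per_cluster.items() if len(ns) > 1}
    split = {n: sorted(cs) for n, cs in clusters_per_name.items() if len(cs) > 1}
    return lumped, split


# Invented example data: (name assigned by a taxonomist, barcode cluster label).
sample = [
    ("Aus bus", "BC1"),
    ("Aus bus", "BC1"),
    ("Aus cus", "BC1"),
    ("Aus dus", "BC2"),
    ("Aus dus", "BC3"),
]
lumped, split = concordance(sample)
print(lumped)  # {'BC1': ['Aus bus', 'Aus cus']}
print(split)   # {'Aus dus': ['BC2', 'BC3']}
```

The specimens that land in either dictionary are exactly the cases where, in the terms above, the correlation cannot be made without further work.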

For those of you interested in taxonomy based upon phylogeny or "bar coding," I would encourage you to read about PhyloCode:
http://www.ohiou.edu/phylocode/
Alternatively, the Wikipedia article isn't TOO terrible:
http://en.wikipedia.org/wiki/PhyloCode
My problem with "bar coding," as I stated, is that it relies upon single genes: any polymorphism in such a gene is interpreted as another species, when it may, in fact, just be a new allele.
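The worry that every new allele gets read as a new species can be illustrated with a small Python sketch. The four sequences are invented 10-site stand-ins, and single-linkage grouping with a 0.15 cut-off is just one simple clustering rule, assumed here for illustration rather than taken from any barcoding pipeline. Counting each distinct sequence as a separate unit gives four "species"; grouping sequences that lie within the cut-off of one another gives two.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of sites that differ between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage_groups(seqs, threshold):
    """Group sequence indices, joining any pair within `threshold` (union-find)."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the group's root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


seqs = ["ACGTACGTAC", "ACGTACGTAT", "ACGTACGAAC", "TTGTACCTAG"]
print(len(set(seqs)))                     # 4 distinct sequences
print(single_linkage_groups(seqs, 0.15))  # [[0, 1, 2], [3]]
```

Sequences 1 and 2 end up in the same group even though they are 0.2 apart, because each lies within the cut-off of sequence 0. That chaining is a known property of single linkage, and it shows that any grouping rule brings its own assumptions.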

Comparing DNA barcoding, as a proxy for species determination, to grocery barcodes is as apt a comparison as I can think of, although my reason for agreeing may not be completely clear:

Barcoding for grocery products, using Universal Product Codes (UPCs), is remarkably well suited to this subject, if only because simply changing the packaging or coloration of a box or can may result in a change of UPC. The UPC is divided in two: a sequence that identifies the maker, and a sequence that identifies the product item from that maker. In a typical UPC-A, this is a 10-digit sequence divided into two 5-digit sections, preceded by a single number-system digit and followed by a single check digit, resulting in a full 12-digit UPC. Note, then, that simply changing a small part of the barcode for the exact same product does not change the product you are buying, just its appearance. No actual change in the material has occurred; likewise, applied to species identification, no genetic change has occurred, even if it looks different.

The UPC is used to categorize products from the company downward into a system of federal stamp-collecting, and is nothing more inane, for the sake of genetic categorization, than using the Linnaean hierarchy to divide the Class down first into its lower parts, rather than combining Species first into higher and higher groupings of taxonomic complexity.

In short, DNA barcoding has more in common with UPC registry information than it does with systematic recognition and utility.

By Jaime A. Headden on 23 Oct 2008
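For readers unfamiliar with the layout discussed in the comment above: in the classic arrangement, a 12-digit UPC-A code consists of a number-system digit, a five-digit manufacturer block, a five-digit product block, and a final check digit computed from the other eleven. The Python sketch below splits a code into those fields and recomputes the check digit. The function names are illustrative, and the example number serves only to exercise the arithmetic.

```python
def upc_a_check_digit(first_eleven: str) -> int:
    """Check digit for the first 11 digits of a UPC-A code."""
    if len(first_eleven) != 11 or not first_eleven.isdigit():
        raise ValueError("expected exactly 11 digits")
    digits = [int(c) for c in first_eleven]
    odd_sum = sum(digits[0::2])   # digits in positions 1, 3, 5, 7, 9, 11
    even_sum = sum(digits[1::2])  # digits in positions 2, 4, 6, 8, 10
    return (10 - (3 * odd_sum + even_sum) % 10) % 10


def split_upc_a(code: str) -> dict:
    """Split a 12-digit UPC-A code into the fields of the classic layout."""
    if len(code) != 12 or not code.isdigit():
        raise ValueError("expected exactly 12 digits")
    return {
        "number_system": code[0],
        "manufacturer": code[1:6],
        "product": code[6:11],
        "check": code[11],
        "check_ok": upc_a_check_digit(code[:11]) == int(code[11]),
    }


print(split_upc_a("036000291452"))
# {'number_system': '0', 'manufacturer': '36000', 'product': '29145', 'check': '2', 'check_ok': True}
```

Nothing in this arithmetic says anything about what is inside the can. The digits are an assigned label, validated only against themselves.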

Jaime, if we were assured that species did in fact have only one form of the COI gene, then it would indeed be a useful diagnostic identifier. But that is exactly what is at issue. Why think that?

I consider genetic barcoding to be an extremely useful tool, and have used it to sort out an otherwise intractable dataset. Like any other research technique, it comes with both assumptions and limitations.

First: A single barcoding sequence is never going to resolve phylogenetic relationships with a high level of confidence. That's not what barcoding was designed to do. It's primarily an identification tool. If it's used according to the underlying assumptions, it's extremely helpful. Any technique will fail if pushed beyond the limits of its scope and design.

Second: When combined with traditional taxonomic techniques (e.g. morphology, detailed genetic data, the fossil record) a mature barcoding scheme will enable non-experts to make identifications that would have been completely impractical beforehand. Example: If an expert ant taxonomist identifies a specimen from Arizona, and submits barcoding sequences for multiple individuals to the central database, that species can now be identified in a collection by an ecologist who is broadly sampling insects from that habitat but is not an ant taxonomist.

Third: Barcoding can help distinguish between two species even if it can't definitively tell us which ones they are. For those of us (yo!) working on some kinds of questions in messy, species-rich systems, there are times when just being able to tell that you're dealing with two different species is close enough.

By Julie Stahlhut on 06 Nov 2008