The chaotic evolution of colony size in ants. (Tree re-analyzed from Brady et al 2006, colony data taken from Hoelldobler & Wilson 1990 and other sources)
This tree depicts how colony size evolves in ants. The purple/blue colors represent small colonies with only a few to a few dozen ants, while the yellows and oranges represent species with enormous colonies of tens or hundreds of thousands of individuals. What's exciting about this rainbow-colored figure?
If you were expecting ant evolution to be an inexorable march towards larger and more complex societies, this tree should come as a surprise. Ant colony size is all over the place. Not only is there no general trend towards larger colonies, some lineages seem to be shrinking down from more populous ancestors.
Colony size evolution is not the subject of this post, though. I'm going to whinge instead about how frustrating I found the process of making this figure.
You see, there are ants we know a fair bit about. We know what they eat, how many queens they have, and how large their colonies are.
Then there are the species that the NSF-funded "Assembling the Ant Tree of Life" (AToL) group sampled for the ant phylogeny. Those two sets of taxa do not show much overlap. And this lack of overlap means that using the AToL trees as a platform for revealing patterns in ant evolution will be a slower and more complicated slog than it ought to be.
I made the above tree by taking the AToL molecular data set from Brady et al (2006) and re-inferring it in MrBayes using only those species for which I could find reliable colony size data. Most of the colony data come from table 3-2 in Hoelldobler & Wilson (1990), but I also drew from the literature. Of the 162 AToL taxa, I arrived at only 30 with available colony size information. Even so, I still fudged on a couple of species by swapping data in from congeners.
Why did AToL choose the species they did? Well, most of the AToL PIs are taxonomists. From a taxonomic perspective, Lasius californicus is locally available and works just fine as a representative for the genus.
But for downstream users of the phylogeny, for the folks who wish to use the tree to study how social behavior evolves in ants, the AToL design is odd indeed. The most researched ant species tend to be either trampy or European, but AToL largely sampled ants from California and Madagascar. The result is a tree connecting a bunch of species we don't know much about.
For a taste of the AToL taxon sampling, consider the following. The most studied Lasius is the common European garden ant L. niger (Google Scholar hits: 3,370); AToL sampled Lasius californicus (G.S. hits: 6). The most studied Formica species are F. rufa (G.S. hits: 3,280) and F. polyctena (G.S. hits: 1,830); AToL sampled F. moki (G.S. hits: 39). The most studied Eciton army ant is E. burchellii (G.S. hits: 830); AToL sampled E. vagans (G.S. hits: 51). Of the ten most studied ants only one, Linepithema humile, is included in AToL.
Absent field studies to fill in data for the AToL species, we have two ways to wed our knowledge of ant biology to the mismatched phylogeny. First, interested researchers could drop the money to sequence relevant loci from taxa of interest and re-analyze the AToL data to produce a new, more comprehensive tree. This option is the more correct one, but it will also be expensive. (Do any of my independently wealthy readers wish to fund the AToL Patch Project? It'd be the best $100,000 you ever spent.)
Alternatively, and more cheaply, researchers could use the AToL tips as proxies for well-studied taxa. For example, we could assume that the position and branch lengths for Solenopsis xyloni are reasonable phylogenetic surrogates for Solenopsis invicta and plug in the biological data for the better known species. An easier option, but one that rests on a shakier set of assumptions.
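For readers who want to try the proxy approach, here is a minimal Python sketch of the bookkeeping. It is not the procedure used for the figure above; the function names (`genus`, `match_traits`) are made up for illustration, and the trait values are placeholders rather than real colony sizes. Each tree tip gets its own trait value when one exists, otherwise a value borrowed from a congener, and every borrowed value is flagged so the shakier assumptions stay visible.

```python
def genus(species_name):
    """Return the genus part of a binomial such as 'Solenopsis xyloni'."""
    return species_name.split()[0]


def match_traits(tips, traits):
    """Pair each tree tip with a trait value, using a congener as a fallback.

    Returns (matched, unmatched), where matched maps each tip to a tuple
    (value, source_species, is_proxy) and unmatched lists tips with no data
    for the species or for any congener.
    """
    # Index the species that have trait data by genus (sorted for repeatability).
    by_genus = {}
    for species in sorted(traits):
        by_genus.setdefault(genus(species), []).append(species)

    matched, unmatched = {}, []
    for tip in tips:
        if tip in traits:
            # Direct hit: data exist for the sampled species itself.
            matched[tip] = (traits[tip], tip, False)
        elif genus(tip) in by_genus:
            # Proxy: borrow data from the first congener that has any.
            proxy = by_genus[genus(tip)][0]
            matched[tip] = (traits[proxy], proxy, True)
        else:
            unmatched.append(tip)
    return matched, unmatched


# Species names come from the post; the numbers are PLACEHOLDERS, not data.
tips = ["Solenopsis xyloni", "Linepithema humile", "Eciton vagans"]
traits = {"Solenopsis invicta": 1.0, "Linepithema humile": 2.0}

matched, unmatched = match_traits(tips, traits)
for tip, (value, source, is_proxy) in matched.items():
    print(tip, value, "(proxy: " + source + ")" if is_proxy else "")
print("no data:", unmatched)
```

Keeping the `is_proxy` flag makes it easy to rerun an analysis with and without the borrowed values and see how much the result depends on them.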
By way of disclaimer, I don't mean this post as an affront to AToL researchers. After all, they are among my absolute favorite people. And they've done a simply fantastic job covering the global diversity of ants from a systematic perspective. It's just that, well, they could have anticipated what the larger ant community might wish to use their trees for. We're on the cusp of some powerful analyses on how and why ants evolved, and the lack of phylogenetic coverage of well-studied ants is a frustrating speed bump.
Brady, S.G., Fisher, B.L., Schultz, T.R. & Ward, P.S. (2006) Evaluating alternative hypotheses for the early evolution and diversification of ants. PNAS, 103, 18172-18177.
Hölldobler, B. & Wilson, E.O. (1990) The Ants. Harvard University Press, Cambridge, MA and London, UK. 732 pp.
That is an interesting perspective from the point of view of studies of quasi-model organisms.
On the other extreme, I am always dismayed by studies of ant biology that set out to examine their system of interest (be that behaviour, genetics, development, etc.) in a comparative way and then choose a very odd and skewed sample of ant phylogenetic diversity. Many times the sample consists of a few closely related species within a "higher" subfamily (e.g., Myrmicinae), when the authors could very well have added some "lower" ant representatives that were also in their backyard to properly address the evolutionary questions they are after.
I hope all these new phylogenies will encourage a more lineage-diverse choice of taxa.
I may be wandering off topic, but this post does seem to at least tangentially touch on one of my pet peeves: the lack of natural history and ecology in taxonomic and systematic studies. Of course, you have notable exceptions in the ant world from Wilson on down, but in general arthropod systematists and their granting bodies seem to have little interest in what their animals are doing. Molecular phylogeny hasn't caused any obvious improvement in this pattern and may have made it worse.
For example, Lasius flavus has a great kleptoparasitic mite that hangs under an ant's mouth, palpates with hypertrophied legs I, and tricks an ant into regurgitating dinner (see Franks et al. 1991 J Zool Lond 225: 59-70). The mite, Antennophorus grandis, is in an early derivative and extremely phylogenetically interesting group, but this is the only study of the biology of one of its species that I know about. I can't even tell you if other Lasius have such mites.
John Wenzel had a great rant about natural history-free phylogeny at an ESA meeting a few years ago, and he said it better than I ever could, so I'll shut up now. But I think the dissociation between taxonomy and natural history is part of the reason for the decline in interest in and support for systematics.
Excellent point, and I think the same point could be made for so many ecological studies where species are often treated as fungible and natural history is either ignored or grossly misinterpreted.
Interesting point, Alex. Taxonomic selection for AToL-like projects does tend to be based on which members of a group are easiest to get with a suitable taxonomic spread, but I wouldn't have guessed that the most accessible species for taxonomists tend not to be the same ones used for other studies. Perhaps this comes from the willingness of the taxonomists involved to go out and collect specimens themselves: if they instead begged for specimens, the ones they got would probably be the ones most studied (and left in 70% ETOH on car dashboards, but that's a different issue).
A related sampling issue with some methods is that non-random selection based on traits can affect analyses. For example, if we were going to do a study of colony size, and had money for sequencing, a natural temptation would be to get extreme examples: the most "primitive" ants with small colonies, ants with huge colonies, and a sampling of more "normal"-sized species. The problem with this selection is that the rates you'd estimate would tend to be biased upwards (since the extreme trait values are sampled better). I'm part of a group studying flower evolution, and deciding on a suitably "random" set of taxa for looking at traits, but which still had enough data already to put on a tree, was a non-trivial problem (the solution involved downloading GenBank and writing a few Perl scripts).
And minor note: There are 83 Google hits for "Acanthomyops californicus", which is better than "Lasius californicus" but is still no L. niger. I guess this shows how slowly new taxonomy is adopted on the Web, too.
Thanks for your comments, Brian.
I had a conversation recently with Corrie Moreau about what we really need for statistical studies of evolutionary patterns: a taxon list where we pull species names from Bolton's catalogue at random and then go out and sequence those (or alternates, depending on the ease of finding the chosen species).
I don't like our chances of funding a Random Ants Project, though.
If you are going down that route, shouldn't you also start to sample the genome at random for characters? Why concentrate on those genes with existing primers just because they are "model" genes (e.g., wingless, long-wavelength rhodopsin)?
@Roberto Keller: you would choose characters at random if your question were "how do nucleotides evolve in ants?". But here, the question is looking at how some morphological trait evolves, and it's very unlikely to be correlated with any of the genes we'd use for phylogeny. So, what we'd want is the best estimate of a phylogeny (where the ants are sampled without regard to the trait of interest), but everything doesn't have to be random.
[...]the question is looking at how some morphological trait evolves, [...]. So, what we'd want is the best estimate of a phylogeny (where the ants are sampled without regard to the trait of interest)[...]
To answer this question I necessarily have to guide taxon sampling based on the distribution of the morphological trait of interest: I would want to make sure I include relevant taxa so that all the (known) existing variation in the trait in question is represented in the analysis.
Ah, that's the issue. You can sample randomly but extensively enough to get the variation, but if you selectively sample, you can get bad estimates (depending on what question you want to answer). For example, imagine you have species with trait values 0, 2, 4, 6, and 8. If you are sampling just three of them, you might choose 0, 4, and 8 to get the variation, but this dramatically overestimates the variance (and, if you were using a phylogenetic rate estimate, this would almost certainly overestimate the rate).
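The arithmetic behind that example can be checked in a few lines of Python. This is only a sketch of the toy case above (five species, three sampled), using the ordinary sample variance with an n - 1 denominator; it says nothing about phylogenetic rate estimates, which would need a tree.

```python
from itertools import combinations
from statistics import mean, variance

# The five trait values from the example above.
trait_values = [0.0, 2.0, 4.0, 6.0, 8.0]

# Sample variance (n - 1 denominator) using all five species.
full_variance = variance(trait_values)

# Sample variance for every possible choice of three species.
subset_variances = {
    subset: variance(subset) for subset in combinations(trait_values, 3)
}

# What a random draw of three species gives on average.
random_draw_mean = mean(list(subset_variances.values()))

# The "cover the variation" choice: smallest, middle, and largest values.
extreme_variance = subset_variances[(0.0, 4.0, 8.0)]

print("all five species:       ", full_variance)
print("average random triple:  ", random_draw_mean)
print("deliberate 0, 4, 8 pick:", extreme_variance)
```

Averaged over all ten possible triples, random sampling recovers the full-data variance of 10, while the deliberately spread-out triple gives 16.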
This sort of thing could have played a role in that famous paper a few years ago about re-evolution of wings in stick insects. If most stick insects are winged (note that I'm not sure whether this is true), but sampling is "even" so that equal numbers of winged and wingless taxa are included, the estimate of the rate of gain of wings will be much higher than is true given all the species or species sampled at random, possibly leading to the conclusion about re-evolution. [Note that this is more of a hypothetical prediction about the paper -- I haven't done the re-analysis and I don't know the true distribution of character states].
Wayne Maddison had a paper showing this sort of effect, though there, rather than biased sampling by taxonomists, there was biased sampling by dint of different character states affecting diversification rates. It's a different mechanism, but essentially similar in why and how it affects rate estimates. Maddison (2006) Confounding asymmetries in evolutionary diversification and character change. Evolution, 60(8), 1743-1746.
Your example is very clear, but here's the thing: you are treating species (terminal taxa) as if they were individuals in a population, and you are treating the trait of interest as if it were a population parameter you want to estimate. Random sampling is highly desirable for the population question. However, I'm not sure it is desirable for the phylogenetic one, because traits in species are strongly constrained by their evolutionary history, so species within a clade are not independent.
Forget about stick insects. Here's an example we both are more familiar with. Suppose you want to investigate the evolution of thoracic architecture in worker ants (i.e., how the different thoracic sclerites have (un)fused with each other). You find that different clades vary in which plates are fused: for example, some have a movable pronotum while in others the promesonotal suture is completely fused. Now, it happens that all Myrmicines have the same arrangement (basically, all the thoracic plates are fused with each other with no trace of sutures among them except for the sides of the pronotum with the mesopleura. The details are not even important). It also happens that Myrmicines are almost 50% of all known ant species. If I were to take Bolton's catalogue and sample species at random, my taxon sampling would be strongly biased towards species that are identical for my trait of interest, at the expense of species with unique architectures that are rarer. This will happen even with extensive sampling.
From the point of view of thoracic architecture, sampling species of Myrmicines again and again is a waste of time and resources, because the distribution of this trait, like most traits, in the species we see today is strongly dependent on common ancestry. I can treat the whole clade as just one terminal. Choosing my taxa based on the classification (as done by Brady et al. 2006) will also not be the best way to answer my question, but it will certainly be better than choosing species at random. In my opinion, the best approach is to guide taxon sampling by the distribution of the traits of interest (this is what I did for my study on ant morphology).
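To put rough numbers on that worry, here is a small Python sketch using the hypergeometric distribution. Everything except two figures is an assumption made for illustration: the catalogue size of 10,000 species and the 10-species rare clade are invented round numbers, while the 50% share of Myrmicines and the sample size of 162 (the number of AToL taxa) come from the discussion above.

```python
from math import comb


def prob_clade_missed(total_species, clade_species, sample_size):
    """Chance that a simple random sample, drawn without replacement,
    contains no member of a clade with clade_species species."""
    return (comb(total_species - clade_species, sample_size)
            / comb(total_species, sample_size))


def expected_in_sample(total_species, clade_species, sample_size):
    """Expected number of sampled species that belong to the clade."""
    return sample_size * clade_species / total_species


TOTAL = 10_000      # ASSUMED catalogue size (round number, not a real count)
MYRMICINES = 5_000  # "almost 50%" of species, per the comment above
RARE_CLADE = 10     # ASSUMED size of a small clade with a unique architecture
SAMPLE = 162        # number of taxa in the AToL data set

expected_myrmicines = expected_in_sample(TOTAL, MYRMICINES, SAMPLE)
p_miss_rare = prob_clade_missed(TOTAL, RARE_CLADE, SAMPLE)

print("expected Myrmicines in the sample:", expected_myrmicines)
print("chance the rare clade is missed:  ", round(p_miss_rare, 3))
```

Under these made-up numbers, about half of the 162 sampled species would be Myrmicines, and the rare clade would be missed entirely in roughly 85% of random draws.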
As Alex said, this is not to trash the ant-AToL effort, which was meant to answer primarily a classification question.
It may be that, as you mentioned, we have different questions in mind, e.g., the way in which thoracic architecture has been modified in evolution versus the rate at which thoracic modification occurred.
I'll check Maddison's paper. Thank you.
If you were using parsimony to do the reconstruction you describe, using 100 Myrmicines or just 2 would result in the same reconstruction. However, if you were using a model-based method for reconstruction, undersampling uniform Myrmicines throws out a lot of history that suggests a low fused -> unfused rate in favor of including the parts of the tree where there have been more changes. This could affect the reconstruction: effectively, the "cost" of fused to unfused transitions has been lowered, so reconstructions of fused ancestral states may be more likely.
Of course, as you point out, uniform sampling might miss rare character states. This introduces a different sort of error: it's possible that some ancestral lineage had a rare state, but unless we sample at least one species with that trait, we'll never make the correct reconstruction [barring doing something silly but common like assuming uniform root state probabilities, which probably doesn't make sense for rare character states]. So, in a world of limited resources, is it better to introduce error by missing some character states, or introduce error by doing biased sampling? I suspect it depends on the structure of the tree, distribution of states at the tip, the method used, and the question you want to address. You could probably address this in a particular study by doing some power analysis first (make data under what you expect is the true tree, and try different ways of subsetting the taxa and seeing the effect).
There should be ways to do what you want to do, that is, efficiently sample the diversity of states, but avoid the biases I'm worried about. For example, in your fusion example, you could include the information on deliberate undersampling of Myrmicines in your function that estimates ancestral states so the effect on rates is incorporated without having to explicitly include all those species (though to do this absolutely properly, you'd have to know the branch lengths within the Myrmicine tree). You could also do this in a Bayesian way by specifying a low prior probability on fused to unfused transitions. As far as I know, these sorts of things aren't implemented in software (though maybe SIMMAP allows transition rate priors), though there are people working on them. That would be the best of both worlds: sampling efficiently yet avoiding the biases that result (at the cost of making some assumptions, of course).