When entropy and ecology collide...

...the albino silverback blinks once or twice, says knowingly "Yes, yes", and sends those who do understand math to these two posts at The n-Category Café: "Entropy, Diversity and Cardinality" post 1, post 2. If I read it aright, it means that diversity is measured as the entropy of some metric space, or of the probability distributions on that space. Since this is roughly the same thing as Shannon entropy, it is no surprise to find that ecologists have hit upon the same equations to deal with this problem. More than that I am not competent to say (curse my teenage lack of interest in algebra!).
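For readers who want something concrete, here is a minimal sketch of Shannon entropy used as a diversity index. It is not taken from the Café posts; the function name and the two example communities are made up for illustration, and the only assumption is the textbook definition of entropy as the expected value of -ln(p) over species proportions.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (in nats) of a community, given raw species counts."""
    total = sum(counts)
    # Relative abundances; species with a zero count contribute nothing.
    props = [c / total for c in counts if c > 0]
    # Entropy is the expected surprisal, where the surprisal of p is -ln(p).
    return sum(p * -math.log(p) for p in props)

# Hypothetical communities, for illustration only.
even = [25, 25, 25, 25]   # four equally common species
skewed = [97, 1, 1, 1]    # one dominant species, three rare ones

print(shannon_entropy(even))    # equals ln(4) for four equally common species
print(shannon_entropy(skewed))  # smaller, because one species dominates
```

The index rewards both the number of species and the evenness of their abundances, which is about all it can be said to measure.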


cf. doi:10.1098/rspa.2008.0178 methinks; I think biology may be about to transition to "hard" science.

Part 1 is fairly well known in ecology. It's an extremely good way of producing some numbers which look impressive, but have little connection to ecological processes.

If you look closely, you'll see that they introduce the "surprise" function (it leaps out at you from a cupboard. Surprised? I was - I don't have a cupboard in my flat), but the form of the function is chosen for its mathematical properties, and it has a parameter which can be chosen to give different measures, but there's no way I know of to decide which number to use. There's almost no connection to the biology.
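To make the point about the free parameter concrete, here is a rough sketch. It assumes the one-parameter family usually written as (1 - p^(q-1))/(q-1), with -ln(p) as the q = 1 case; the names surprise and expected_surprise and the two communities are hypothetical, chosen only to show that the choice of q can reverse which community counts as "more diverse".

```python
import math

def surprise(p, q):
    """Assumed q-surprise of an event with probability p (0 < p <= 1)."""
    if q == 1:
        return -math.log(p)  # the familiar Shannon surprisal
    return (1 - p ** (q - 1)) / (q - 1)

def expected_surprise(props, q):
    """Diversity index: the average surprise over species proportions."""
    return sum(p * surprise(p, q) for p in props if p > 0)

# Hypothetical communities, for illustration only.
community_a = [0.5, 0.5]          # two species, perfectly even
community_b = [0.9, 0.05, 0.05]   # three species, one dominant

for q in (0, 1, 2):
    a = expected_surprise(community_a, q)
    b = expected_surprise(community_b, q)
    # q = 0 ranks B above A (it only counts species);
    # q = 2 ranks A above B (it mostly sees the dominant species).
    print(q, round(a, 3), round(b, 3), "A" if a > b else "B")
```

Nothing in the mathematics says which row of that output is the right one, which is the complaint.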

There's a tendency in ecology to develop these meaningless statistics and then act as if they're telling us something important. I find it all a waste of time (and I'm a statistician).

The desire to draw on completely meaningless statistics like "similarity" of species as a function of biodiversity and thus human impact on biodiversity is honestly disgusting. Say, for example, that a population has recently branched from the parent population and is well on its way to becoming a distinct species (defined as "A segment of a population-level lineage that is evolving separately from other such lineage segments as indicated by one or more lines of evidence" [PhyloCode]). The complete obliteration of this population isn't considered an "impact on biodiversity" so long as the parent population increases at the same rate the divergent population decreases. So many situations can be thought of in which these statistics don't give you an accurate representation of what is actually happening that it seems, to me, anyway, a complete waste of time. Why is this statistical calculation of biodiversity even necessary, aside from particular groups making claims about "how they help protect biodiversity", when, in fact, the populations of various organisms change drastically due to human activity without it being represented in these statistics?
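The scenario above can be sketched in a few lines. This is only an illustration under stated assumptions: the counts are invented, the index is plain Shannon entropy, and the helper species_level (a hypothetical name) pools populations by the label before the underscore, which stands in for whatever taxonomy the survey uses.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (in nats) from raw counts; zero counts are skipped."""
    total = sum(counts)
    return sum((c / total) * -math.log(c / total) for c in counts if c > 0)

# Invented counts: species X consists of a parent and a divergent population.
before = {"X_parent": 900, "X_divergent": 100, "Y": 500, "Z": 500}
after = {"X_parent": 1000, "X_divergent": 0, "Y": 500, "Z": 500}

def species_level(pops):
    """Pool populations into species using the label before the underscore."""
    totals = {}
    for name, n in pops.items():
        species = name.split("_")[0]
        totals[species] = totals.get(species, 0) + n
    return list(totals.values())

# At species level the two surveys are indistinguishable.
print(shannon_entropy(species_level(before)) == shannon_entropy(species_level(after)))
# At population level the loss of the divergent lineage shows up as a drop.
print(shannon_entropy(list(before.values())), shannon_entropy(list(after.values())))
```

Whether the loss registers depends entirely on the level at which the counts are pooled, not on anything the index itself knows.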

Arbitrary statistics about biodiversity give both statistics and ecology a bad name. How about looking at the species individually and assessing our impact upon each of them, then use these data to assess our impact upon the biodiversity?

Without understanding the math, I must agree with you, Jared. Similarity is not, I think, a crucial criterion for conservation. I know some, such as Dan Faith, have tried to produce a phylogenetic diversity metric, but he does not require it to be the sole desideratum.

The notion of "similarity" fails, in my view, because there is no nonarbitrary metric for measuring it. There are as many descriptions of the similarity between two objects as one can find predicates to apply - i.e., an infinite number.

That said, the use of a Shannon style index (so-called Shannon-Wiener index) is merely a mathematical description of diversity, so I suspect what you are objecting to is the use of surprisal, which if I am remembering correctly is a measure of the observer's expectations. Hence it may play nicely into Bayesian accounts of inference and hypotheses. Do you or Bob have any comment on that?

I was mostly talking about how we may be applying selective pressures to species and thus limiting the variation within a species and one would never see that reflected in the statistics. I find that troublesome as we've learned (fairly recently) that populations which are fairly genetically homogeneous (Sarcophilus [specific epithet?]) can be very vulnerable even when the population size is fairly large.

Jared - there are good practical reasons for wanting to estimate diversity, e.g. for conservation planning and reserve design. It's a pity that we can't define biodiversity well enough to come up with a metric that fits to the definition.

John - leave the Bayesians out of this! The surprisal measure is simply a mathematical expectation, so it's not Bayesian per se, but if you find it easier to think of it that way, then that's good. Embrace the Dark Side - it is your destiny.

For me the problems are:
(a) I can't see any way to calibrate surprise: what sorts of differences in probability are needed to make me twice as surprised? This is the whole interpretation of statistics problem (full blog post to follow, hopefully soon).
(b) What does surprisal have to do with biodiversity? I can't see any functional relationship, or any sense that it's a simple summary (as means and variances would be). It's a complicated summary, so it's difficult to know how to interpret it.

I actually find the idea of using similarity less problematic. One could use phylogenetic distance (or coalescence times, if working within species too) as a measure. Of course, this may not have much relationship to functional diversity, but at least it's obvious that it doesn't. If I'm going to be wrong, I'd far rather be wrong in ways that are easily spotted! In principle (-al?) similarity could be measured as distance in niche space, but the problem is defining the niche space, and measuring position within it. So there is a possibility of connecting that to community dynamics, but we'll need a few years of hard modelling work first, to establish the relationships.

Oh, and apparently it's your birthday. Congratulations on another semi-arbitrary landmark! I hope you have many more.

"there are good practical reasons for wanting to estimate diversity, e.g. for conservation planning and reserve design. It's a pity that we can't define biodiversity well enough to come up with a metric that fits to the definition"

Basically: how to save shit--here's an idea, leave the areas you find them alone without jacking it up. Don't put roads through those areas, don't let people build there, etc.

I'm also going to have to disagree with you on the phylogenetic distance thing. You're still saying that just because species x is "similar" to species y, the two don't count as much toward biodiversity as they would if species a and species z were present at that location instead. You still have two distinct species. Also, as I said previously, analyze our impact on each species individually including our impact on genetic diversity within that species, then use THAT information to gauge our impact upon biodiversity.

I have to be at work in an hour, but I have much more to say on this topic

Jared - your first point is only realistic if the New World Order will be run by conservation biologists. I'm sorry, but this isn't going to happen - conservation biology has already been infiltrated by sociologists and economists, so they don't stand a chance.

On your second point, I'm not sure this is the best blog on which to make arguments based on species essentialism - John has a huge store of long words relevant to the subject that he can fling at us. And in practice, I doubt you'll find a biologist who will argue that, for example, Drosophila melanogaster is as similar to D. simulans as it is to Arabidopsis thaliana, and hence a community consisting of more dissimilar species is more diverse.
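For what it's worth, here is a minimal sketch of how a similarity-weighted index behaves on that example. It uses Rao's quadratic entropy (the expected distance between two randomly drawn individuals), which is one standard index of this kind and not necessarily the one in the Café posts. The distances are made up on an arbitrary 0-1 scale and are not real phylogenetic estimates.

```python
def rao_quadratic_entropy(props, dist):
    """Expected distance between two individuals drawn at random (with replacement)."""
    names = list(props)
    return sum(
        props[a] * props[b] * dist[frozenset((a, b))]
        for a in names
        for b in names
        if a != b  # an individual is at distance zero from its own species
    )

# Made-up pairwise distances, for illustration only.
dist = {
    frozenset(("D. melanogaster", "D. simulans")): 0.05,
    frozenset(("D. melanogaster", "A. thaliana")): 1.0,
    frozenset(("D. simulans", "A. thaliana")): 1.0,
}

# Two hypothetical two-species communities with equal abundances.
two_flies = {"D. melanogaster": 0.5, "D. simulans": 0.5}
fly_and_plant = {"D. melanogaster": 0.5, "A. thaliana": 0.5}

print(rao_quadratic_entropy(two_flies, dist))      # small: the species are close
print(rao_quadratic_entropy(fly_and_plant, dist))  # larger: the species are distant
```

Both communities have two equally common species, so an index that ignores similarity cannot tell them apart; this one can, for better or worse.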

I thought the fact that you could measure similarity definitively was one of the requirements of arguments for the randomness of evolution? That specific argument can be found here.

As an aside, doesn't selecting for diversity mean selecting against fitness? I mean, aren't the goals of diversity and learning from nature to some extent mutually exclusive, as we will weaken the very mechanisms that give us those solutions? Although I suppose you could argue that heavy-handed fitness selection could overwhelm the diversity-producing mechanisms and so wreck that goal anyway. Perhaps there is a happy medium based on considering the ecosystem as a randomised solution processor.

*sigh*

Ok, I hoped I wouldn't have to get into this much deeper, but there seem to be some misconceptions that haven't been cleared up yet. I'm not sure if you intended to misrepresent my statements, Bob, or if it was accidental, which I think it was, but I'm in no way saying similarity isn't good for purposes of comparison between species. You're comparing species, so compare them by what is similar and what differs. I fear conveying my thoughts on this matter will be very difficult in less than a few pages, but I'll try: "similarity between populations (or species), when used for biodiversity measurements without the additional information of specific populational information, is completely useless for conservation."

Now, I'm NOT arguing for species essentialism! I'm stating that we must look at populational impact WITHIN species to accurately gauge our impact on an ecological region. Here's the kicker on this: the two species represent separate parts of a lineage, completely isolated from one another genetically (fitting the PhyloCode definition of species). Each individual does not contribute to the population of the other and as such, "similarity" between species is an unnecessary addition to the biodiversity equation.

All I'm trying to convey here is that this form of a biodiversity equation, for purposes of conservation, doesn't tell you anything useful. You can use "similarity" for means of classification, but that doesn't give you any USEFUL information in terms of ecological stability.

I've had a 13 hour day slaving over an HPLC, I need some sleep, but I do hope my thoughts were somewhat coherent.

I have to say I didn't get any whiff of species essentialism from Jared's comments, but I'm a bit dumb. I only wish to add that similarity is not useful for classification either, only for typologies, which are not classifications. IMO.

"All I'm trying to convey here is that this form of a biodiversity equation, for purposes of conservation, doesn't tell you anything useful. You can use 'similarity' for means of classification, but that doesn't give you any USEFUL information in terms of ecological stability."

Well, except that diversity is meant to be good for stability, so you want a diverse community, which would be interpreted as a community whose species are dissimilar from each other.

On the essentialism point, perhaps that was the wrong word to use (hey, this is a philosophy blog. Every argument is going to reduce to semantics anyway, so why not start there?). My point was that Jared seems to think that Species are objects, and that they are all equivalent (i.e. exchangeable in statistical parlance). I'd be surprised if any ecologist would agree with this. Once we allow differences in, then looking at similarities is reasonable.

Jared - I'm having difficulty in seeing what you think should be conserved. I know this is a complicated question, but it looks as if you're only/mainly interested in within-species diversity, and perhaps the number of species.

Oh, and there's another big problem with treating species as equivalent - it relies on the definitions of species, and who does the classification. You could end up trying to preserve a segment of diversity simply because the species were classified by a splitter.

It is a very complicated issue, and as I am a biologist, not a philosopher, by education and experience, perhaps I am not quite eloquent enough to explain my thoughts. I'll try again:

On the "diversity is good for stability" argument, when you compare the diversity of organisms across taxa, that tells you about as much as saying "I have three dogs, two cats, a snake, five lizards, a horse, and something I cannot classify." How about saying "I have three dogs, one is a very old and sickly lab, the Jack Russell terrier is missing a leg, and the Dachshund is going to die at any minute." Then continue with the specifics of each clade in that manner. Just saying how "diverse" an ecosystem is doesn't tell you what needs to be conserved.

Also, areas with naturally low interspecies diversity (think tundras and the poles) can still have very high intraspecies diversity. This is what indicates populational stability. You can then use these data to figure out ecosystem health. You are also not looking at "biodiversity" now, but a meaningful set of data for conservation.

Similarity uses arbitrary means of comparison as well. We can assign a species concept for each species based upon aspects of the species; i.e. Does it reproduce sexually? Is the population genetically isolated from other closely related populations? And so on.

What needs to be conserved:
1-Intraspecies diversity
2-Indigenous species
3-Large and diverse enough populations for stability of each species

Hey, thanks for that link to the PLoS ONE article, I had not noticed it previously. From what I've read so far, it's almost dead on with what I was trying to say, with the exception that it still does not address intraspecies variation. However, I assume you could include that in the "extinction risk" part of the function.