Pharyngula

Luskin on gene duplication

Casey Luskin has to be a bit of an embarrassment to the IDists…at least, he would be, if the IDists had anyone competent with whom to compare him. I tore down a previous example of Luskin’s incompetence at genetics, and now he’s gone and done it again. He complains about an article by Richard Dawkins that explains how gene duplication and divergence are processes that lead to the evolution of new information in the genome. Luskin, who I suspect has never taken a single biology class in his life, thinks he can rebut the story. He fails miserably in everything except revealing his own ignorance.

It’s quite a long-winded piece of blithering nonsense, so I’m going to focus on just three objections.

  • First, Luskin tries to trivialize gene duplication, a strategy that Michael Egnor also followed.

    Yet during the actual gene-duplication process, a pre-existing gene is merely copied, and nothing truly new is generated. As Michael Egnor said in response to PZ Myers: “[G]ene duplication is, presumably, not to be taken too seriously. If you count copies as new information, you must have a hard time with plagiarism in your classes. All that the miscreant students would have to say is ‘It’s just like gene duplication. Plagiarism is new information- you said so on your blog!’”

    This isn’t right, on many levels. Copying a pre-existing gene does create new information … but it’s just a small amount. Luskin can’t be serious in considering this a weakness: evolutionary biology would predict only small changes at any one time. If a process produced a massive increase in the information content of the genome in a biologically functional way (that is, not just the production of random noise), then we’d have to say that you’ve found evidence for Intelligent Design. A succession of small genetic changes is what we expect from evolution and genetics, and that’s what we see.

    The plagiarism problem Egnor invents is nonsense. The primary problem with plagiarism in the classroom is that it is unethical, and represents the appropriation of ideas from another source without acknowledgment. If a student were to quote a source wholesale while providing full attribution, it would not be considered plagiarism…although the student would still fail, for demonstrating no understanding of the material and contributing no significant new insight. We expect our students to demonstrate intelligence, after all; duplication is not a product of intelligence.

    And again, whoosh, this will fly over their heads: the processes we describe in evolutionary genetics exhibit incremental increases in information in the absence of intelligent input — contrary to the expectations of the IDists.

  • It wouldn’t be a creationist article without a quote-mine, and Luskin obliges. You’d think they’d learn, someday … whenever I see somebody at the Discovery Institute quote something from the scientific literature, I know to immediately check the source, and I am never disappointed. They always mangle it. Here’s Luskin in action:

    A recent study in Nature admitted, “Gene duplication and loss is a powerful source of functional innovation. However, the general principles that govern this process are still largely unknown.” (Ilan Wapinski, Avi Pfeffer, Nir Friedman & Aviv Regev, “Natural history and evolutionary principles of gene duplication in fungi,” Nature, Vol. 449:54-61 (September 6, 2007).) Yet the crucial question that must be answered by the gene duplication mechanism is, exactly how does the duplicate copy acquire an entirely new function?

    Oh, dear. That sounds like an awful admission, doesn’t it? Would you be surprised to learn that it’s very much like the infamous quote from Darwin, where he admits that the evolution of something as complex as the eye seems absurd, but then goes on to explain how it happened? That’s exactly the case here. Luskin has pulled out the first two sentences of the abstract, where the authors set up the problem they are about to address, and thrown away the rest of the work, where they go into detail on the evolutionary histories of orthologues in fungal genomes. Here’s the full abstract.

    Gene duplication and loss is a powerful source of functional innovation. However, the general principles that govern this process are still largely unknown. With the growing number of sequenced genomes, it is now possible to examine these events in a comprehensive and unbiased manner. Here, we develop a procedure that resolves the evolutionary history of all genes in a large group of species. We apply our procedure to seventeen fungal genomes to create a genome-wide catalogue of gene trees that determine precise orthology and paralogy relations across these species. We show that gene duplication and loss is highly constrained by the functional properties and interacting partners of genes. In particular, stress-related genes exhibit many duplications and losses, whereas growth-related genes show selection against such changes. Whole-genome duplication circumvents this constraint and relaxes the dichotomy, resulting in an expanded functional scope of gene duplication. By characterizing the functional fate of duplicate genes we show that duplicated genes rarely diverge with respect to biochemical function, but typically diverge with respect to regulatory control. Surprisingly, paralogous modules of genes rarely arise, even after whole-genome duplication. Rather, gene duplication may drive the modularization of functional networks through specialization, thereby disentangling cellular systems.

    Tsk, tsk. It’s an interesting paper that documents the details of a number of cases of gene duplication and divergence, and Luskin ignores it all (I suspect because he lacks the competence to understand it) to imply the paper is about our failure to understand the processes. I just mentioned that my primary expectations of students are that they demonstrate good citation ethics and that they show understanding; Luskin has just failed on both counts. He has misrepresented the sense of a paper and he has shown that he doesn’t understand the work. A student who turned in that kind of shoddy effort to me would get an immediate “F”.

  • Finally, Luskin shows that not only does he fail to understand the duplication part, but he hasn’t got much of a grip on the divergence part, either. He invents a silly challenge of his own.

    So here is my “Information Challenge”: For the sake of the argument, I will grant that every stage of the evolutionary pathway I requested above will survive, and thus I’ll give natural selection every possible benefit of the doubt. What I need is a step-by-step mutation account of how one sentence evolved into the other wherein the sentence remains functional – i.e., it has comprehensible English meaning – at all stages of its evolution. In short, I request to see how:

    “METHINKSDAWKINSDOTHPROTESTTOOMUCH”

    can evolve into:

    “BUTIMSUREDAWKINSBELIEVESHEISRIGHT”

    by changing the first sentence one letter at a time, and having it always retain some comprehensible English meaning along each small step of its evolution. This seems like a reasonable request, as it is not highly different from what Darwinists are telling me can happen in nature.

    No, it has some significant differences. The genetic code is degenerate — there are many synonyms in the ‘language’. Much, but not all, of a protein is going to be tolerant of a fairly wide range of amino acid substitutions. That means that in this analogy, the simulation ought to tolerate a great many misspellings, and that many of the words ought to be dispensable as far as generating meaning goes. In addition, evolution isn’t trying to drive one amino acid sequence to another specific amino acid sequence; selection is only going to work for retention of general functionality. English has too much specificity to work in this analogy. If he said he had a text string with a lot of gibberish containing the words “DAWKINS” and “PROTEST”, and he wanted to see a step-by-step series of shifts that turned it into a string with some other gibberish that conserved “DAWKINS” and evolved the word “RIGHT”, and that slight misspellings in the intermediates were acceptable, he’d probably declare the exercise trivial and too easy — but that’s exactly what happens in the evolution of proteins (a toy sketch of this fault-tolerant version appears after this list). It’s not pre-specified, it’s fault-tolerant to a degree, and it’s not that big a deal.

    And furthermore, we already have the real thing. Not an analogy, not a guess, but a “step-by-step mutation account” of how one functional protein evolved into another protein with a different function, by a process of gene duplication and divergence. He might try reading Ian Musgrave’s summary of the evolution of the cortisol and aldosterone receptors. It’s cool stuff, it describes exactly what he demands in a real protein, and the degree of detail goes right down to the order of single amino acid changes. It’s perfect.

    Of course, we also know exactly how the IDists react when scientists do provide explanations in terms of tiny incremental changes. Suddenly, the details become “piddling”, and they ignore them. I expect Luskin will do the same.
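
To make the contrast concrete, here is a minimal toy sketch in Python (made-up parameters, nothing like a real population-genetic model): a string is duplicated, the copy accumulates random single-letter substitutions, and "selection" rejects only the changes that destroy a conserved motif. No pre-specified target sentence is consulted anywhere.

```python
import random

random.seed(1)  # reproducible toy run

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
ORIGINAL = "METHINKSDAWKINSDOTHPROTESTTOOMUCH"
MOTIF = "DAWKINS"  # the only part "selection" cares about in this toy

def is_functional(s):
    """Toy fitness test: the string is 'functional' if the conserved
    motif is still present anywhere; everything else is free to drift."""
    return MOTIF in s

def mutate(s):
    """Apply one random single-letter substitution."""
    i = random.randrange(len(s))
    return s[:i] + random.choice(ALPHABET) + s[i + 1:]

# Step 1: duplication. There are now two copies; one keeps doing the
# original job, so the other is free to wander.
original_copy = ORIGINAL
duplicate = ORIGINAL

# Step 2: divergence. Accept any mutation in the duplicate that does not
# destroy the conserved motif; no target sentence is ever consulted.
accepted = 0
for _ in range(5000):
    candidate = mutate(duplicate)
    if is_functional(candidate):
        duplicate = candidate
        accepted += 1

print("original copy :", original_copy)
print("diverged copy :", duplicate)
print("accepted mutations:", accepted)
```

Run it and the second copy drifts into gibberish everywhere except the conserved “DAWKINS”, which is the point: nothing steered it toward any particular destination, yet the constrained part stays put.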

Pathetic in his ignorance, appalling in his dishonesty, and disgraceful in his unwarranted arrogance … that’s Casey Luskin. It’s really a mark of the growing desperation of the Discovery Institute that they are constantly dragging out this pipsqueak lawyer to lecture the public on biology.

Comments

  1. #1 djlactin
    October 1, 2007

    Evolution news and views does not allow comments (surprise!), so none of the sheep have any idea how vapid these ‘statements from “authority”‘ actually are. Here’s a thought:

    Perhaps we (you; somebody; not me– i don’t have the skills) need to set up a blog/website that compiles all of the rebuttals to their inanity.

    And I nearly spewed my juice when I spotted this at the bottom of the Evolution News and Views website:

    cut
    “The misreporting of the evolution issue is one key reason for this site.”
    paste.

    So true!

  2. #2 katie
    October 1, 2007

    Honestly? He probably gave up on that article the second he saw the words “comprehensive” and “unbiased”. Completely foreign to him, I’m sure.

  3. #3 David Marjanović
    October 1, 2007

    What’s missing from your explanation is mutation. After a gene has duplicated, one of the copies is under reduced selection pressure and can mutate — and mutation equals new information. The duplication merely makes it possible to add information instead of substituting it.

  4. #4 Lynn David
    October 1, 2007

    The misreporting of the evolution issue is one key reason for this site.

    EvolutionNews.org sure got that right! They mean it literally!

  5. #5 John Morales
    October 1, 2007

    David (#4), doesn’t duplication alone represent an increase in “information” (in the sense of redundancy)?

    If I give someone two copies of an instruction manual, they can lose one copy (all the original information) and still retain all the original information.

  6. #6 June
    October 1, 2007

    Duplication is an essential first step, if you want to modify an original without changing it.

  7. #7 David Marjanović, OM
    October 1, 2007

    OK.

  8. #8 Bob Lane
    October 1, 2007

    You close with this paragraph:

    Pathetic in his ignorance, appalling in his dishonesty, and disgraceful in his unwarranted arrogance … that’s Casey Luskin. It’s really a mark of the growing desperation of the Discovery Institute that they are constantly dragging out this pipsqueak lawyer to lecture the public on biology.

    Mostly accurate rhetorical description, but what does the rebuttal gain by the name-calling epithet “pipsqueak”?

  9. #9 Zeno
    October 1, 2007

    So Luskin actually cites Egnor as a way of supporting his argument.

    How persuasive.

  10. #10 pete
    October 1, 2007

    Well done in pointing out the errors that stalk all the papers similar to this; there is no shortage of amnesia and selective bias on display.

  11. #11 TomS
    October 1, 2007

    May I suggest that the real problem with the plagiarism example is that the concept of “information” used by the creationists is incoherent or useless. Of course, there are paradoxes of “information” when dealing with such a concept. One might, for example, say that plagiarism does represent “increased information”, in the sense that at least the student has chosen which source to reproduce. The student might be given some credit for choosing a relevant or reliable source.

  12. #12 Fishbone McGonigle
    October 1, 2007

    Mostly accurate rhetorical description, but what does the rebuttal gain by the name-calling epithet “pipsqueak”?

    A bit of rhetorical flourish, perhaps?

    At any rate, who cares? Ooooooooooh, PZ called Luskin a “pipsqueak.” Pass the smelling salts and the clutching-pearls, as I fear I am getting the vapors.

    Or something.

  13. #13 BobApril
    October 1, 2007

    but what does the rebuttal gain by the name-calling epithet “pipsqueak”?

    Entertainment value. At least, I enjoyed it. The rebuttal proper does not include the last paragraph, nor, indeed, the first – those would be the introductory and summary paragraphs. Noting the Oxford definition, I’d say the epithet is amply supported by the evidence presented.

  14. #14 J-Dog
    October 1, 2007

    Bob Lane- Have you ever SEEN or HEARD Casey Luskin? No? Then shut up. Pipsqueak in this instance is not an epithet, it is a description. Luskin is the very embodiment of the “pipsqueak”. Small in stature, high in voice, small in mind = pipsqueak. It would also be accurate to describe him as a lying weasel shill for his DI Moony Masters.

    HTH :)

  15. #15 Jud
    October 1, 2007

    Couldn’t read Luskin’s piece after the first sentence: “In Part I, I demonstrated that specified complexity is the appropriate measure of biological complexity.”

    Appropriate measure, eh? Great, and what are the units of this measure?

  16. #16 No One of Consequence
    October 1, 2007

    I know his challenge is meaningless, but it’s also a common word game. It certainly seems possible, with a little help from a few thousand pharyngula friends.

    ME THINKS DAWKINS DOTH PROTEST TOO MUCH.
    BETH INKS DAWKINS; DOTH PROTEST TOO MUCH
    BETH INKS DAWKINS; DOTS PROTEST TOO MUCH
    BETH INKS DAWKINS; BOTS PROTEST TOO MUCH
    BE THINES DAWKINS; BOTS PROTEST TOO MUCH
    BE THINES DAWKINS; BETS PROTEST TOO MUCH
    BE THINES DAWKINS; BETI PROTEST TOO MUCH

    BUT IM SURE DAWKINS BELIEVES HE IS RIGHT

  17. #17 386sx
    October 1, 2007

    At any rate, who cares? Ooooooooooh, PZ called Luskin a “pipsqueak.” Pass the smelling salts and the clutching-pearls, as I fear I am getting the vapors.

    Why don’t they just have some scientists do the science writing over there? They can’t be that hard to find. (Oh yeah I keep forgetting they don’t like science.)

    May I suggest that the real problem with the plagiarism example is that the concept of “information” used by the creationists is incoherent or useless.

    All they have is analogy this, analogy that, blah blah. But no real stuff.

    “Imagine for a moment, if you will, that if the Cambrian explosion was little teeny trucks and busses, then evolution can’t do that!!!”

  18. #18 slpage
    October 1, 2007

    I think that one of the biggest problems in these “information” arguments, one that ID information mongers either do not understand or purposefully gloss over, is the fact that there is no 1-to-1 phenotypic or physiological change for each and every mutation, regardless of type, whether it is a point mutation or a segmental duplication.

    Most point mutations do nothing, but some can be lethal. Some gene duplications do nothing. Others, by virtue of the increased expression of their product, can have major phenotypic or physiological consequences. The ID creationist information mongers just place ALL mutations – in this case, gene duplications – in one basket and want to be able to directly apply their idiosyncratic post-hoc definitions in an all-encompassing manner.

    A gene duplication does not add “new” information according to their concocted definition? Well, gee, it must not be important then.

    This is why whenever I have the opportunity, in my classes I explain the limits of the language and computer analogies so that at least my students will be able to tell when they are being lied to by these people.

  19. #19 SOB
    October 1, 2007

    Hey J-Dog, have you ever SEEN Bob Lane? No? Then I would advise you not to piss him off.

  20. #20 minimalist
    October 1, 2007

    Wouldn’t gene duplication that results in the duplicate being under a different promoter count as ‘information increase’? Particularly if that promoter is part of a signal transduction cascade, or somesuch. At the very least, suddenly that protein becomes part of the suite of genes up- (or down-) regulated by the signal, even if it doesn’t do much that’s relevant to the signal.

    By the fuzzy definition alluded to by IDiots (which seems to be “stuff that does stuff… and stuff”), that has to count as some sort of information increase.

  21. #21 windy
    October 1, 2007

    Duplication does not even need to be the first step in the dance:

    Allelic divergence precedes and promotes gene duplication

  22. #22 Graculus
    October 1, 2007

    By the fuzzy definition alluded to be IDiots (which seems to be “stuff that does stuff… and stuff”), that has to count as some sort of information increase.

    The trick to getting them to shut up is to demand that they produce a metric. If you can’t measure it, then you can’t use it in math.

  23. #23 PZ Myers
    October 1, 2007

    What I mean is that gene duplication produces copies that are very nearly identical to the original; we can see exactly where the new version came from. No intelligence is required, just error.

    What would support ID is the appearance of a novel gene, with multiple functions and complex regulation, and no antecedents — an abrupt increase in functional complexity that didn’t come from error.

    Allopolyploidy is a good example of a case where we do see a sudden jolt upward in the information in the genome…but we can also see where the information came from and can explain it very simply, without invoking a designer.

  24. #24 David Marjanović
    October 1, 2007

    Appropriate measure, eh? Great, and what are the units of this measure?

    The Timecube? :->

    But can those liberal athiest scientists tell us WHICH copy? Bet they can’t.

    Of course not. It’s random. :-)

  26. #26 Graculus
    October 1, 2007

    If you use Kolmogorov complexity, then it is an easy exercise that C(xx) > C(x) for infinitely many strings x.

    A straight duplication would, IIRC, increase the KC by exactly one instruction, no matter how much is duplicated (gene, chromosome or entire karyotype), so it is a very small increase in KC. But, again, KC doesn’t actually speak to “meaning”, that is, to functionality.

  27. #27 SEF
    October 1, 2007

    Re #4, #8, #9 (and any others I missed!)

    It’s not just that duplication allows mutation to modify one copy (or its control switches). The mere duplication itself can make a significant difference in practice. Having more of a particular gene product around can shift the balance in any interactive competitive promotion and inhibition process. That can cause knock-on effects during development of an organism as well as changes in ongoing cell operations. Something might end up longer (dog’s nose) or stronger or more rapid (neurone firing?) or wider (butterfly spots?) or finer (in divisions).

    Aha! I thought one of these was probably already on Pharyngula.

  28. #28 Jeffrey Shallit
    October 1, 2007

    No, Graculus, your argument just shows that C(xx) <= C(x) + c, where c is the cost of adding a “duplication” routine. (The <= arises because you don’t know that there might be a shorter way to get to the duplicated string without actually duplicating it.) The exercise I am referring to asks you to show that C(xx) > C(x) for infinitely many x. Perhaps it is not completely trivial; most of my students can usually solve it.

  29. #29 Brownian
    October 1, 2007

    Just a thought, but perhaps one of the benefits of us New Atheists (y’know, the ones who don’t politely pantomime the sign of the cross after grace at family dinners) like Dawkins, Hitchens, Harris, and others, is that they attract the attention of scientishes (get it?) like Luskin, leaving those actually doing work in the trenches alone.

  30. #30 PZ Myers
    October 1, 2007

    Hey, I should put that tandem repeat article up here today…especially since I’m swamped with work. Thanks for reminding me!

  31. #31 JD
    October 1, 2007

    Casey Luskin has a B.S. and M.S. in Earth Sciences from UC-San Diego and has been affiliated with the Scripps Institution of Oceanography Paleomagnetics Laboratory (an article on paleomagnetism on the Snake River Plain of Idaho in 2004).

    So? How is that relevant? He’s not a biologist. He holds no degrees in the life sciences. Only the one publication on a completely unrelated field? So apparently he wasn’t even competent in that and has since left to become a lawyer.

    As a lawyer, even Luskin knows he couldn’t testify as an expert on these matters. He’d be eliminated on the first question.
    Opposing counsel: Mr. Luskin are you a biologist?
    Luskin: No.
    Opposing counsel: Your honor please excuse this utterly ignorant person from the courtroom. He is completely unqualified.
    Judge: Mr Luskin, you are recused. Buh bye.

  32. #32 Brian
    October 1, 2007

    Perhaps I’ve lost the plot entirely, but when these nincompoops say things like:

    “[G]ene duplication is, presumably, not to be taken too seriously. If you count copies as new information, you must have a hard time with plagiarism in your classes. All that the miscreant students would have to say is ‘It’s just like gene duplication. Plagiarism is new information- you said so on your blog!’”

    aren’t they making the fundamental mistake (intentionally or no) of stripping context from the science?

    If the paper in question is to be viewed as a genome, then gene duplication would be more like a paragraph repeated within the paper. Clearly the original paper and the new paper with the repeated paragraph are different in terms of their information content, even if the repetition doesn’t function any differently than its twin.

    Small accumulated changes and misspellings over time, especially of selected key words, could alter the meaning of the paragraph entirely, while others would leave its meaning largely unaffected.

    Thus their own objection, once corrected for that loss of context, provides a relatively simple/simplistic analogy for showing how gene duplication changes information content.

  33. #33 JRF
    October 1, 2007

    There was an article published only a few weeks ago about gene duplication alone having a direct and very visible effect on a species.

    http://www.physorg.com/news108572275.html

    Blasted link doesn’t appear to be working. For those too lazy to copy and paste into their address bar, humans carry extra copies of the salivary amylase gene, allowing us to break down starch much more effectively than apes. Our diet has since changed accordingly. So, if Luskin is to be believed, a whole species gained an almost entirely new food source based on “no new information?”

  34. #34 JD
    October 1, 2007

    The usage of “information” by the IDiots is most charitably described as “inchoate”, but *we* have to know better, just so we can rip them a new one.

    What exactly IS their usage? I’ve never seen them give a specific, quantifiable definition that would have any scientific meaning whatsoever. I think discussions of “information” with them are futile without a working, agreed-upon, objective definition. It’s a disservice to even grant the ID proponents any credibility on this subject by referring to information as if they have defined it. They haven’t. Any rebuttal to their articles should point this out every time.
    One should view this as an ignostic does the existence of god. Any discussion is pointless until a definition is provided; otherwise you might as well argue about the color of Saturday.
    Because the IDiots proposed the inclusion of “information” into the debate, I want a specific definition of it with a step-by-step method for determining it. Tell us how much “information” a litter of water has and how that measured value differs from those for a litter of carbon dioxide or a litter of oil or a litter of DNA. When that’s done, then we’ll talk.

  35. #35 JD
    October 1, 2007

    Argh, damn US language spellcheck in my previous post auto-changed litre to litter. Please substitute as appropriate. Or not, it still kinda works. I suspect everyone here gets it; it’s the ID proponents that will have difficulty with that “information” :P

  36. #36 Brownian
    October 1, 2007

    you might as well argue about the color of Saturday

    I’ll bet the synesthetes know what colour it is.

  37. #37 Hank
    October 1, 2007

    But can those liberal athiest scientists tell us WHICH copy? Bet they can’t.

    Of course not. It’s random. :-)

    Both of them and none of them, at the same time.

    This is the reason why I love reading the stuff over here at scienceblogs. Not only are (many) of them lots of fun, they’re also occasionally highly informational in an informal sense.

    And that’s from an engineer in the making.

  38. #38 JD
    October 1, 2007

    you might as well argue about the color of Saturday

    I’ll bet the synesthetes know what colour it is.

    I bet they don’t (particularly if it’s a Sunday :P).

  39. #39 Steve LaBonne
    October 1, 2007

    What exactly IS their usage? I’ve never seen them given a specific, quantifiable definition that would have any scientific meaning whatsoever.

    Well, of course not. 1. They (and I specifically include the Isaac Newton of Information Theory) aren’t smart enough to do any such thing. 2. All they want from “information” anyway is to be a handy-dandy set of movable goalposts.

  40. #40 Graculus
    October 1, 2007

    It's mutations that introduce the new information...

    In Shannon theory the mutations reduce information. “Information” is used in a very specific way that has little to do with the way we usually use the word.

    In Shannon, “cat”, “dog” and “sxt” all contain the exact same amount of information, that is, they are all three letters long. Meaning (function) has absolutely no place in Shannon. In Shannon if you change “sxt” to “set” you have reduced the amount of information because it is not exactly the same as the original (even if the change increases functionality/meaning), Shannon is completely self-referential and devoid of any metric for functionality.

    Genes do not work like this at all, which is why Shannon shouldn’t be used for this purpose.

  41. #41 Brain Hertz
    October 1, 2007

    @ 58

    also:

    3) Specifying an actual definition of the kind of “information” they’re talking about would immediately lead to the conclusion that no magic is required in creating it. That’s why the “Isaac Newton of information theory” likes his definition fuzzy…

  42. #42 AlanWCan
    October 1, 2007

    If a process produced a massive increase in the information content of the genome in a biologically functional way…then we’d have to say that you’ve found evidence for Intelligent Design.

    Who wants to bet this comes back to haunt PZ (I even nicely quote-mined it for you).

  43. #43 Brain Hertz
    October 1, 2007

    In Shannon, “cat”, “dog” and “sxt” all contain the exact same amount of information, that is, they are all three letters long. Meaning (function) has absolutely no place in Shannon. In Shannon if you change “sxt” to “set” you have reduced the amount of information because it is not exactly the same as the original (even if the change increases functionality/meaning), Shannon is completely self-referential and devoid of any metric for functionality.
    Genes do not work like this at all, which is why Shannon shouldn’t be used for this purpose.

    I respectfully disagree with your opening quote… the words you quote do not contain the same amount of information, assuming that the first two are constrained to be words in the English language (which is the point I was making originally).

    If, in the third “word”, the constraint to be a valid word is removed, whereas the first two must be valid words, it has a greater information content. It has nothing to do with meaning, but with whether each letter is an independent observation of some stochastic process, and the nature of the distribution function.

    Back to my original point: two copies of the same thing, constrained to be identical, do not contain exactly twice the amount of information as one copy…
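
An illustrative aside on the source-model point above: how many bits Shannon’s measure assigns to a three-letter string depends entirely on the model you assume for the source, not on the string by itself. A minimal sketch in Python (the count of valid three-letter words is an assumed round figure, not a dictionary fact):

```python
import math

# Two toy source models for a three-letter string.

# Model 1: each letter drawn independently and uniformly from 26 letters.
bits_uniform = 3 * math.log2(26)

# Model 2 (hypothetical): the source emits one of roughly 1000 valid
# three-letter English words, each equally likely. The 1000 is an
# assumed figure used only for illustration.
n_words = 1000
bits_word_model = math.log2(n_words)

print(f"'cat' under the uniform-letters model: {bits_uniform:.2f} bits")
print(f"'cat' under the words-only model:      {bits_word_model:.2f} bits")
print("'sxt' under the words-only model:      probability 0, no finite value")
```

Under the uniform-letters model, “cat”, “dog” and “sxt” all get the same ~14.1 bits; under the words-only model, “cat” carries only ~10 bits and “sxt” has probability zero, so that model assigns it no finite value at all.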

  44. #44 JD
    October 1, 2007

    If a process produced a massive increase in the information content of the genome in a biologically functional way…then we’d have to say that you’ve found evidence for Intelligent Design.
    Who wants to bet this comes back to haunt PZ (I even nicely quote-mined it for you).

    Alan, I noticed that too and it’s exactly why I think it’s a mistake to ever grant, or least give the impression, that ID has a working definition of “information”. I don’t know what “the information content of the genome” even is, let alone how it is measured and determined.

  45. #45 TomS
    October 1, 2007

    Because the IDiots proposed the inclusion of “information” into the debate I want a specific definition of it with a step-by-step method for determining it. Tells us how much “information” does a litter of water have and how is that measured value different from those for a litter of carbon dioxide or a litter of oil or a litter of DNA. When done, then will talk.

    I quite agree. Except that … well, let’s get to freshman chemistry … their concept of information: is it an extensive or an intensive property? Among other basic questions to ask about it.

  46. #46 Ian
    October 1, 2007

    “Mostly accurate rhetorical description, but what does the rebuttal gain by the name-calling epithet “pipsqueak”?”
    (Bob Lane, commenter #11)

    It’s what you get when you mutate “Casey Luskin”….

  47. #47 Bulman
    October 1, 2007

    Here’s an easy way to boil down the whole “duplication isn’t new information” argument.

    If I duplicate the number 1, but no new information is gained, then 11=1. (This also means that I would very much like to borrow $11 dollars from you.)

  48. #48 Brain Hertz
    October 1, 2007

    Bulman,
    the information introduced by duplication isn’t zero, but it isn’t as much as double.

    If I have random variables x1, x2,…. representing independent observations of the same distribution, the sequence:

    x1, x2, x3, x1, x2, x3

    does not contain twice as much information as the sequence:

    x1, x2, x3

    The increase in genetic information in the scenario being discussed lies only in the fact that one additional copy has been made, as compared to two or three.

  49. #49 Chris
    October 1, 2007

    Actually, in such a short example it *does* contain twice as much information: the information required to explain what duplication is would be much longer than the information required to simply double the sequence. Sometimes the best compression is no compression – not just because there is no pattern, but also potentially because the pattern doesn’t permit savings greater than the cost of using it.

    We’re used to cheating on the complexity of operations like repetition because we come with brains pre-loaded with certain operations. But in any rigorous complexity metric you have to include the decoding procedure as part of the complexity, which means you have to get into some pretty long sequences before their complexity is genuinely less than their length.

  50. #50 MarkH
    October 1, 2007

    Slam Dunk. Nice PZ.

  51. #51 Graculus
    October 1, 2007

    The amount of data clearly doubles, of course.

    In Shannon “information” and “data” are synonymous to a large degree.

    Yes, you could use KC, or Shannon. Whichever you use doesn’t change the answer, though.

    Oh yes it does matter, because both of these use very strict and very particular definitions for “information” and “complexity”. In fact, another name for Kolmogorov Complexity is “algorithmic complexity”. Doubling can only add, at most, one instruction to the algorithm that describes it.

    Anyways, you can’t just go around re-defining terms, that’s what the IDiots do with this stuff. You don’t randomly call electron bonding “gravitation”, do you? Shannon information and KC is very specific in what it measures, and doesn’t match what the colloquial meanings are.

    The amount of new information is not equal to that contained in the region being duplicated. If you select some region at random, to add information equal to its information content would require you to add another region of the same size with its contents selected independently at random, with the same distributions and interdependencies as the original.

    But that wouldn’t be Shannon information (if it’s equal size then the amount of information is equal, no matter what the content), and you don’t have a metric for *that*.

  52. #52 Keith Douglas
    October 1, 2007

    It seems that these guys don’t want to admit (or just don’t get) that duplication by itself is not what allows for new genes and ultimately new protein synthesis – it is presumably duplication followed by mutation in one of the duplicates, no? (Well, I suppose there might be an “end effect” too.)

  53. #53 Brain Hertz
    October 1, 2007

    The amount of data clearly doubles, of course.

    In Shannon “information” and “data” are synonymous to a large degree.

    I disagree. Very much so. The amount of data sets an upper bound on the amount of information, but not the lower bound.

    Yes, you could use KC, or Shannon. Whichever you use doesn’t change the answer, though.

    Oh yes it does matter, because both of these use very strict and very particular definitions for “information” and “complexity”. In fact, another name for Kolmogorov Complexity is “algorithmic complexity”. Doubling can only add, at most, one instruction to the algorithm that describes it.

    Anyways, you can’t just go around re-defining terms, that’s what the IDiots do with this stuff. You don’t randomly call electron bonding “gravitation”, do you? Shannon information and KC is very specific in what it measures, and doesn’t match what the colloquial meanings are.

    Excuse me? Where above did I state that information and complexity were the same thing?

    My statement is pretty clear, and I stand by it: namely, if you choose for some reason to substitute “complexity” for “information” (which wasn’t me, btw, I was responding to the substitution), it doesn’t change the answer to that particular question. I wasn’t suggesting some kind of equivalence. In which matter, you seem to be agreeing with me in your first paragraph above, by the way: my statement was that irrespective of whether you look at information or complexity, simply copying the message doesn’t double the information (or complexity).

    As for the definitions, I’m sticking with Shannon’s definition of information (which is where we started). I certainly agree that you can’t mess with the definitions as Billy Dembski likes to do (I commented on this earlier in the thread, in fact).

    But that wouldn’t be Shannon information (if it’s equal size then the amount of information is equal, no matter what the content), and you don’t have a metric for *that*.

    I’m not sure what you mean. If you aren’t trying to measure information by Shannon’s metric, what are you using?

    In any case, I very much disagree that you can measure the information by looking at the size of the message. You need to know the properties of the process that generated it. If not, what is the procedure you would use to calculate the information contained in the data?

  54. #54 Brain Hertz
    October 1, 2007

    ugh… thought I’d fixed all the blockquote errors in preview… I guess not. The fifth and sixth paragraphs above should be blockquoted.

  55. #55 Pete Dunkelberg
    October 1, 2007

    amount of information in a duplication …
    in strings of 0 and 1, compare

    0
    to
    00
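
For anyone who wants rough numbers on the duplication-and-complexity exchange above, here is a small sketch that uses zlib’s compressed length as a crude, computable stand-in for Kolmogorov complexity (it only gives an upper bound, and its fixed header overhead really does dominate for very short strings, as noted earlier in the thread). The byte strings are random, so the exact numbers vary from run to run:

```python
import os
import zlib

def clen(data: bytes) -> int:
    """Length of the zlib-compressed form: a crude upper-bound proxy
    for Kolmogorov complexity, nothing more."""
    return len(zlib.compress(data, 9))

# An incompressible-looking "gene": random bytes, so clen(x) is close to len(x).
x = os.urandom(2000)
y = os.urandom(2000)  # an unrelated sequence of the same length

print("len(x)      =", len(x))
print("clen(x)     =", clen(x))
print("clen(x + x) =", clen(x + x))  # the duplicate adds comparatively little
print("clen(x + y) =", clen(x + y))  # unrelated material adds roughly its full length
```

The doubled string costs only a little more to describe than the original, which is the sense in which a straight duplication is a small step; letting mutations accumulate in the second copy afterward is what gradually pushes the description length of the pair back up.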

  56. #56 John Phillips
    October 2, 2007

    Ironically, Luskin’s centre is called the IDEA or the Intelligent Design and Evolution Awareness centre. I say ironic as I see no actual Awareness of what Evolution is and definitely no signs of genuine intelligence from him or his fellow IDiots. And to those complaining or concerned about us mean atheists using such terms as IDiots, it is not an ad hominem when it is an accurate description culled from an examination of their own words and actions.

  57. #57 mark
    October 2, 2007

    Luskin is remarkably stupid. His puerile post shows not only a lack of insight, but also that he is totally unacquainted with the literature. His “challenge” is readily addressed. What? Everyone has noticed this? The only significance of his antics, then, is that he seriously misleads uninformed persons who happen to read his work.

  58. #58 JM Ridlon
    October 2, 2007

    Luskin is writing a third part…can’t wait. I also noticed the quote-mine:
    http://sciencethegapfiller.blogspot.com/2007/09/luskin-quote-mine.html

    As well as a stunning misunderstanding of Shannon Information on Luskin’s part, which Dawkins explained in plain English.

    http://sciencethegapfiller.blogspot.com/2007/09/methinksluskinisconfusedaboutshannoninf.html

  59. #59 Eisnel
    October 2, 2007

    I’m sorry if this is something that’s already been discussed in the comments. Also, I’m not a biologist, so I can’t claim to understand genetics too much.

    Couldn’t Luskin’s “Information Challenge” be solved using something like gene duplication? What if the whole word sequence underwent a duplication event, and the duplicated part mutated while the first part stayed intact?

    METHINKSDAWKINSDOTHPROTESTTOOMUCHMETHINKSDAWKINSDOTHPROTESTTOOMUCH
    … (millions of years of mutations) …
    METHINKSDAWKINSDOTHPROTESTTOOMUCHGWYFLQWOPCKDFQGRMYRQKPLZRPDJFXBIO
    … (millions more years of mutations) …
    METHINKSDAWKINSDOTHPROTESTTOOMUCHBUTIMSUREDAWKINSBELIEVESHEISRIGHT

  60. #60 No One of Consequence
    October 3, 2007

    I solved Luskin’s challenge in 43 mutations.

    Results here along with an explanation of who would understand each mutation.

  61. #61 John Morales
    October 3, 2007

    #96, that is a most impressive display of applied geekery.

  62. #62 No One of Consequence
    October 3, 2007

    I’ll take that as a compliment.

    What can I say, I like word games, etc. and that seemed like a fun one.

  63. #63 Dan Cardinale
    October 4, 2007

    That was great. But you know, if an ID’er saw it, they’d play the “see, it has a designer!” card. Wouldn’t expect them to get the point…