Peptides publishes a clunker

I've got my hands on a strange paper by D Kanduc: "Protein information content resides in rare peptide segments". Here's the abstract.

Discovering the informational rule(s) underlying structure-function relationships in the protein language is at the core of biology. Current theories have proven inadequate to explain the origins of biological information such as that found in nucleotide and amino acid sequences; an 'intelligent design' is now a popular way to explain the information produced in biological systems. Here, we demonstrate that the information content of an amino acid motif correlates with the motif rarity. A structured analysis of the scientific literature supports the theory that rare pentapeptide words have higher significance than more common pentapeptides in biological cell 'talk'. This study expands on our previous research showing that the immunological information contained in an amino acid sequence is inversely related to the sequence frequency in the host proteome.

What? This is an intelligent design paper? How interesting. Unfortunately, the abstract is wrong: 'intelligent design' is not a popular way to explain information in biological systems. I read through the whole thing, and missed the part where it actually supports ID.

Here's what the paper actually does: it dissects a sample protein and asks about the frequency of its components in the proteome. It looks specifically at calmodulin (CaM), an important and highly conserved protein that is involved in all kinds of developmental and physiological interactions. The rather arbitrary unit the protein is broken down into is chunks of 5 amino acids, or pentapeptides, and each pentapeptide sequence is searched for in proteins other than CaM. If this is the initial sequence of CaM,

MADQLTEE…

Then what Kanduc does is search the proteome for MADQL, ADQLT, DQLTE, etc., and count the number of times each appears. Rare pentapeptides are equated with high information content, and common ones are assigned low information content. Some pentapeptides, in his analysis, are found only in CaM, while others are found multiple times, with an average of 12 occurrences. This is supposed to be significant.
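To make the procedure concrete, here is a minimal sketch of that sliding-window search. The function names and the toy "proteome" in the usage note are my own illustrations, not anything taken from the paper, and a real analysis would read sequences from a database rather than a short list of strings.

```python
from collections import Counter


def pentapeptides(sequence, k=5):
    """Return every overlapping k-residue window of a protein sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def count_in_proteome(query, proteome, k=5):
    """Count how often each k-mer of `query` occurs in the other proteins.

    `proteome` is assumed to be an iterable of sequences that excludes
    the query protein itself.
    """
    counts = Counter()
    for protein in proteome:
        counts.update(pentapeptides(protein, k))
    return {pep: counts[pep] for pep in pentapeptides(query, k)}


print(pentapeptides("MADQLTEE"))
# ['MADQL', 'ADQLT', 'DQLTE', 'QLTEE']
```

Calling `count_in_proteome("MADQLT", ["XXMADQLXX", "ADQLTADQLT"])` on that made-up pair of sequences returns one hit for MADQL and two for ADQLT.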

It's also where he loses me. If you search a completely random string of amino acids for an arbitrary pentapeptide, it should turn up, on average, once in every 3,200,000 amino acids. If you search a long enough chunk of amino acid sequence, one that's long enough to generate on average 12 hits, what you'd expect to see is a bell-shaped distribution — some pentapeptides may appear only once, while others appear dozens of times, just by chance. And that is what Kanduc sees. That some pentapeptides are unique to CaM is perhaps not too surprising, especially when you consider that the proteome is not a random sequence at all, but the product of frequent gene duplications and is also refined by selection.
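The arithmetic behind that figure is easy to check. The sketch below assumes all 20 amino acids are equally likely and independent, which real proteomes are not, and it uses a Poisson distribution as one simple stand-in for chance counts. Both are my assumptions for illustration, not a model taken from the paper.

```python
import math

ALPHABET_SIZE = 20  # standard amino acids
K = 5               # pentapeptide length

# Number of distinct pentapeptides if every residue is equally likely.
POSSIBLE_PENTAPEPTIDES = ALPHABET_SIZE ** K  # 3,200,000


def expected_hits(n_windows):
    """Expected chance matches for one pentapeptide in n_windows positions."""
    return n_windows / POSSIBLE_PENTAPEPTIDES


def poisson_pmf(hits, mean):
    """Probability of exactly `hits` matches when the average is `mean`."""
    return math.exp(-mean) * mean ** hits / math.factorial(hits)


# How much sequence gives an average of 12 hits per pentapeptide?
windows_for_mean_12 = 12 * POSSIBLE_PENTAPEPTIDES

# Spread of chance counts around that average.
for hits in (1, 6, 12, 18, 24):
    print(hits, poisson_pmf(hits, 12))
```

Under uneven residue frequencies the expected count differs from one pentapeptide to the next, so the spread in a real proteome would be wider than this uniform sketch suggests.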

So far, this idea that some pentapeptides will be rare and others common is utterly uninteresting and unsurprising. I would have liked to have seen some consideration of the null hypothesis, that the distribution is due to chance alone, but that seems to be totally lacking. If I'd been reviewing the paper, I would have sent it back with a request for revisions to consider that possibility.
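One way a null model could be checked, sketched here under my own assumptions rather than anything the paper does: shuffle the residues within each protein, which keeps the amino acid composition but destroys the order, and recount the pentapeptide many times. All of the names below are hypothetical.

```python
import random
from collections import Counter


def kmer_counts(proteins, k=5):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for protein in proteins:
        for i in range(len(protein) - k + 1):
            counts[protein[i:i + k]] += 1
    return counts


def shuffled(proteins, rng):
    """Shuffle residues within each protein, preserving its composition."""
    result = []
    for protein in proteins:
        residues = list(protein)
        rng.shuffle(residues)
        result.append("".join(residues))
    return result


def null_counts(kmer, proteins, trials=200, seed=0):
    """Occurrences of `kmer` in each of `trials` shuffled proteomes."""
    rng = random.Random(seed)
    k = len(kmer)
    return [kmer_counts(shuffled(proteins, rng), k)[kmer]
            for _ in range(trials)]
```

An observed count would then be compared with the spread of `null_counts` for the same pentapeptide. A shuffle like this ignores gene duplication and selection, so it is only a first pass at "chance alone".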

However, Kanduc does propose something that actually is interesting: that the rare pentapeptide sequences in specific genes also correlate with regions that have important functional roles.

Using the CaM features, attributes and annotations reported at www.uniprot.org/uniprot/P62158, we find that modification sites, structural beta strand motifs, functional domains, and epitopic determinants are confined primarily to areas of low similarity with the human proteome.

Now that's kind of cool, if true. It's also a bit unsurprising. He does examine the length of the CaM protein and show that rare pentapeptide regions are also sites for acetylation, ubiquitylation, and phosphorylation, and also at the calcium binding site, for instance; but these are functional regions of the protein where one would expect some selection for specific properties. We get a different analysis, in which naturally occurring pentapeptide fragments that are known to have significant biological activity are searched for in the human proteome, and found to be fairly rare. Again, this might be an expected result explained by selection (after all, a sequence that can trigger apoptosis might be expected to be confined by selection to a limited range of sites), and it doesn't seem to me to require postulating an intelligent designer.

As a paper that hints at some possible functional correlations in the proteome, it's mildly diverting. It's weak in that it doesn't address the null hypothesis very well — I get the impression the author is more interested in fishing for correlations than in actually testing his hypothesis. Where it starts triggering alarm bells, though, is the shoutout to creationists. Kanduc says this about CaM:

…the CaM sequence is characterised by both specificity and complexity (what information theorists call 'specified complexity'); in other words, it has 'information content'.

Uh-oh. "Specified complexity" is a meaningless phrase; the creationists have not defined how to measure "specification". In this case, Kanduc hasn't either, and his criterion for calling it "specified complexity" is that CaM has various functional domains, which is kind of expected for a protein that has functions. I find it interesting, too, that he doesn't provide a citation for his claim — Dembski doesn't get an acknowledgment. Probably because it would be a too-obvious hint about where in looney-land this idea is coming from, and because Dembski doesn't bother to explain how to calculate "specified complexity" either.

Also, there's something suspicious about the phrasing there — it seems to be straight out of Meyer 2000:

Systems that are characterized by both specificity and complexity
(what information theorists call "specified complexity") have
"information content".

Hmmmmm. (Thanks to Blake Stacey for picking up on that identity.)

Another problem with the paper is the conclusion, which is some unholy amalgam of a dog's breakfast and a word salad, and either way is grossly unappetizing.

Researchers in the fields of biology and immunology need to define objective informational entities and reductionist basic laws that are valid everywhere and for everything. As new objects and scientific laws are absorbed into experimental protocols and reports, abstract terms such as "sense", "edit", and "attack" as well as old dogmas such as the self/non-self dichotomy will become obsolete in favour of more intelligible and concrete theories and biological activities. This process will enable the effective translational application of science to medicine.

What the heck does that mean? What does it have to do with the rest of the paper? Again, if I'd been reviewing it, that would have gone back with a recommendation to delete the gobbledygook and write a conclusion that actually makes sense in the light of the rest of the paper.

What we have here is yet another case of poor reviewing and editing. There is a germ of an interesting observation in the work that the author fails to examine critically and convincingly, but the main intent seems to be to inject the words "intelligent design" into a reviewed scientific paper (while failing to justify why that is a useful hypothesis) and for the author to ride some obscure immunological hobbyhorse which is also not addressed by any of the data. It's remarkably sloppy work that should have been sent back for extensive revision, rather than being published as is.

I do notice that it was received at Peptides on 20 January, and then bounced back and accepted, after what must have been only minor revisions, two weeks later. The journal is commendably fast in its turnaround, but this looks like a case where haste just churned up the garbage a bit more.


Kanduc D. Protein information content resides in rare peptide segments, Peptides (2010), doi:10.1016/j.peptides.2010.02.003

the paper is the conclusion, which is some unholy amalgam of a dog's breakfast and a word salad, and either way is grossly unappetizing. - PZ

My dog will eat (and keep down) almost anything, but I wouldn't give her that quote for breakfast, and if she picked it up somewhere while browsing the literature, I'd keep her in the garage until she'd sicked it up again!

By Knockgoats (not verified) on 17 Feb 2010 #permalink

Note that this is classified by the journal as a review, not a research paper. In my experience - and PZ and other biologists are welcome to chime in - review articles get cursory peer review relative to research papers. If it is an invited review, the author's expertise is assumed a priori and the manuscript is sent right to the copy editing department.

This is how we end up with nonsense like the Warda and Han "mighty creator" paper in Proteomics, as well as Stephen Meyer's review in Proceedings of the Biological Society of Washington.

I don't understand what on earth he's trying to get at.

I also think that there are some incredibly strange assumptions about structure/function. For a start, the defined functional regions of proteins are relatively arbitrary - he seems to classify structural motifs and sites of post-translational modification as functional, whilst ignoring the fact that the rest of polypeptide is probably also required for function. This is particularly so in the case of enzymes, where function and conformation are intimately related.

Notably, his definition of functional is necessarily selective for small, specific motifs (e.g. sites of post-translational modification). These must be 'rare' peptide sequences, simply because of the biological requirements that they are so in order to preserve function - these are often recognition motifs for other enzymes, cofactors, or substrates.

Databases like those on PROSITE already carry out motif searches based on functional motifs. The only novel claim here appears to be that pentapeptides are the "words" for some abstract polypeptide functional "phrasing", and I call bullshit on that, since functional relationships can be spread over much larger or smaller amino acid chains.

It looks a lot like Biblecode proteomics, to me. Maybe I've entirely missed the point.

By Bernard Bumner (not verified) on 17 Feb 2010 #permalink

Wait, what? It's a review? There is almost no synthesis of existing work in the field, and it's entirely a discussion of one specific set of analyses carried out by the author on proteomic data sets.

There is nothing in my copy to indicate it was regarded as a review, and the content does not smell like a review paper at all.

I read through the whole thing, and missed the part where it actually supports ID.

How does that differ from any other ID paper?

The dates suggest to me that the first time the paper was submitted it got a "reject, encourage resubmission" response, so that it would be treated as a new submission, and have a short time to acceptance. i.e. it's all about playing the numbers game. Something that seems to have come in over the last 15 years, and really pisses me off.

I wonder if the ID comments were in the original manuscript or not. Either way, the identity of the handling editor might be interesting - any familiar names on the board? I don't recognise any.

This is how we end up with nonsense like the Warda and Han "mighty creator" paper in Proteomics, as well as Stephen Meyer's review in Proceedings of the Biological Society of Washington.

That's certainly not the case for Meyer's article: that was reviewed 'properly', but the identity of the reviewers might be rather interesting.

In my experience, reviews go through the same amount of refereeing as any other paper.

admittedly knowing very little about biology other than my bent for amateur gynecology.

From the snippets presented here is a notice of congregated interesting bits. Now where he sees this collection, are they in special purpose bits or general purpose bits? If special purpose, can anything be extracted about what those bits bring to the table?

By broboxley (not verified) on 17 Feb 2010 #permalink

My eyes crossed just reading the quotes. I commend you for sludging through the whole paper. What gobbledygook.

Here, we demonstrate that the information content of an amino acid motif correlates with the motif rarity. A structured analysis of the scientific literature supports the theory that rare pentapeptide words have higher significance than more common pentapeptides in biological cell ‘talk’. This study expands on our previous research showing that the immunological information contained in an amino acid sequence is inversely related to the sequence frequency in the host proteome.

The author clearly considers this to be a research paper.

By Bernard Bumner (not verified) on 17 Feb 2010 #permalink

Here, we demonstrate that the information content of an amino acid motif correlates with the motif rarity.

Do they mention which definition of "information content" they will be using before they do that?

By Reginald Selkirk (not verified) on 17 Feb 2010 #permalink

They're always looking for the magic word or number, aren't they?

This looks a lot like Biblical codes. For sure, they might find something interesting, the trouble is that they want some magical being to be responsible for their findings.

More straining to determine that magic exists, rather than looking for meaningful causes.

Glen D
http://tinyurl.com/mxaa3p

By Glen Davidson (not verified) on 17 Feb 2010 #permalink

Cue Discovery Institute to "hail breakthrough intelligent design-friendly, peer-reviewed research article in a prestigious peer-reviewed science journal"

Oh, and it's "peer-reviewed."

By real peers.

By bill.farrell (not verified) on 17 Feb 2010 #permalink

#4 PZ,

Agreed, but on the PDF it's stamped "Review" in the upper left hand corner, right under the Elsevier logo.

Sounds like biobabble to me.

There's far too much science on this blog and not enough politics. ;)

By https://login… (not verified) on 17 Feb 2010 #permalink

Well, I did not read the paper, only the abstract provided in the original post, but I was getting a feeling, this is someone trying to pull a "Sokal" (or is this not possible?)

By mickkelly (not verified) on 17 Feb 2010 #permalink

BTW, some of these were bound to show up. Remember Expelled had at least one person in the shadows claiming that he used ID in his research.

Utter BS, almost certainly. The odds are strongly that he was doing science where it matters, and crediting woo for the fact that, gee, patterns can be found in an organism's sets of information.

Who knows, maybe this was the guy in Expelled.

Glen D
http://tinyurl.com/mxaa3p

By Glen Davidson (not verified) on 17 Feb 2010 #permalink

Here, we demonstrate that the information content of an amino acid motif correlates with the motif rarity.

Do they mention which definition of "information content" they will be using before they do that?

Sure! If you read the paper, you'll find they infer information content from motif rarity, thereby clearly demonstrating the correlation between information content and motif rarity.

Do they mention which definition of "information content" they will be using before they do that?

I would guess the usual one, so that statement becomes a tautology and does not bother people who know what the words in it actually mean.

By is.chuckling (not verified) on 17 Feb 2010 #permalink

How do these people jump from "highly conserved functional regions" to "DESIGNED!"?

And what's with the idea of the penta-peptide?

Linus Pauling realised that in an α-helix there were 3.6 amino acid residues per turn. Design? Hardly, it's a physical property based on the chemical structure of the residues.

Thus a pentapeptide can also be explained along the lines of chemical and structurally favourable characteristics. This paper has already found a biased interpretation for easily explained phenomena.

Here, we demonstrate that the information content of an amino acid motif correlates with the motif rarity.

Umm, that's the definition of information. We already know that rare things contain more information than common ones.

Well, it only took the IDers 60 years to catch up with Shannon.

By jeff.satterley (not verified) on 17 Feb 2010 #permalink

Oh, they thought all it needed was PEAR review.

By NewEnglandBob (not verified) on 17 Feb 2010 #permalink

rni.boh #5 wrote:

That's certainly not the case for Meyer's article: that was reviewed 'properly', but the identity of the reviewers might be rather interesting.

Interesting, indeed. From the Biological Society of Washington:

The paper...was published at the discretion of the former editor, Richard v. Sternberg. Contrary to typical editorial practices, the paper was published without review by any associate editor; Sternberg handled the entire review process.

Your experience with review papers is quite different from mine - I wonder how widespread the difference is, or isn't.

I only gave it a cursory glance, but it might be a clever way to promote ID. By including it in the abstract it will come up in standard google searches and no one will have to pay for the entire article (if it's not a free journal that is). Consumers (teachers, ID proponents, media for example) will probably not read the entire article. Heck, I'm sure a lot of researchers skim just the abstract and then can use the article to support ID nonsense. This will really give them a boost when arguing I think (e.g., "pennywacker and numbnards, 2010, state that..").

By hornungerous (not verified) on 17 Feb 2010 #permalink

First of all, it came from Italy. What did you expect?

It's not a review. The paper presents a method, albeit a nonsensical one, and results obtained with that method. It's a research paper posing as a review, because it's too weak to pass as primary research.

The author is trying to say something about the proteome based on a sample of one arbitrarily chosen small protein, calmodulin. Then she goes on to say that the method can be extended to a number of other proteins, and gives a list of references to the author's own papers, where this has apparently been done. I'd say Kanduc is simply publishing some random filler text, with tons of citations to her own papers and getting the words 'intelligent design' into the scientific literature as a bonus.

There is no mention of motif discovery used in conjunction with phylogenetic reconstruction to discover conserved and nonconserved functional motifs in protein families, and to predict novel functional sites. This is an active field of research, with a number of people who actually understand information theory working in it, and the author seems to be entirely unaware of it.

I suspect the Conclusions paragraph was probably ghost-written by Sarah Palin after the author tried to teach her some new words.

By https://me.yah… (not verified) on 17 Feb 2010 #permalink

A few unrelated points:

1. I'm pretty sure Darja is a feminine name so let's switch to 'she', shall we? It's not like the female sex has anything to be proud of in this paper, but still.

2. It does say 'Review' on the ScienceDirect page but not on PubMed nor on this PDF. It sure doesn't seem like a review to me but maybe Peptides uses a loose definition of 'review'.

3. The 'specified complexity' quote looks an awful lot like plagiarism of Meyer. Someone should probably flag this up. *cough*

4. Kanduc introduces the words ‘sense’, ‘obey’, ‘evaluates’, ‘understands’ as if they are well established in the scientific vernacular (they're not), then, in the conclusion, says 'abstract terms such as "sense", "edit", and "attack" as well as old dogmas such as the self/non-self dichotomy will become obsolete in favour of more intelligible and concrete theories and biological activities.' It's a straw man cupcake with false dichotomy frosting.

5. What on Earth does any of this have to do with 'effective translational application of science to medicine'?

By Karen James (not verified) on 17 Feb 2010 #permalink

Umm, that's the definition of information. We already know that rare things contain more information than common ones.

Which definition of information? Any given amino acid chain of a specific length will contain more, less, or equal amounts of information, depending on how you define it. In this case, equating information to function necessarily means that rarity correlates, because the author is simply making the trivial, and circular, observation that function (information) is a property of functional sequences.

Unless they've carried out analysis using strings of other length, then I'm still at a loss to see why pentapeptides should be considered significant.

...Functionally, words in the amino acid language typically have a size of five units [18], [35], [54], [57] and [64]...

Each of these papers is a study on antigenicity, where pentapeptides are found to constitute the minimum length for antigenic determination. The claim that "...words in the amino acid language typically..." is simply not supported by those references, and I can see no reason why any referee would allow that statement to stand.

The next sentence, stating that, "Pentapeptides appear to be the minimal biological units able to exert a central role in fundamental cell processes such as inhibition/stimulation of cell growth and regulation of hormone activity, transcript expression, enzyme activity, and immune recognition [42]", is a reference to a minireview article by Kanduc (and four other authors, which is a lot for a minireview). The article appears to be simply a list of biologically important pentapeptides (although at least one tetrapeptide* and a hexapeptide are also mentioned).

It strikes me that Kanduc is very, very keen on pentapeptides.

(*The fact that a tetrapeptide is identified would seem to undermine the central tenet of any paper with a title starting, "Pentapeptides appear to be the minimal biological units...")

By Bernard Bumner (not verified) on 17 Feb 2010 #permalink

Another problem with the paper is the conclusion, which is some unholy amalgam of a dog's breakfast and a word salad, and either way is grossly unappetizing.

Day saved.

Do they mention which definition of "information content" they will be using before they do that?

Sure! If you read the paper, you'll find they infer information content from motif rarity, thereby clearly demonstrating the correlation between information content and motif rarity.

There is no "they" there. There's a single author, despite the bizarre use of "we".

Unless…

I suspect the Conclusions paragraph was probably ghost-written by Sarah Palin after the author tried to teach her some new words.

Well, I report, you decide.

By David Marjanović (not verified) on 17 Feb 2010 #permalink

I have to run to an all day meeting in about 10 minutes, and I'll have to come back and read this more fully later, but I couldn't ignore this in the meantime:

Current theories have proven inadequate to explain the origins of biological information...

Say what?

There. Is. No. Mystery.

Never has been. Information is generated by any stochastic process.

And no, I don't buy the "we're using a different definition of information than Shannon". Words have meanings, and you can't just arbitrarily go and choose a different one. If you want to talk about something other than information as understood and used by everybody for ~62 years, you're not talking about "information".

Ok, I'm done for now. Later...

By Brain Hertz (not verified) on 17 Feb 2010 #permalink

P-Zed Myers:

Some pentapeptides, in his analysis, are found only in CaM, while others are found multiple times, with an average of 12 occurrences. This is supposed to be significant.

Somebody should re-do that analysis; the database is online, and before we say if there's even a result here which needs an explanation, we should check the figures. For example, Table 2 on p. 18 says that the pentapeptide VPMLK has no matches, but when I run it, I get an ATP-dependent DNA helicase, a TATA box-binding protein-associated factor for RNA polymerase I and some other stuff. A good proportion of the hits for any particular pentapeptide search I do appear to be redundant (variations on or alternate names for the same protein), but allowing for redundancy should hardly make a non-zero number go down to zero. Not being a molecular biologist, I'm not really at home with the identification codes and classification schemes used in these databases; somebody more familiar with the field than I am should look into it.

Bernard Bumner (#27):

Unless they've carried out analysis using strings of other length, then I'm still at a loss to see why pentapeptides should be considered significant.

Yes. I'm glad somebody else agrees with me on this.

By Blake Stacey (not verified) on 17 Feb 2010 #permalink

Karen James #26 wrote:

2. It does say 'Review' on the ScienceDirect page but not on PubMed nor on this PDF. It sure doesn't seem like a review to me but maybe Peptides uses a loose definition of 'review'.

Ah, now I see the problem. You and PZ are looking at what appears to be an older version of the PDF, which looks like a regular Word document. Following the DOI that PZ provided, I was sent to a PDF that was properly formatted for the journal, and it says "Review" under the Elsevier logo. In addition, PubMed often fails to properly label review articles.

I'm only harping on this point because I'm sick and tired of ID proponents abusing the category of literature review to get published. The editor should be notified.

Hmm. Let me see if I get this right. Since things like en, an, ts, and es are "common" in human language, those **can't be** intelligently designed, but all the other combinations are, because they are rarer and therefore contain all the "information" in language? Is this what this moron is claiming about DNA?

I just did a blastp search for the protein sequence ELVIS.

I got hundreds and hundreds of exact matches.

I then did a blastp search for the protein sequence YAHWEH.

There were no exact matches, although there were a bunch of off-by-one matches, like YSHWEH or YAHWQH.

Clearly, Elvis is king.

Karen James @ 26 "maybe Peptides uses a loose definition of 'review'".

Clearly, Peptides uses a loose definition of "science". (They should be embarrassed.)

I'm not really at home with the identification codes and classification schemes used in these databases; somebody more familiar with the field than I am should look into it.

Or not. The paper is rather obviously total bunk anyway. I took a second look at it and I think PZ was being too kind about it. If someone competent in the field is going to spend any time on this, they should probably write to Peptides and kindly ask them not to sleep at the switch.

By https://me.yah… (not verified) on 17 Feb 2010 #permalink

Too late! Ha ha haaaa! The words "specified complexity" are in a peer reviewed journal. That PROVES that ID is science! Now we can dispense with all this arguing and get down to some REAL ID science, such as...
...
ummm...
...
...
HA HA HAA!! ID is officially science now!!!

By venturefreemcgee (not verified) on 17 Feb 2010 #permalink

@ NewEnglandBob #21

"Oh, they thought all it needed was PEAR review."

You're right. The whole thing's a bit pear-shaped.

By theshortearedowl (not verified) on 17 Feb 2010 #permalink

Brain Hertz:

And no, I don't buy the "we're using a different definition of information than Shannon". Words have meanings, and you can't just arbitrarily go and choose a different one.

For most of the paper, "information" seems roughly synonymous with "function". It's the rare pentapeptides which (supposedly) have biological functions (as found by searching for them in PubMed), so the author claims that "information" is in those rare pentapeptides. But then, sometimes this sense of "information" is conflated with a probabilistic one. For example, the conclusion says, "Zipf's law states that rare words have greater information content", but Zipf's law is just the statement that word frequencies tend to follow a power-law distribution. Rare words, by definition, have low frequency; the probability of seeing one is small, so minus the logarithm of that probability is large.

By Blake Stacey (not verified) on 17 Feb 2010 #permalink

I suppose the nominal publication date of this paper isn't April 1.

The question for those who follow corporate behavior is whether Reed Elsevier has decided that it can milk its huge stable of science journals by cutting back on the quality of review and fattening its bottom line for now. Let's hope this is a one-time piece of sloppiness, not a corporate decision to cut quality.

By Free Lunch (not verified) on 17 Feb 2010 #permalink

I wonder if this kind of stuff is typical for this author.

Incidentally...

Functional bits of proteins are not randomly found in structural/non-binding bits of other proteins. Could this be because that would make them... functional?

And...

If we define rare words as having more information, then it turns out that more information is found in rare words!

EPIC FAIL

By theshortearedowl (not verified) on 17 Feb 2010 #permalink

I wonder, what has John Davison been doing lately.

Running socks?

Kidding, really. I think there'd have to be a rant against Darwinists somewhere in it if Davison wrote it.

Glen D
http://tinyurl.com/mxaa3p

By Glen Davidson (not verified) on 17 Feb 2010 #permalink

Davison still emails me regularly, telling me that I must read his latest refutation of Darwinism. I don't bother.

Platypus @33--That made my day!

By cactusren (not verified) on 17 Feb 2010 #permalink

Davison still emails me regularly, telling me that I must read his latest refutation of Darwinism. I don't bother.

But it will all be recognized as the truth someday after he is dead. "Dohn Javison" will be on every biologist's lips...

Glen D
http://tinyurl.com/mxaa3p

By Glen Davidson (not verified) on 17 Feb 2010 #permalink

Dear All,

Let me clarify, please, that the paper in object is no "ID paper" at all. On the contrary. Please, read under the introduction. It is literally written as follows:

"As it is clearly absurd to ascribe anthropomorphic behaviors and intelligent qualities to molecules and cells, we must search for the logic of molecular events in the informational features of the molecules themselves. The ‘sensing’ and ‘rationality’ of enzymatic reactions and immune attacks cannot reside in anything other than the enzymes, antibodies, and antigens, i.e., the proteins that participate in biological and immunological events."

By writing "specified complexity" in the not yet edited proofs, it was just intended to mean that specificity and complexity are properties intrinsic to the amino acid sequences (and of no derivance from an external agency, of course).

Sure this mis-interpretation of the paper comes from my having been too synthetic. Please, waste a bit of your time in going through the paper: my aims will be of evidence.

Thanks & Regards
Darja Kanduc

By Darja Kanduc (not verified) on 17 Feb 2010 #permalink

Thanks for the clarification, Darja. It apparently takes care of a lot of worries we had, possibly some due to language issues.

Glen D
http://tinyurl.com/mxaa3p

By Glen Davidson (not verified) on 17 Feb 2010 #permalink

No problem, Glen. And, of course, I'll be glad to clarify any other issue that I may have mis-represented or that may be of possible interest. Darja

By Darja Kanduc (not verified) on 17 Feb 2010 #permalink

Darja Kanduc @46 "the paper in object is no 'ID paper' at all"

Then, "ID" probably should not be mentioned at all.

Why not ? All immunology is based on this external agency: we are accustomed to define the immune system as an entity who sees, recognizes, discerns and discriminates, takes the decision and then goes to the attack against the enemies. Again, it is clearly absurd. This is an ID-like representation of the peptide-peptide recognition between an antibody protein and an antigen protein. I do believe that we have to face with such kind of representations in order to demolish them logically (i.e. by the experimental data). However, this is just my personal view. Darja

By Darja Kanduc (not verified) on 17 Feb 2010 #permalink

Could you explain this curious congruence?

Kanduc 2010

…the CaM sequence is characterised by both specificity and complexity (what information theorists call 'specified complexity'); in other words, it has 'information content'.

Meyer 2000

Systems that are characterized by both specificity and complexity (what information theorists call "specified complexity") have "information content".

Dear Darja,

Thank you for coming here and addressing our concerns.

I did read your paper, including those sentences you quote here, but they seemed to me to be in direct conflict with other statements in your paper, so forgive me if I seem confused - I am.

I worry that even if you did not intend for your paper to be "an ID paper", the language you used will be interpreted by others (as it has been here) to support the ID position. If it isn't too late to make final edits to the paper, then please consider doing so. You might start by removing 'specified complexity'; as you can see from this thread, that is a phrase loaded with more meaning than you perhaps intended.

And while I'm on the subject, that sentence in your paper is obviously copied and only very slightly modified from Stephen Meyer (2000), a prominent proponent of ID. If you did not intend to promote ID, why did you copy almost verbatim a piece of well-known ID propaganda?

Kind regards,

Karen

By Karen James (not verified) on 17 Feb 2010 #permalink

I was aiming at explaining that specificity and complexity that have been put together in the expression "specified complexity" in order to justify an external intelligence, actually are properties belonging to the protein sequence itself (i.e without any need of something external). Darja

By Darja Kanduc (not verified) on 17 Feb 2010 #permalink

I have deleted the incriminated phrase from the text, i.e. the quote "(what information theorists call 'specified complexity')". However, I repeat that I deliberately put the phrase in the text since I was pursuing the antithesis between my position and the fideistic position contained in the phrase. Anyhow, I realize that it had to take more room in the talk, it had to be debated in more detail. Perhaps putting it in such a synthetic way was misleading. Darja

By Darja Kanduc (not verified) on 17 Feb 2010 #permalink

I was aiming at explaining that specificity and complexity that have been put together in the expression "specified complexity" in order to justify an external intelligence, actually are properties belonging to the protein sequence itself (i.e without any need of something external). Darja

Then you should probably just have written that. What you have done is crystal clear. What hypothesis your analysis intends to test is not so clear.

By Antiochus Epiphanes (not verified) on 17 Feb 2010 #permalink

http://www.biomedexperts.com/Profile.bme/667645/Darja_Kanduc

DK@46

Let me clarify, please, that the paper in object is no "ID paper" at all. On the contrary. Please, read under the introduction.

I read in the abstract:

Current theories have proven inadequate to explain the origins of biological information such as that found in nucleotide and amino acid sequences; an 'intelligent design' is now a popular way to explain the information produced in biological systems.

It's rather hard to misinterpret the abstract: Current theories inadequate; ID is popular explanation.

If it weren't an ID paper, I'd expect the abstract to say "...ID is now a popular (but false) way to explain ..."

DK:

By writing "specified complexity" in the not yet edited proofs, it was just intended to mean that specificity and complexity are properties intrinsic to the amino acid sequences (and of no derivance from an external agency, of course).

Sure this mis-interpretation of the paper comes from my having been too synthetic. Please, waste a bit of your time in going through the paper: my aims will be of evidence.

This is much better. Thanks.

Darja,

I agree with you that unnecessarily anthropomorphic terminology should be avoided, and I often correct people who (innocently) use the word 'design' when they talk about evolved structural or functional adaptations.

I'm very glad to hear both that you have deleted the phrase 'specified complexity' and that you are still able to revise the paper.

If I'm interpreting your comments here correctly, you are saying that you were trying to argue against the concept of 'specified complexity' by showing that what at first may appear to be intelligently designed is actually just a natural and expected property of protein sequences. Unfortunately, I don't think your paper communicates this clearly, and an innocent reader might conclude (as I did) that you are supporting, not 'demolishing', anthropomorphic terminology and intelligent causes.

To fix this, I suppose one thing you could do would be to cite Meyer and then explain a lot more clearly how your data refute both intelligently designed 'specified complexity' and anthropomorphic conceptions of immunity.

By Karen James (not verified) on 17 Feb 2010 #permalink

"Then you should probably just have written that. What you have done is crystal clear. What hypothesis your analysis intends to test is not so clear."

This is a nice question, thanks. My problem is simple: I wish to understand what makes a peptide immunogenic. After MHC binding algorithms, mimotopes, phage display libraries, immunosuppression and adjuvants, I am getting a congruent picture only with the concept of rare peptide sequences that are always found at the core of the immune recognition. First, in my own experiments, and then, almost constantly, in the scientific literature on epitope mapping. The present paper is kind of an extension of this concept: important sites are always located at the level of rare motifs. I believe this is an important concept in biology and immunology: to my knowledge, never said or demonstrated by anybody. And simple and intuitive as it has to be. Darja

By Darja Kanduc (not verified) on 17 Feb 2010 #permalink

Darja Kanduc:

All immunology is based on this external agency: we are accustomed to define the immune system as an entity who sees, recognizes, discerns and discriminates, takes the decision and then goes to the attack against the enemies. Again, it is clearly absurd. This is an ID-like representation of the peptide-peptide recognition between an antibody protein and an antigen protein.

I don't think any serious immunologist or biologist in modern times has thought that there is conscious intention behind immune reactions. To set out to debunk such an idea is to set out to destroy a straw-man argument. I'm afraid it will only confuse things further if it's written in the style of your paper.

Now that you are here, would you care to address commenter Blake Stacey's concerns (see above) about pentapeptides you claim are unique to the calmodulin sequence, but don't actually seem to be?

By https://me.yah… (not verified) on 17 Feb 2010 #permalink

Using function to define "information" is the only sensible way to do it with respect to evolution. Of course, "function" is a woolly concept and needs to be nailed down a bit too. As a very rough first-order approximation, assuming sequences annotated as a binding site or whatnot seems ok-ish.

One has to choose some way of choosing subunits, and pentapeptides, while a bit arbitrary, is not obviously bad. Though they should have done the same analysis on a few different lengths to see if the correlations still hold.
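A rough sketch of what repeating the analysis at several lengths could look like. The query is the start of the CaM sequence quoted in the post; the "proteome" is three invented strings, and the function names are hypothetical. This is not the paper's actual pipeline, just the sliding-window count it describes.

```python
from collections import Counter

def kmers(seq, k):
    """All overlapping length-k windows of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def rarity_profile(query, other_proteins, k):
    """For each k-mer of the query, count its occurrences in the other proteins."""
    background = Counter()
    for protein in other_proteins:
        background.update(kmers(protein, k))
    return [(word, background[word]) for word in kmers(query, k)]

QUERY = "MADQLTEE"  # initial CaM residues as quoted in the post
TOY_PROTEOME = ["GGADQLTKK", "PPDQLTEEAA", "QLTEEQLTEE"]  # invented, for illustration

# Repeat the same count for several window lengths, not just pentapeptides.
for k in (3, 4, 5, 6):
    print(k, rarity_profile(QUERY, TOY_PROTEOME, k))
```

If the rare/common split only shows up at k = 5, that would be a warning sign that the result depends on the arbitrary choice of window.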

The authors have obviously not read the relevant literature on the topic of the information content of molecular sequences. Hell, Tom Schneider's excellent website on the topic has been around since 1995! We actually have well defined terms and methods for this sort of analysis.

Finally (at least from me), the authors make the all too common mistake of trying to use information theory on a single sequence.
(say it along with me)
INFORMATION IS ONLY DEFINED FOR A POPULATION OF SEQUENCES.
If they really want to look at how information is concentrated in uncommon protein sequences (actually an interesting reasonable hypothesis), they should have looked at which sequences are conserved. This is also a better way of getting at information, since a better first-order approximation for "functional" sequences (in an evolutionary sense) is conserved sequences.
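For what "information over a population" means in practice, here is a minimal sketch in the spirit of the per-position measure used for sequence logos: the maximum possible uncertainty minus the observed Shannon entropy of each aligned column. The alignment is invented, the function name is hypothetical, and the small-sample correction a real analysis would need is left out.

```python
import math
from collections import Counter

def column_information(aligned, alphabet_size=20):
    """Bits per position: log2(alphabet_size) minus the entropy of each column."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    max_bits = math.log2(alphabet_size)
    info = []
    for column in zip(*aligned):
        n = len(column)
        counts = Counter(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(max_bits - entropy)
    return info

# Invented toy alignment of the same region from four hypothetical homologs.
ALIGNMENT = ["MADQL", "MADKL", "MAEQL", "MSDRL"]
for pos, bits in enumerate(column_information(ALIGNMENT)):
    print(pos, round(bits, 2))
```

Fully conserved columns score highest and variable ones lower. With a single sequence every column trivially gets the maximum, which is why the measure needs a population.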

Anyways, I would have bounced this paper with a major revision needed. The authors could have actually done something interesting without too much more effort.

By travcollier (not verified) on 17 Feb 2010 #permalink

#60:

To set out to debunk such an idea is to set out to destroy a straw-man argument.

Exactly! Like I said, it's a straw man cupcake with false dichotomy frosting.

By Karen James (not verified) on 17 Feb 2010 #permalink

"Now that you are here, would you care to address commenter Blake Stacey's concerns (see above) about pentapeptides you claim are unique to the calmodulin sequence, but don't actually seem to be?"

There are 36,000 proteins cataloged in the human database: they include fragments, isoforms, cDNA,
obsolete entries, hypothetical, etc.

As all of us know, the databases change day by day. So the data you read have been changed. Just in the example of the VPMLK pentapeptide, today we find a total of 11 entries. But 6 belong to cDNA, FLJ93928, highly similar to human mRNA, and 1 is the antigen to which the VPMLK pentapeptide belongs. So the similarity is 4 matches (no longer 0).

But we must pay attention to the fact that all of the pentapeptide entries increase in parallel. For example, the calmodulin QLTEE pentapeptide, which had 19 matches at the time of the analysis, now presents more than 50 entries (actually 120 without the clean-up).

So the ratio between high/low is always respected. The concept of rarity is respected. I hope I've been clear. Darja

By Darja Kanduc (not verified) on 17 Feb 2010 #permalink

Darja Kanduc @ 50 "Why not ? All immunology is based on this external agency: we are accustomed to define the immune system as an entity who sees, recognizes, discerns and discriminates, takes the decision and then goes to the attack against the enemies. Again, it is clearly absurd. This is an ID-like representation of the peptide-peptide recognition between an antibody protein and an antigen protein. I do believe that we have to face with such kind of representations in order to demolish them logically (i.e. by the experimental data). However, this is just my personal view. Darja"

Because you muddy your science with your "personal" view. And your audience (the vast majority of it) doesn't need ID to be "demolished" logically because it's not an issue for your audience.

The "external agency" you talk about is an analogy in Immunology. If people employ that analogy, they aren't necessarily invoking god. The phrase "intelligent design" is invoking god (and really doesn't have a place in your paper, especially if you want your science to be taken seriously).

=======

Darja Kanduc @ 46 "Sure this mis-interpretation of the paper comes from my having been too synthetic."

Yes, that is the problem. You should fix it!

=======

By the way, "specified complexity" is poor English (it doesn't mean anything).

"I don't think any serious immunologist or biologist in modern times has thought that there is conscious intention behind immune reactions"

I do not believe that the issue consists in being serious or not serious. Please, allow me to observe that all the immunology texts use the terminology to attack, to see, to protect, etc. I know it's something we had in heritage from Metchnikoff, from an era of plagues, etc. etc., nonetheless, the idea the textbooks are giving is just the contrary of a scientific talk.

By Darja Kanduc (not verified) on 17 Feb 2010 #permalink

"Finally (at least from me), the authors make the all too common mistake of trying to use information theory on a single sequence.
(say it along with me)
INFORMATION IS ONLY DEFINED FOR A POPULATION OF SEQUENCES.
If they really want to look at how information is concentrated in uncommon protein sequences (actually an interesting reasonable hypothesis), they should have looked at which sequences are conserved. This is also a better way of getting at information, since a better first-order approximation for "functional" sequences (in an evolutionary sense) is conserved sequences."

I take the liberty of observing that if you had gone through the references, you'd have found the following citation:
[26] Kanduc D, Capone GM. The similarity profile of the human proteome as a fractal dimension. Biol Forum 2008;101:142-145
In other words, what has been presented for calmodulin in the paper in object, has already been thoroughly carried out for the entire human proteome. Darja

By Darja Kanduc (not verified) on 17 Feb 2010 #permalink

"The "external agency" you talk about is an analogy in Immunology. If people employ that analogy, they aren't necessarily invoking god. The phrase "intelligent design" is invoking god (and really doesn't have a place in your paper, especially if you want your science to be taken seriously)."

I'm getting lost. Do you mean that immunology can use the concept of external agency at condition that this external agency is not called god ?

By Darja Kanduc (not verified) on 17 Feb 2010 #permalink

"... the information produced in biological systems."

I bet the twat can't even explain what he means by "information". Bah, my Biokabbalah trumps his Bionumerology any day. I wish people would stop using nonsense phrases like "digital code in DNA"; that only encourages bullshit artists like Kanduc.

"However, Kanduc does propose something that actually is interesting: that the rare pentapeptide sequences in specific genes also correlate with regions that have important functional roles."

I'll bet that rare pentapeptide sequences in specific genes also don't correlate with regions that have important functional roles. Anyone have access to the sequence database and want to do a proper mathematical analysis?

By MadScientist (not verified) on 17 Feb 2010 #permalink

Darja Kanduc @ 65 "Please, allow me to observe that all the immunology texts use the terminology to attack, to see, to protect, etc. I know it's something we had in heritage from Metchnikoff, from an era of plagues, etc. etc., nonetheless, the idea the textbooks are giving is just the contrary of a scientific talk"

You seem to be implying that this "customary" language is related to an "intelligent design" hypothesis but that doesn't appear to be the case.

While one could criticize the "customary" language, it seems like this paper is an odd place to do it.

"I'm getting lost. Do you mean that immunology can use the concept of external agency at condition that this external agency is not called god?"

Yes (basically). When people use that language in legitimate scientific textbooks and papers, they are not invoking god. The historical origin of that language might have been "god" related, but no immunologist who uses that language now is invoking god. It's a cultural residue.

"I'm getting lost. Do you mean that immunology can use the concept of external agency at condition that this external agency is not called god?"

Another way of looking at this "external agency" is as a stand-in for an unknown process. One could argue that the particular language is less than optimal, but it's very far-fetched to believe that such language means the people using it are saying "god did it"!

The fact that people (here) were surprised (and confused!) by the presence of the phrase "intelligent design" in a "pure" science paper is telling. It's really you who are confusing them!

It appears you are trying to do two things in the one paper:

1) Showing the results of some sort of datamining.
2) Arguing against "customary" language.

Doing these two things at the same time is confusing!

It also appears your hypothesis of why people use the "customary" language is because they intend to "invoke god". I believe this hypothesis is incorrect. I believe such language is used out of habit and that people using it are not "invoking god".

While one could criticize the "customary" language, it seems like this paper is an odd place to do it.

Exactly. I could imagine a letter to the editor about the semantics of the words used to describe the immune system in an attempt to usher in a more accurate usage of the words. I don't have a particular problem with the current texts, though. And if one should hope to change the general usage of the words, the letter would need to appear in Nature or something, not Peptides.

Mixing this in with a bunch of pentapeptides and statements about the proteome seems to be a recipe for total confusion.

Also, it looks to me like there is a confusion among the comments about ID, God, and what Kanduc seems to see as a problem with the immunology texts. It's not that God is meddling with immunity, but that the immune system is seen as a self-aware entity. There is only a parallel to a conscious, self-aware designer, no claim that it is the designer himself squashing bugs that attack our bodies.

By https://me.yah… (not verified) on 17 Feb 2010 #permalink

It's rather hard to misinterpret the abstract: Current theories inadequate; ID is popular explanation.

If it weren't an ID paper, I'd expect the abstract to say "...ID is now a popular (but false) way to explain ..."

English is not my mother tongue. But, as far as I know, popular is not an appreciative adjective. As a second note, I'd observe that popular theories grow and diffuse, when coherent and experimentally validated thoughts and theories are lacking.

By Darja Kanduc (not verified) on 17 Feb 2010 #permalink

I finally got a chance to read the manuscript, and it is certainly not at all about intelligent design, nor does it seem to provide support for any of that nonsense. The choice to use those words together in the abstract represents a convergence with the wartier, nastier, pseudoscientific movement. Further, it is embedded in a sentence that means something that is not supported by the MS, and could not have been intended:

Current theories have proven inadequate to explain the origins of biological information such as that found in nucleotide and amino acid sequences; an ‘intelligent design’ is now a popular way to explain the information produced in biological systems.

What this would seem to mean is that current hypotheses explaining the relationship between nucleotide and amino acid sequence are inadequate; however, it is popular to say that an external intelligence (not necessarily the Abrahamic deity*) did it.

Rather, my interpretation of the remaining paper is that aa pentamers of calmodulin specifying important functional domains are less likely to be found in other proteins of the proteome than pentamer motifs with little functional significance. The trajectory that Darja (if I may call you that) sees from this is that highly functional domains are not likely to overlap in sequence because they are somehow less random than aa pentamers with less functional constraint. Pardon, if I have misrepresented your argument, but that is my first take.

If that isn’t entirely a ridiculous interpretation, it would seem that evolutionary theory might support it. Protein domains that are necessary for very basic cellular processes are more conserved than those which have lesser roles, or may be merely structural. This is the result of stabilizing selection. When a protein motif has attained some fitness peak, any variation in the aa sequence will likely lead to a reduction in function. In other words, when it functions efficiently, it is unlikely that any change would result in an improvement in function…the more important the function, the more likely that any change will be removed from the population through selection. This justification isn’t explicit in the MS, which makes me wonder if I have missed something. Of course the other rationale that springs to mind is that these functional domains are reactive in their environment. The presence of a highly reactive pentamer in a non-catalytic domain might be disadvantageous, especially if the pentamer is depleting cellular resources (sucking up phosphate groups) without doing something useful.

There are some elements of this paper that strike me as odd, but I would like to think about them before I starfart all over myself.

Darja—It would be wonderful if you could alter the abstract. You do not mean “Intelligent Design” in the way that most English speakers would understand. That movement will champion this paper. That can’t be good for you.

*But c’mon. We all know it was the Abrahamic deity**.
**And by the Abrahamic deity, I mean Jesus.

By Antiochus Epiphanes (not verified) on 17 Feb 2010 #permalink

Darja Kanduc @ 74 "It's rather hard to misinterpret the abstract: Current theories inadequate; ID is popular explanation."

It's a "popular explanation" for non-scientists.

You need to provide data to support the idea that it's popular among scientists! (The "customary" language doesn't do that!)

Darja Kanduc @ 74 "English is not my mother tongue. But, as far as I know, popular is not an appreciative adjective. As a second note, I'd observe that popular theories grow and diffuse, when coherent and experimentally validated thoughts and theories are lacking."

The problem with "popular" isn't that it is "appreciative". The problem is that it isn't "popular" among scientists in general or immunologists in particular!!

Very, very few readers of "Peptides" and your article would put any credence in the "intelligent design" hypothesis. You are, in essence, making a "straw man" argument!

You do not mean “Intelligent Design” in the way that most English speakers would understand. That movement will champion this paper. That can’t be good for you.

Exactly. Substitute 'Intelligent Design' with 'Creationism' in the original sentence and read again. That is what the ID camp will see.

Also, I'm not sure Kanduc quite grasps the connotations of 'popular'. It's not a pejorative, and in a context like this, it cannot really refer to the population at large (which is unlikely to be concerned with the complexity of biological sequences).

By https://me.yah… (not verified) on 17 Feb 2010 #permalink

Actually, most of what I had to say, PZ included in his post--but more eloquently than my ramble. I had a look at the abstract that Blake Stacey posted on another thread, got the MS and just dove in (home with a sick kid, and she is taking a nap!)

I don't really understand if "specified complexity" has any meaning apart from what Dembski attributes to it. Is this a term that he coined or coopted?

By Antiochus Epiphanes (not verified) on 17 Feb 2010 #permalink

(big link) @ 73 "Also, it looks to me like there is a confusion among the comments about ID, God, and what Kanduc seems to see as a problem with the immunology texts. It's not that God is meddling with immunity, but that the immune system is seen as a self-aware entity. There is only a parallel to a conscious, self-aware designer, no claim that it is the designer himself squashing bugs that attack our bodies."

Even so, I contend that nearly no scientist believes in this "self-aware entity" hypothesis!

You do not mean “Intelligent Design” in the way that most English speakers would understand.

Dear Antochio, I'm going to think that you're definitely right...Honestly, I didn't expect such a pandemonium. Trying to follow your suggestion for the abstract. Darja

By Darja Kanduc (not verified) on 17 Feb 2010 #permalink

Dear All,

As somebody observed, I am in Italy, and now in Italy it is night-time. So, I wish to say hello to you all and thank for the intense and stimulating debate. I was struck by the passionate participation, hope I didn't offend anybody, and remain grateful for what I learnt. My best. Darja

By Darja Kanduc (not verified) on 17 Feb 2010 #permalink

Darja--I sympathize. The amount of effort required to perform an experiment and write it up (especially in a second language) is significant. It must have been puzzling for you to see the blogosphere achirp over a few phrases.

Good Luck.

AE

By Antiochus Epiphanes (not verified) on 17 Feb 2010 #permalink

Wait, what? It's a review?

You sound surprised PZ. Why Is that? Did you seriously expect a "Material & Methods" or a "Results" section in a paper that claims to support ID?

Yubal@85 "You sound surprised PZ. Why Is that? Did you seriously expect a "Material & Methods" or a "Results" section in a paper that claims to support ID?"

A good example of why people should read all the comments before commenting!!

davep,

I was referring to creation science/ID in general, and of course the controversy around that issue.

I do not care what scientists believe/think as long as their research comes to conclusive, reproducible results, ID or not ID, who cares, the last word is spoken by the facts, you should know that. If you read a little in the creation science journal you might get the point I was trying to make. Their articles are all reviews.

concerning the paper in discussion:

Mining the human proteome for pentapeptide sequences is not exactly what I would do with my time, just based on the fact that the structure of proteins is in general more important for the function than the overall sequence. So, if you are already at it, you might want to restrict yourself to disordered and/or surface exposed peptide sequences.

The freedom of sequence is given only when there is no functional impact on the active site and the required global structure of a protein, assuming the protein function is to be preserved. Again, the function is conserved via the structure, the few important residues are conserved on their site and decorate

So what kind of insight does a sequence analysis of the human proteome actually provide? I would say very little without accompanying structural, biochemical and biophysical data or even "computational data".

Proteins are not language, not a series of words represented by some conserved sequence. Is a sequence HKLYL in fact conclusive without the context of the other amino acids up and down stream of that site? Or is HKL part of motif 1 and YL part of motif 2 and they just happen to occur often because they pack easily in a tight turn? You can find patterns wherever you look, especially when you restrict your alphabet. Just try to apply the "bible-code" to a text in Mandarin and you will see what I mean.

What happens if you look for sequence trimers or dodecamers?

Does anyone account for the DNA/RNA stage of these especially rare or especially abundant pentapeptides? Are there obvious patterns (stable tertiary nucleotide structures, reserved regulatory sequences, GC-rich-elements etc.) that would explain the aberrant occurrence of said peptides?

To emphasize my point -this time- a little riddle for you guys. Why is there no octapeptide with this sequence

PPCCCCPP

in the database?

(non-redundant protein sequence @ BLAST)

This thread has been a fascinating read. Even though we have what appears to be the paper's author commenting to deny any support for ID, I find myself remaining rather skeptical about the assertion that all this is just a mistake.

I'm certainly willing to have my mind changed, but at the moment it beggars belief that simple language errors caused an almost word-for-word quote from Meyer on "specified complexity" to appear in a paper that also has the words "intelligent design" in the abstract. Especially when Kanduc asserts that he/she is unfamiliar with the creationist usage of either phrase in English; if Kanduc didn't read Meyer, how did the quote end up in the paper? If Kanduc has read Meyer, how can he/she be unfamiliar with the phrase "intelligent design" or "specified complexity" as used by Meyer?

By Form&Function (not verified) on 17 Feb 2010 #permalink

Hmm. Ok, now that I know this isn't what it appeared to be initially. It makes sense that a) specific patterns match in immunology, and that those patterns **need to be** in functional sections. Why? Because immune systems have to be able to identify pathogens, and identifications of DNA that isn't in the "functional" parts of such pathogens would lead to misidentification of sequences that landed in a genome, but which do not change its overall function. This happens all the time, and is the reason for lateral gene transfers. If such transfers caused "hits" it could potentially kill the very organism trying to defend itself, due to large numbers of left over remnants getting into cells, but not actually "infecting" them. It's also possible it would cause other issues, including failure to correctly identify differentiated cells as "self", and other such problems *including* attacks on cells that simply suffer minor errors, during replications, which produce the numerous non-fatal mutations that happen all the time.

Basically, anything so precise that it attacks "any" changes, not just *critical* ones, would likely derail everything from self repair to breeding. So, any identifiers need to be very specific, unique, and unlikely to be confused with far more common patterns, including the absolutely necessary ones, for the species using them.

I have encountered the phrase "specified complexity" outside of ID, but since Dembski adopted it, I've never seen it in any context but ID.

In the context where I saw it, it referred to computational complexity. The specificity was wrt the level of complexity, not the precise piece of information.

I read a computer science paper back in the early 90s that used a very similar method for characterizing text. They broke the text file into short, possibly five-letter, chunks serially, then counted them. They could characterize a document using the top twenty or so most common sequences. The authors were able to recognize the language of the document and, given appropriate samples, the author of the text. The authors of the paper were actually surprised at how well it worked, though you'll notice that folks who use this kind of analysis, e.g. the team at Amazon, use a more linguistically targeted statistical approach.

Still, I wouldn't be too surprised if you could use such a simple, mechanical probing scheme to characterize genetic sequences. After all, genes evolve through modification and duplication which leave statistical footprints. I remember at least one group trying a partial matching approach for predicting protein structure.
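The chunk-and-count scheme described above is easy to sketch. The block below is only an illustration under assumptions: it uses overlapping windows (the way the post walks along MADQLTEE as MADQL, ADQLT, DQLTE), whereas the text paper may well have used non-overlapping chunks, and the function names `kmers` and `top_kmers` are made up for this example.

```python
from collections import Counter


def kmers(sequence, k=5):
    """Return every overlapping window of length k, in order of appearance."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def top_kmers(text, k=5, n=20):
    """Count the k-letter chunks of `text` and return the n most common ones."""
    return Counter(kmers(text, k)).most_common(n)


if __name__ == "__main__":
    # The start of the calmodulin sequence quoted in the post.
    print(kmers("MADQLTEE"))

    # A toy "document"; a real profile would need far more text than this.
    sample = "the quick brown fox jumps over the lazy dog while the other dog sleeps"
    for chunk, count in top_kmers(sample, n=5):
        print(repr(chunk), count)
```

The same two functions work on a protein string or on plain text, which is the point of the comparison: the counting is purely mechanical and knows nothing about meaning.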

Of course, it's a good idea to be careful using terms that assign agency, especially in a touchy area like evolutionary biology. (You'll notice that no one gets into a swivet when someone says that computers "add" or "search", but we don't have a controversial robotic rights movement yet.)

By Kaleberg5 (not verified) on 17 Feb 2010 #permalink

I always laugh when someone drags out "information theory" to try and prop up some foolish idea or other.

Let me get this out there once and for all.

Given:
-an alphabet A
-string S of length N encoded in alphabet A
Information theory shows that:
Random strings of length N will tend to have the maximum Shannon entropy (ie. "Information Content") of all the strings of that length in that alphabet.

Let me let that sink in for a moment.

Did you get it?

That's right! This means that if you randomly change any string in any alphabet that was not completely random, you may have increased its information. Not only that, but the more highly ordered the string, the more likely it is that a random change will result in an increase in information.

In information theory, the colloquial notion of "order" is inversely proportional to information.

So, mutations adding information? Yeah, you can prove that. With math. Which means Behe and anyone trying to prop up or emulate his foolishness can suck on lim(x->∞), where x is my... well, I'll leave that as an exercise for the reader.

(Sorry for all the emphasis formatting, we all have things that get under our skin, this one gets under mine)

By dmorrison (not verified) on 17 Feb 2010 #permalink

dmorrison (#92):

If one is given an alphabet and a string but no notion of probability whatsoever, aren't we talking about Kolmogorov-Chaitin, not Shannon?

By Blake Stacey (not verified) on 17 Feb 2010 #permalink

Put another way, the concept of "Shannon entropy of a string" doesn't quite make sense; or, at least, the idea needs a little extra precision. We can speak of the Shannon entropy of a probability distribution over possible strings, or, if we have only a single message to work with, we can look at it character-by-character, and compute the Shannon entropy of the character frequencies. This latter quantity is maximized when all characters are equally likely.

If we built an N-peptide chain by rolling a d20 N times, then for large enough N, the frequency of each amino acid would be roughly 1/20 (law of large numbers and all that), and so the Shannon entropy of the empirical character frequencies would be just about maximal.

So, yes, random strings maximize "information content" — it just pays to define terms precisely.
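A minimal sketch of the character-frequency calculation described above, for anyone who wants to try the d20 experiment. The helper names (`char_entropy`, `random_peptide`) and the fixed seed are assumptions for illustration only; the entropy here is that of the empirical single-character frequencies, not a property of the string as a whole.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def char_entropy(s):
    """Shannon entropy, in bits per character, of the character frequencies of s."""
    total = len(s)
    return sum((c / total) * math.log2(total / c) for c in Counter(s).values())


def random_peptide(n, rng):
    """Build an n-residue chain by 'rolling a d20' n times."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(n))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print("upper bound, log2(20):", math.log2(20))
    print("random chain:         ", char_entropy(random_peptide(100_000, rng)))
    print("poly-alanine chain:   ", char_entropy("A" * 100_000))
```

The random chain should land just under log2(20), about 4.32 bits per residue, while the perfectly "ordered" poly-alanine chain scores zero.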

By Blake Stacey (not verified) on 17 Feb 2010 #permalink

A few people have complained about the use of ID phraseology in the paper, but there are a number of other concerns about the paper. I must say that the paper is very clear that,

As it is clearly absurd to ascribe anthropomorphic behaviors and intelligent qualities to molecules and cells, we must search for the logic of molecular events in the informational features of the molecules themselves.

Other concerns remain. The idea of pentapeptides as some unit of protein language certainly has some novelty, but I don't see that this idea is being thoroughly tested here. Moreover, the data presented seems simply to restate that short sequences with particular functions are co-opted via selection to those functions and those functions only. This is a very trivial observation, and exactly what one would expect - if functional sequences were widely distributed throughout the proteome, then one would expect that to reflect the ubiquity of that function.

The only novelty here is the focus on pentapeptides, which seems to be something of a pet theory of Kanduc's (fine, if it is supported by some evidence).

I wish to understand what makes a peptide immunogenic. After MHC binding algorithms, mimotopes, phage display libraries, immunosuppression and adjuvants, I am getting a congruent picture only with the concept of rare peptide sequences that are always found at the core of the immune recognition.

This appears trivial, although I would admit that I have no expertise in antigenicity, and therefore cannot say whether this intuitive claim has previously been supported by literature mining sequence databases.

The present paper is kind of an extension of this concept: important sites are always dislocated at level of rare motifs. I believe this is an important concept in biology and immunology: to my knowledge, never said or demonstrated by anybody.

This is the part I don't understand. There is a whole body of evidence (literature and databases - look at PROSITE) demonstrating the functionality of short motifs. Think about protease recognition sites, phosphorylation motifs, secretion signals, etc, etc.

The definition of rare motifs, as stated in this paper, seems to be circular, and therefore useless.

I repeat, as far as I can tell, the only novel claim is that pentapeptides form the basis of some kind of protein language. I absolutely cannot see any basis for this claim, since there are good examples of shorter and longer functional motifs and peptide chains. I can see no reason to suppose that the functionality of a phosphorylation site should be considered to be different from a tetrapeptide sumoylation site or an octapeptide platelet receptor. I'm also concerned that a search for rare sequences of any specified length would necessarily tend to identify rare sequences of that specified length within functional regions, given that this is the basis of motif identification.

I am unable to accept the factual basis for the claims made in this paper. I am also concerned that this article appears to have been submitted as a review article, when it is clearly nothing of the sort. Finally, there seems to me to be something deeply troubling about publishing a manuscript which even the author acknowledges requires significant revision to avoid misunderstanding (even with the caveat provided by the publisher).

I take the liberty of observing that if you had gone through the references, you'd have found the following citation:
[26] Kanduc D, Capone GM. The similarity profile of the human proteome as a fractal dimension. Biol Forum 2008;101:142-145

A paper published in Italian, with no abstract available on PubMed, and cited precisely twice by papers with D. Kanduc as corresponding author. I think that greater efforts need to be made to ensure that data is available to the rest of the world's scientific community, if you really want to support that claim.

As it is, I have no way of verifying that the datamining technique isn't simply to look for rare, functional pentapeptides. In which case, my concerns haven't been addressed.

English is not my mother tongue.

With respect, and this is advice to any non-native speaker of a language, you have a duty to ensure that your manuscript is proof read by somebody who is completely conversant in the language you're publishing in. That way, misinterpretation of clumsy or incorrect phrasing can be avoided.

By Bernard Bumner (not verified) on 17 Feb 2010 #permalink

It would also make sense to do a statistical comparison of the goodness of fit against some other models (length of chains, for instance), as others have mentioned. If the author is not qualified to do the analysis, it is a simple matter to either find a co-author to consult or a statistician for hire. I (and many others here) could recommend some competent colleagues and there are almost certainly some statisticians here as well.

MadScientist, Your comment (#68) was totally out of line. Darja Kanduc has taken great pains to come to this thread, have a civil and reasonable discussion, admit her mistakes (which seem to amount to language/cultural barriers, though I'm still concerned about the near-verbatim quote of Meyer 2000) and make a real effort to correct them. She is, as far as we can tell, genuinely on our side. She thought she was 'demolishing' ID not supporting it. Giving you the benefit of the doubt, maybe you didn't read all 67 comments before yours; maybe you assumed that Kanduc was in fact an ID shill; maybe you were just commenting in the usual 'tone' in PZ's comment threads without realizing there was a genuine, constructive discussion going on here. Nevertheless, an apology wouldn't be out of order.

By Karen James (not verified) on 18 Feb 2010 #permalink

Darja, I agree with Antiochus Epiphanes (#75):

Darja—It would be wonderful if you could alter the abstract. You do not mean “Intelligent Design” in the way that most English speakers would understand. That movement will champion this paper. That can’t be good for you.

...and with Bernard Bumner (#95):

With respect, and this is advice to any non-native speaker of a language, you have a duty to ensure that your manuscript is proof read by somebody who is completely conversant in the language you're publishing in.

In this case, though, you should ask someone who is not only conversant in English but also familiar with the rhetoric of creationists to proofread your paper. Moreover, if, in your revised paper, you are going to continue to assert that anthropomorphic and/or ID language is 'popular' or 'customary', you need to provide citations that support that assertion before you move on to 'demolish' that view, and you need to be much more clear (not just in one sentence but throughout your paper) that 'demolishing it' is, in fact, what you are doing.

I really want to believe your intentions were to show how 'absurd' it is to ascribe intelligent agency to immunity, and your statements here that you are not at all an ID supporter, but to put to rest any lingering doubts, I think you need to address this issue of how the Meyer 2000 quote ended up in your paper. As Form&Function (#88) said:

...it beggars belief that simple language errors caused an almost word-for-word quote from Meyer on "specified complexity" to appear in a paper that also has the words "intelligent design" in the abstract. Especially when Kanduc asserts that he/she is unfamiliar with the creationist usage of either phrase in English; if Kanduc didn't read Meyer, how did the quote end up in the paper? If Kanduc has read Meyer, how can he/she be unfamiliar with the phrase "intelligent design" or "specified complexity" as used by Meyer?

Thank you for your willingness to participate in this discussion with us. I hope it results in an improvement of your paper and prevents the inevitable damage to your professional reputation that would result if your paper were published as is.

By Karen James (not verified) on 18 Feb 2010 #permalink

Can anyone explain why a "phrase" that is used less often would contain more information? If you linguistically analyze codes, you're looking for the stuff that gets repeated a lot. Honestly, when you have a five-letter word like "GREAT" that is used a lot, vs. a five letter word like "GMHDR" that is hardly ever used for anything, which has more information?

Can anyone explain why a "phrase" that is used less often would contain more information?

Information appears to be used synonymously with function. In which case, it is trivially true that certain types of function can be attributed to short peptide sequences, and that many of those sequences will be specific to particular proteins (because they are selected and thereby confer specific, unique functions).

This (rather too broad and not very meaningful) definition seems to be stated in the introductory paragraph,

The amino acid sequence of a protein molecule can be thought of as a ‘phrase’ that carries the necessary information for the protein's activity (structural, enzymatic, or immune), as well as information about the protein's modification (such as methylation, citrullination, phosphorylation, or sumoylation), destination (for example, signal peptides), interactions, complex structure (such as leucine zippers, zinc fingers, disulfide bridges, or binding domains), and half-life (i.e., proteolytic and ubiquitinylation sites). Numerous different information sets located along protein sequences are stored in structural motifs such as folds and helices and in short linear amino acid sequences, that represent the ‘words’ of the protein phrase.

Equally, in the discussion it is stated that:

Generally, a primary protein sequence can be represented by a repeating transition from highly similar areas to fragments with low or zero similarity. The wave-like similarity profile relates to different informational messages, with functionally significant areas coinciding with the hollows of the similarity profile. Zipf's law states that rare words have greater information content [14]; analogously, rare pentapeptide fragments appear to carry critical functional messages in proteins.

So, once again information simply means function. I have no idea whether the analogy using Zipf's law is apt, because I don't understand that particular bit of maths.

Anyway, it all seems very circular to me; information = function, functional sequences = information rich.

By Bernard Bumner (not verified) on 18 Feb 2010 #permalink

ubiquitylation, ubiquitinylation

Wonderful words.

I was afraid of that. Zipf's law is not that rare words are more functional, or information rich, or whatever. It's that the distribution of words by usage should look linear on a log plot. That's it. I don't even know how you'd get a sample size large enough to determine whether oligopeptides followed that rule or not - or why it would even matter considering that genetic codes are not a natural language.

Seriously, there are 3,200,000 possible "words" in the 5 peptide chain lexicon. Ideally, to follow Zipf's Law, you'd have the most common word twice as common as the next most common word, which was twice as common as the next most common word, and so on. Obviously, that would have a very long tail. In fact, to have a sample size that was big enough to on average contain one copy of the least frequent word (which I predict to be "M-M-M-M-M"), you'd need to have a sample size of 2^3,199,999 words. Or roughly a 1 followed by one million zeroes.

Or a googol googols of googols of oligopeptides. I am reasonably positive that such an analysis not only has yet to be done, but will not be done before the heat death of the universe.
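The arithmetic in this comment can be checked with a few lines. This is a hedged sketch, not a claim about what any real proteome looks like: the first two functions simply reproduce the comment's own figures (20 amino acids, words of length 5, and the "each word half as common as the last" assumption), while `zipf_sample_size` is an added comparison that assumes the 1/rank form of Zipf's law, the form other commenters in this thread describe. All function names are invented for the example.

```python
import math

ALPHABET_SIZE = 20  # standard amino acids
WORD_LENGTH = 5     # pentapeptides


def lexicon_size(alphabet=ALPHABET_SIZE, k=WORD_LENGTH):
    """Number of distinct words of length k over the alphabet."""
    return alphabet ** k


def digits_of_power_of_two(exponent):
    """Decimal digits in 2**exponent, computed without building the integer."""
    return math.floor(exponent * math.log10(2)) + 1


def zipf_sample_size(n_words):
    """Sample size at which the rank-n word is expected once if f(rank) ~ 1/rank.

    With p(r) = (1/r) / H_n, the rarest word has p(n) = 1 / (n * H_n),
    so the expected count reaches 1 at a sample of n * H_n words.
    """
    harmonic = sum(1.0 / r for r in range(1, n_words + 1))
    return n_words * harmonic


if __name__ == "__main__":
    n = lexicon_size()
    print("possible pentapeptides:", n)
    print("digits in 2^3,199,999: ", digits_of_power_of_two(n - 1))
    print("1/rank sample size:    ", round(zipf_sample_size(n)))
```

Under the halving assumption the required sample is a number of roughly 963,000 digits, in line with the comment's "1 followed by one million zeroes". Under the 1/rank form the same calculation gives a sample on the order of fifty million words, so the conclusion depends heavily on which form of the law is assumed.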

While an author is ultimately responsible for everything that she publishes, it seems that the reviewers/handling editors have dropped the ball on this one. I don't know very much about information theory, and what "information" I am normally concerned with in a sequence of amino acids, or much more often DNA, is phylogenetic information...in which case the information is only applicable in comparisons of sequences and in respect to specific hypotheses of relationship...where was I going with this? Oh, yeah. When I submit a manuscript which is mostly theoretical, I have the realization that I have thought so hard on the problem, that my solution becomes crystallized in my mind...it is impossible for me to see its flaws without external help at that point. If my stupid idea gets published, the scientific community will be more than willing to help me to point out my errors. I want some intervention before that happens. I trust that my reviewers really are trying carefully to see not only what is wrong with my idea, but what is wrong with how I have presented it. That's why they get acknowledged...the tougher their criticism, the better my paper will be.

FreeLunch#39

The question for those who follow corporate behavior is whether Reed Elsevier has decided that it can milk its huge stable of science journals by cutting back on the quality of review and fattening its bottom line for now.

I don't think this is the issue. Reviewers (AFAIK) are never paid...for that matter handling editors and journal editors aren't either. The problem is that people reviewing manuscripts have a lot of crap to do, and the ball gets dropped from time to time, which may not be intentional or even recognized at the time. Example: I reviewed a paper last summer, and received the revisions for a second review a few months ago. I went back to my original review to see if my suggestions had been incorporated, and for the most part they hadn't. So I reread both the manuscript and my review, and realized to my shame that it was my review that was flawed. I remember working on that review, with no undue pressure, and in no special hurry. My brain simply failed. Every once in a while, two reviewers will have a simultaneous brain-fart.

By Antiochus Epiphanes (not verified) on 18 Feb 2010 #permalink

@Blake 93/94

Yes, I did oversimplify things somewhat zealously in my rush to get that out of my system. Thanks for filling in the details more rigorously.

By dmorrison (not verified) on 18 Feb 2010 #permalink

dmorrison (#104):

Glad to be of service.

FrankT (#102):

Zipf's law is not that rare words are more functional, or information rich, or whatever. It's that the distribution of words by usage should look linear on a log plot. That's it.

Zipf's law — power-law dependence of word frequency on rank — shows up when there's no function or meaning at all: gibberish displays the same trend.

By Blake Stacey (not verified) on 18 Feb 2010 #permalink

Dear All,

I just found a few minutes to join the dialogue ongoing in this blog. I do thank everybody for the attention given to the issue. I do not hide that I remain surprised, since I do not believe it’s worth of so much attention. Mutatis mutandis, in the paper in object there is exactly what was written in another paper of mine. I don’t believe it’s fair to quote myself, but I’m obliged to: pls, if you have time to waste (and I’d be glad if you had it), take a look at “Immunogenicity in Peptide Immunotherapy: From Self/Nonself to Similar/Dissimilar Sequences” in Multichain Immune Recognition Receptor Signaling: From Spatiotemporal Organization to Human Disease, edited by A. B. Sigalov. ©2007 Landes Bioscience. I might send the PDF in case. Just let me underline the conclusion of the paper: “Concluding Remarks. In 1859 Darwin demonstrated that complex, gradual adaptation processes arise over time without outside agency and, in so doing, he demolished teleology in science. Nonetheless, today we still have an immunology science dominated by the teleology of intentionality: explaining immune reactions in terms of self entities against nonself enemies and interpreting immune processes as meditated actions against enemies and protective conduct towards self entities.
In this context, the development of high throughput technologies and the nascent peptidomics research offer exciting new opportunities to comprehensively analyse peptides in the immune subsystem, that is to define the immuno peptidome. The time for a more precise answer to the logical question, “what are the molecular features that make a peptide immunogenic?” appears closer. The time for a geometrical definition of the limits and intersections among the three distinct domains of peptide antigenicity, immunogenicity and pathogenicity is getting closer as well. The challenges for these goals lie in archiving and functionally relating the vast majority of data derived from immunoassay experiments and bioinformatic predictions into a coherent informational mass relevant to physio and pathological processes. etc. etc.”

That said, as far as I understood, another ‘hot’ issue would be the dual attribution of specificity and complexity to define quantitatively the information (see Karen’s question). May I offer to your attention the datum of fact that specificity and complexity characterize the information starting from the 1950s? Starting with Miller’s experiments, and then Oparin's studies and proceeding on with Ilya Prigogine and Manfred Eigen, specificity and complexity have characterized the informational content of macromolecules. There is an oceanic literature about this and I’m sure you know all this better than me. I used specificity and complexity to define the information of macromolecules more than 30 years ago in a book (see, pls. Cenni di Evoluzione Biochimica. It is in Italian so I’ll spare you from going through it: however, the first paragraph is a brief historical outline of the clerical dogmas that obstructed science. Is it possible that after more than 30 years we are still debating on this?). Intelligent design is currently used to discuss cancer therapy, evolution and robotics. What have we to do, to avoid this expression since in discussing information it can evoke the phantom of the non-science? And if we wished to demonstrate that there is an intelligent design intrinsic to the molecules and residing in rare motifs (or in the electrical charge or in the hydrophobicity or something else), then what? I do not understand why “specified complexity” must become verboten land just because it has been used in a sense contrary to that of Eigen and Shannon. Tomorrow there will be somebody who’ll use the expression “complex specificity” to elaborate some other concepts. Then, what? Are we going to eliminate these words from our vocabulary and these themes from our studies? Everybody’s free to say, to write, to elucubrate whatsoever he/she wishes. Then the scientific community will judge and the civil society will decide.

I do not believe it is a problem related to being non-native-speaker-of-a-language. Of course, the paper was English-edited. Actually, I might have effectively a cultural barrier to understand the issue: honestly I do not have any knowledge of debates or antinomies or “fights” on creationist hypotheses here in Italy or, more in general, in Europe. Perhaps we already had our part of it in the past with Galilei and Spinoza. Honestly, I do not know. However, my personal view is that this debate took a disproportionate amount of time and attention. And after all, the world is large enough: everybody can believe whatsoever he/she wishes. Then, verifiable results will take over.

Also, I have to insist that Zipf’s law states that the size of the largest occurrence of an event is inversely proportional to its rank.

Finally, to FrankT, who says “Seriously, there are 3,200,000 possible "words" in the 5 peptide chain lexicon. Ideally, to follow Zipf's Law, you'd have the most common word twice as common as the next most common word, which was twice as common as the next most common word, and so on. Obviously, that would have a very long tail. In fact, to have a sample size that was big enough to on average contain one copy of the least frequent word (which I predict to be "M-M-M-M-M"), you'd need to have a sample size of 2^3,199,999 words. Or roughly a 1 followed by one million zeroes. Or a googol googols of googols of oligopeptides. I am reasonably positive that such an analysis not only has yet to be done, but will not be done before the heat death of the universe.”
FrankT: without waiting for the death of the universe, with a great joint effort and the invaluable help of mathematicians and bioinformaticians, currently we are analyzing the pentapeptide composition of the universal proteome (i.e. of all the proteomes available in the current databases). Preliminary data were already published and cited in the Peptides paper under discussion. Pls, check the references.

Best. Darja

By Darja Kanduc (not verified) on 18 Feb 2010 #permalink

I do not believe it is a problem related to being non-native-speaker-of-a-language.

Agreed. The problem has something to do with logical thinking.

By https://me.yah… (not verified) on 18 Feb 2010 #permalink

A paper examining tetra-, penta-, and hexa-peptide frequency in proteomic datasets. Notably, "We limited our further studies to only one length, pentapeptides, which proved to be a good compromise between informational resolution and run times for the computer calculations."

In this study, we have performed a large-scale investigation of all possible combinations of five amino acid residues, pentapetides, in order to characterize oligopeptide patterns that are over- or under-represented in general or with respect to a kingdom. We find not only sequence patterns of known and frequently-used features but also patterns due to compositional bias. In addition, we find novel patterns which might be part of features not revealed by current bioinformatic methods, forming structural building blocks or segments selectively filtered because of unfavorable properties or immune response-induced epitopes.

What worries me is that this paper is referenced (as 1) in a Kanduc paper in the context that,

We explored the proteome set formed by combining currently known proteomes using the pentapeptide as a unit of length. Indeed, many studies suggest pentameric peptide modules as fundamental biological effectors in protein–protein interactions [[13], and pertinent refs. therein]. The reports are steadily increasing in frequency [1], [10] and [16] and support our use of the pentapeptide unit in the dissection of the relationships between protein structures and functions.

Clearly, the choice to analyse pentapeptides in the reference was based on methodological limitations, and it is specifically stated that there is overlap between the datasets of pentapeptides and hexapeptides, and that only 4-mers, 5-mers, and 6-mers were even considered.

It isn't that I think there is no value in Kanduc's work, but simply that I think they may be pushing the hypothesis of pentapeptide = meaningfully distinct subunit too far. I wait to be proved wrong.

By Bernard Bumner (not verified) on 19 Feb 2010 #permalink

Darja, the fact remains that in the United States, and increasingly in Europe, creationists are aiming to obstruct (and in some cases succeeding in obstructing) science education. "Intelligent design" is one of their catch phrases.

Since you use that phrase in your paper, and since the writing is, frankly, confusing and difficult to understand, it will be unclear to most readers (as it was to many of us) whether your paper is supporting Intelligent Design or not. Creationists will take advantage of this confusion and claim that it supports ID and they will shout this to the world, and use it to further damage science education.

It appears that your own definition of "Intelligent Design" is not the same as theirs. You seem to use it to refer to the (inappropriate) anthropomorphic language that some scientists use when they talk about immunity.

Why not then just simply:
1. remove the phrase "Intelligent Design" from your paper and substitute something else like "anthropomorphic language" or "the language of agency"?
2. Remove or change the sentence from your abstract ("Current theories have proven inadequate...").

If you don't, creationists will use your paper to hurt science education.

By Karen James (not verified) on 19 Feb 2010 #permalink

Darja, to better explain what I mean (in case this helps), you wrote in your comment (#106):

Intelligent design is currently used to discuss cancer therapy, evolution and robotics.

If you understand that to many millions of people "intelligent design" means creationism, it sounds like you're saying creationism is a valid theory used in the serious sciences of cancer therapy, evolution and robotics! It would be much better if you wrote 'The language of agency is currently (and, as I argue here, inappropriately) used to discuss cancer therapy, evolution and robotics.'

By Karen James (not verified) on 19 Feb 2010 #permalink

Darja, Also, you still haven't answered the concern about the 'specified complexity' quote. It's not just the phrase specified complexity, it's the fact that the whole sentence was clearly copied and only slightly modified.

Your manuscript:

…the CaM sequence is characterised by both specificity and complexity (what information theorists call 'specified complexity'); in other words, it has 'information content'.

Meyer 2000

Systems that are characterized by both specificity and complexity (what information theorists call "specified complexity") have "information content".

These two sentences are too similar to be explained by chance. One must have been copied from the other, or both copied from the same original source. Now, it is possible that Meyer in fact copied you and not the other way around. Can you provide a quote from one of your papers prior to 2000 that shows this? Perhaps from 'Cenni di Evoluzione Biochimica' (which I cannot access as it is behind a subscription barrier and also I cannot read Italian).

By Karen James (not verified) on 19 Feb 2010 #permalink

Hey,

Read through all the comments, and I think the problem is that we have two meanings of ID here. For Kanduc 'Intelligent Design' means the scientific shorthand of anthropomorphising things, but to all the American commentators 'Intelligent Design' is a large anti-science movement that seeks to explain biological happenings by invoking God.

I'm not sure you're getting across what a big thing this ID is. The words 'Intelligent Design' will, in the minds of most people in America and several in Europe, instantly bring God to mind. If Kanduc doesn't mind taking advice from an undergrad, I'd remove the words Intelligent Design from the abstract, because they are going to mean something very, very different from the innocent meaning you give them.

(If you aren't aware of the whole 'Intelligent Design' issue, give the word a Google. It has massive connotations, most of which are very anti-science).

By https://www.go… (not verified) on 19 Feb 2010 #permalink

Dear Karen, Dear Bernard, Dear All,

1st point: “A paper published in Italian, with no abstract available on PubMed, and cited precisely twice by papers with D. Kanduc as corresponding author. I think that greater efforts need to be made to ensure that data is available to the rest of the world's scientific community, if you really want to support that claim”. You did not check correctly the issue: the citation reporting preliminary data on the entire human proteome is in English.

2nd point: you mention that the choice of 5-mers to validate the hypothesis of pentapeptide = meaningfully distinct subunit is a “forced” solution, since, for example, 3-mers or 6-mers have also been found to have specific roles. Then you yourself answer the issue by citing "We limited our further studies to only one length, pentapeptides, which proved to be a good compromise between informational resolution and run times for the computer calculations." So it seems that, according to your citations, the pentapeptide might be a correct choice. Then, you observe that:
It isn't that I think there is no value in Kanduc's work, but simply that I think they may be pushing the hypothesis of pentapeptide = meaningfully distinct subunit too far. I wait to be proved wrong.
My answer: once given that the pentapeptide can represent a good choice (as you yourself recognize), what better experiment than screening the entire literature, looking for the length of the modules crucially important in biological functions? In this case there is no subjective bias: the data come from other labs and have been experimentally validated and controlled. In other words, the hypothesis of pentapeptide = meaningfully distinct subunit derives from a corpus of data widely and differently tested. Of course, this does not mean that 3-mers or 6-mers are excluded from having important functions. This only means that we have to add a “mostly” (as I normally do), i.e.
mostly, pentapeptide = meaningfully distinct subunit

3rd point. You say: Since you use that phrase in your paper, and since the writing is, frankly, confusing and difficult to understand, it will be unclear to most readers (as it was to many of us) whether your paper is supporting Intelligent Design or not. Creationists will take advantage of this confusion and claim that it supports ID and they will shout this to the world, and use it to further damage science education.
Karen: if the Scientific Community depends on single phrases or words without the global picture, then unscientific movements have already won.

You ask: Darja, to better explain what I mean (in case this helps), you wrote in your comment (#106): Intelligent design is currently used to discuss cancer therapy, evolution and robotics.

My answer: Karen: please go through PubMed, type “intelligent design” as keywords, and you’ll see that it is no creationist tag. There is a lot of literature on the most disparate subjects using this phraseology.

You ask: Darja, Also, you still haven't answered the concern about the 'specified complexity' quote. It's not just the phrase specified complexity, it's the fact that the whole sentence was clearly copied and only slightly modified. …the CaM sequence is characterised by both specificity and complexity (what information theorists call 'specified complexity'); in other words, it has 'information content'.
And
These two sentences are too similar to be explained by chance. One must have been copied from the other, or both copied from the same original source.

Karen and Bernard: this is an expressive and logical issue. To #88: there is no mistake.

a) You say that I copied Meyer’s phrase to support ID
b) I say that I used the Meyer phrase to support the conclusions derived from my analysis
c) My analysis, at the same time, demonstrates that external agents are nonexistent.

Let me explain in detail.
Meyer’s wording:
…Systems (that are unexplainable by physical laws) are characterised by both specificity and complexity (what information theorists call 'specified complexity')…
The phrase means: systems receive an a priori specification of their complexity by some external agency, so complexity and specificity imply external agents.

In the paper:
…the CaM protein (that performs a number of functions allocated and specified in rare motifs) is characterised by both specificity and complexity (what information theorists call 'specified complexity')…
The phrase means: the CaM protein presents complexity specified in rare motifs as a result of physico-(bio)chemical properties.

Demonstrating that the complexity of a protein sequence finds its specification in a metric (the level of pentapeptide similarity) gives a mathematical dimension to the quantity and quality of the complexity (multifunctionality) specified in the sequence.

I used the exact Meyer phrase to indicate that what is supposed to be unexplainable without external agents, on the contrary can logically be explained with the scientific knowledge that is available. Demonstrating the coherence of the Meyer expression in an opposite context is a confutation per antithesis of any external agency. Using the same language of somebody else to express a different thought can’t be described, by definition, as plagiarism [literally: the unauthorized use or close imitation of the language and thoughts of another author and the representation of them as one's own original work].

Then you observe that I had to cite Meyer. But my aims did not consist in directly discussing (or taking over) the external agency from Meyer: the confutation per antithesis that I adopted nullifies the need for a referential discussion that would have derailed the paper from its target since the phrase was taken as a model that changes its implications following the protein analysis.

Finally: Why not then just simply:
1. Remove the phrase "Intelligent Design" from your paper and substitute something else like "anthropomorphic language" or "the language of agency"?
2. Remove or change the sentence from your abstract ("Current theories have proven inadequate..."). (Regarding this point see also #112)
Already done (at least, I asked for their deletion), and only in consideration of the fact that I do not wish my research, theories and studies to be instrumentalized.

Best, Darja

By Darja Kanduc (not verified) on 19 Feb 2010 #permalink

Darja, thank you for continuing to address our questions. I would like to ask for clarification on the use of the Meyer quote. In your comment @ #113, you say:

I used the exact Meyer phrase to indicate that what is supposed to be unexplainable without external agents, on the contrary can logically be explained with the scientific knowledge that is available.

Does this mean that you had, in fact, read the Meyer 2000 article and copied his phrasing into your paper, and that you did so deliberately to refute Dembski's concept of "specified complexity", which he claims requires external agents?

By Form&Function (not verified) on 19 Feb 2010 #permalink

Darja Kanduc:

Then you observe that I had to cite Meyer. But my aims did not consist in directly discussing (or taking over) the external agency from Meyer: the confutation per antithesis that I adopted nullifies the need for a referential discussion that would have derailed the paper from its target since the phrase was taken as a model that changes its implications following the protein analysis.

I had a great laugh. Thank you. It is heartening to know that you were worried about derailing that paper.

PS. Not to keep harping on Italy, but I've had direct experience with several Italian exchange students who had problems with their citation technique. They kept taking sentences and paragraphs from other people's work without citation and failed to see the problem with it when it was pointed out to them. In one severe case this led to the expulsion of a Ph.D. candidate from the postgraduate program. Of course, my observations are entirely anecdotal. I work in the field of biosciences in an academic group in a North European country.

By https://me.yah… (not verified) on 19 Feb 2010 #permalink

...once given that the pentapeptide can represent a good choice (as you yourself recognize), what better experiment than screening the entire literature, looking for the length of the modules crucially important in biological functions? In this case there is no subjective bias, the data come from other labs and have been experimentally validated and controlled. In other words, the hypothesis of pentapeptide = meaningfully distinct subunit derives from a corpus of data widely and differently tested. Of course, this does not mean that 3-mers or 6-mers are excluded from having important functions. This only means that we have to add a “mostly” (as I normally do), i.e.
mostly, pentapeptide = meaningfully distinct subunit...

As a systematic analysis, I think that there is obvious value, not least of all as a potentially predictive dataset (if there really is a correlation between function and rarity).

I think that there is still a problem with some of the language in the paper, insofar as some of the claims go too far. The description of pentapeptides as words in a language simply doesn't hold up in light of the fact that pentapeptides clearly represent a subset of other possible peptide lengths. Had you described them as a computationally useful data subset, I would have no objection, and actually I think it would go a long way to address many of the concerns expressed above. I now understand why the analysis was conducted as it was, but I don't think that the nature of this selection is clear in the body of your manuscript, and this is a particular problem in light of the review status it has been given.

Anyway, I thank you for taking the time to respond here, and I hope that you will give serious consideration to some of the points raised.

By Bernard Bumner (not verified) on 20 Feb 2010 #permalink

Hullo,

I tried searching through PubMed with the phrase "intelligent design" (quotation marks included), and it's mostly stuff relating to creationism, especially for the biology-related fields.

:)

An unworthy undergrad

By Escherichia coli (not verified) on 20 Feb 2010 #permalink

To # 115:

Perhaps this may be useful for a better definition:

1) ….others’ work and words can only be used as support and are secondary to their own original thoughts....
2) The traditional distinction between original and plagiarized material maintains that not only is original work superior in terms of creative effort, but that it is not derivative.

Citations:
http://revolutionlullabye.wordpress.com/2009/05/28/johnson-eilola-and-s…

Johnson-Eilola, Johndan and Stuart A. Selber. “Plagiarism, Originality, Assemblage.” Computers and Composition 24 (2007): 375-403.

However, I'm going to ask plagiarismadvice.org for further details and then I'll let you know.

To # 117:

Perhaps the following list may be useful (it is not complete; there are many more). I ran a restrictive PubMed search using “intelligent design”, with “title” as a limit:

1: Mitta DA, Ellis NC. Development of intelligent design associates: a case study. Appl Ergon. 1993 Aug;24(4):235-43.

2: Cowell SM, Gu X, Vagner J, Hruby VJ. Intelligent design in combinatorial chemistry: use of designed peptide libraries to explore secondary and tertiary structures in peptides and proteins. Methods Enzymol. 2003;369:288-97.

3: Krieger K. Computer science. Life in silico: a different kind of intelligent design. Science. 2006 Apr 14;312(5771):189-90. Erratum in: Science. 2006 May 5;312(5774):697.

4: Peters CA. The evolution of laparoscopy in pediatric urology--intelligent design? J Urol. 2006 Jun;175(6):1993-4.

5: Couvreur P, Vauthier C. Nanotechnology: intelligent design to treat complex disease. Pharm Res. 2006 Jul;23(7):1417-50.

6: Koder RL, Dutton PL. Intelligent design: the de novo engineering of proteins with specified functions. Dalton Trans. 2006 Jul 7;(25):3045-51.

7: Srinivasan BS, Do CB, Batzoglou S. Evidence for intelligent (algorithm) design. Genome Biol. 2006;7(7):322.

8: Levin BE. Central regulation of energy homeostasis intelligent design: how to build the perfect survivor. Obesity (Silver Spring). 2006 Aug;14 Suppl 5:192S-196S.

9: Brill JV, Chapman FJ, Dahl J. The practice of gastroenterology: evolution versus intelligent design. Gastrointest Endosc Clin N Am. 2006 Oct;16(4):623-41.

10: Mazzucchelli R, Durum SK. Interleukin-7 receptor expression: intelligent design. Nat Rev Immunol. 2007 Feb;7(2):144-54.

11: Barber M, Njus D. Clicker evolution: seeking intelligent design. CBE Life Sci Educ. 2007 Spring;6(1):1-8.

12: Sisson JC, Avram AM, Rubello D, Gross MD. Radioiodine treatment of hyperthyroidism: fixed or calculated doses; intelligent design or science? Eur J Nucl Med Mol Imaging. 2007 Jul;34(7):1129-30.

13: van Isselt JW, de Klerk JM, Lips CJ. Radioiodine treatment of hyperthyroidism: fixed or calculated doses; intelligent design or science? Eur J Nucl Med Mol Imaging. 2007 Nov;34(11):1883-4.

14: Libourel IG, Shachar-Hill Y. Metabolic flux analysis in plants: from intelligent design to rational engineering. Annu Rev Plant Biol. 2008;59:625-50.

15: Fagin JA. The Jeremiah Metzger Lecture: intelligent design of cancer therapy: trials and tribulations. Trans Am Clin Climatol Assoc. 2007;118:253-61.

16: Charifson PS, Grillot AL, Grossman TH, Parsons JD, Badia M, Bellon S, Deininger DD, Drumm JE, Gross CH, LeTiran A, Liao Y, Mani N, Nicolau DP, Perola E, Ronkin S, Shannon D, Swenson LL, Tang Q, Tessier PR, Tian SK, Trudeau M, Wang T, Wei Y, Zhang H, Stamos D. Novel dual-targeting benzimidazole urea inhibitors of DNA gyrase and topoisomerase IV possessing potent antibacterial activity: intelligent design and evolution through the judicious use of structure-guided design and structure-activity relationships. J Med Chem. 2008 Sep 11;51(17):5243-63.

17: Kazlauskas R, Lutz S. Engineering enzymes by 'intelligent' design. Curr Opin Chem Biol. 2009 Feb;13(1):1-2.

18: Ottolino-Perry K, Diallo JS, Lichty BD, Bell JC, Andrea McCart J. Intelligent design: combination therapy with oncolytic viruses. Mol Ther. 2010 Feb;18(2):251-63.

By Darja Kanduc (not verified) on 20 Feb 2010 #permalink