Oh, boy. The Intelligent Design creationists are all excited about a new paper that purports to have identified an intelligent signal in the genetic code.

Here's a new paper that can be added to the growing stack of intelligent-design articles in peer-reviewed journals. Even though the authors do not use the phrase "intelligent design," their reasoning centers on the detection of an intelligent signal embedded in the genetic code -- a mathematical and semantic message that cannot be accounted for by a natural cause, "be it Darwinian, Lamarckian," chemical affinities or energetics, or any other.

I've read the paper by shCherbak and Makukov, and by golly, the Discovery Institute flack really has accurately summarized the paper: it does explicitly and clearly claim to have identified evidence of design in the genetic code! That's newsworthy in itself, that the creationists can accurately summarize a scientific paper…as long as the results conform to their ideological expectations.

Unfortunately, what they've so honestly described is good old honest garbage.

Here's the short summary of what they do: they jigger the identities of the amino acids coded for by each codon into a number, a *nucleon sum*. What is that, you might ask? It's determined by adding up the number of protons and neutrons in the amino acid, which is simply the mass number of the compound. Further, you can distinguish the amino acid into its R group, and the atoms that make up the peptide chain proper, which he calls the B group, for standard block. The mass number of the B group is always 74, except for proline, so he transfers a hydrogen from the R group to the proline B group to bring it up to 74, and by the way, did you notice that 74 is two times 37, which is a prime number? Now if you take all the three-digit decimals with identical digits (111, 222, 333…999), and sum their digits (111=3, 222=6, 333=9, etc.) you get the quotient of the number divided by…*37*!!!1!!
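If you want to check that arithmetic yourself, here is a minimal Python sketch. It assumes the most common isotopes (H-1, C-12, N-14, O-16) and uses glycine (C2H5NO2) as the worked example; the table and function names are mine for illustration, not anything from the paper.

```python
# Mass numbers (protons + neutrons) of the most common isotopes.
# Assumption: this is what a "nucleon sum" counts.
MASS_NUMBER = {"H": 1, "C": 12, "N": 14, "O": 16}

def nucleon_sum(formula):
    """Add up protons + neutrons for a formula given as {element: atom count}."""
    return sum(MASS_NUMBER[element] * count for element, count in formula.items())

glycine = {"C": 2, "H": 5, "N": 1, "O": 2}  # the whole amino acid
b_group = {"C": 2, "H": 4, "N": 1, "O": 2}  # glycine minus its R group (a single H)

print(nucleon_sum(glycine))  # 75
print(nucleon_sum(b_group))  # 74, i.e. 2 * 37

# The digit trick: NNN = N * 111 = N * 3 * 37, so NNN / 37 = 3N = N + N + N.
for n in range(1, 10):
    nnn = 111 * n
    assert nnn % 37 == 0 and nnn // 37 == n + n + n
```

Nothing here goes beyond counting atoms and the fact that 111 = 3 × 37.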

Are you impressed yet? This is simply numerology, juggling highly derived quantities that have little to do with functional properties of the molecules to come up with arbitrary numerical relationships, and then claiming that they're somehow significant. They also play games with the sums of the mass numbers of just the R groups for certain codons, adding or subtracting the B number, finagling things until they get numbers that are evenly divisible by their magic prime number of 37, etc. It's pure nonsense through and through.

But every once in a while, something sensible emerges out of the murk. Here's the logic of their argument:

To be considered unambiguously as an intelligent signal, any patterns in the code must satisfy the following two criteria: (1) they must be highly significant statistically and (2) not only must they possess intelligent-like features, but they should be inconsistent in principle with any natural process, be it Darwinian or Lamarckian evolution, driven by amino acid biosynthesis, genomic changes, affinities between (anti)codons and amino acids, selection for the increased diversity of proteins, energetics of codon-anticodon interactions, or various pre-translational mechanisms.

(1) is simply saying that there must be a pattern of some sort — if the code were purely random assignment of arbitrary nucleotides to each amino acid, it wouldn't be much of a sign — it would suggest that the sequence is noise, not signal. (2) is the really hard part, the one where you'd have to do a lot of work: you'd have to show that natural processes did not contribute to the pattern. They do not do that. They can't do that. They take a different and curious tack.

They literally argue that because organizing the code by their nucleon sums makes no sense and has no reasonable functional consequences…therefore it must be an artificial and intentional feature. I've heard this argument before. It's called the Chewbacca defense. Ladies and gentlemen, think about it: *that does not make sense*! If nucleon numbers show a mathematical pattern of any kind in their relationship to codons, you must accept the existence of a designer.

However, if we can show a natural property that leads to the organization of the genetic code, then I'm afraid their argument evaporates. Even more so than building an argument on the Chewbacca defense, that is.

There's a very good discussion of the genetic code in Nick Lane's book, *Life Ascending: The Ten Great Inventions of Evolution*, and I'll briefly summarize it.

First, there is a pattern to the genetic code! No one has ever denied that; it's obviously not the case that amino acids are randomly assigned to trios of nucleotides. Here's the code:

Let's look at one amino acid, glycine (Gly), down in the bottom right corner. The genetic code is degenerate: that means that most amino acids have multiple combinations of nucleotides that can specify them. Glycine's codes are GGU, GGC, GGA, and GGG. Do you see a pattern? The code is actually GG_, where the third position has a lot of slack or wobble, and any nucleotide will do. We see similar cases where just the first two nucleotides are sufficient to specify leucine, valine, serine, proline, threonine, alanine, and arginine. Even with the other amino acids, there are some constraints; CA_ can identify histidine or glutamine, but if the third letter is a pyrimidine (U or C), you get histidine, while if it's a purine (A or G), you get glutamine. There are patterns all over the place here! So of course ShCherbak and Makukov could find evidence of significant organization.

But there's more. There are other rules associated with this pattern.

In the synthesis of these amino acids, biochemistry typically modifies a raw starting material. The first letter of the codon says something about the biosynthesis of the associated amino acid.

If the first letter is:

•C, then the amino acid is derived from alpha-ketoglutarate.

•A, then the amino acid is derived from oxaloacetate.

•T, then the amino acid is derived from pyruvate.

•G, then the amino acid is derived in a single step from simple precursors.

The second letter of the codon is correlated with chemical properties of the amino acid.

If the second letter is:

•A, then the amino acid is hydrophilic.

•T, then the amino acid is hydrophobic.

•G or C, the amino acid has an intermediate hydrophobicity.

Wait…so there's a *pattern* to the genetic code, and that pattern is associated with the *physical properties* of the amino acids? Why, that makes sense. Chewbacca is routed! The most likely origin of the code lies in likely catalytic properties of dinucleotides; pairs of nucleotides in ancient organisms were initially functioning as proto-enzymes before they were incorporated into strings of coding information. At least that provides a historical physico-chemical route to the particular code we now have that does not require weird numerological masturbation.

It's rather pathetic that the Discovery Institute thinks this is a beautiful piece of science. It's not. It's nonsense. But look how the DI spins this story:

How will evolutionists respond to this paper? It's hard to see how they could dismiss it. Maybe they will try to mock it as old Arabian numerology, or religiously inspired (since Kazakhstan, which funded the study, is 70% Muslim). Those would be unfair criticisms. The authors have Russian names, certified doctorates, and wrote in collaboration with leading lights in the West. Or perhaps critics could argue that the authors hail from a foreign country whose name has too many adjacent consonants in it to take them seriously.

No, it appears the only way out for Darwinists would be the "Dawkins Dodge." You may remember that one from the documentary Expelled, where Dawkins admits the possibility of panspermia for Earth, so long as the designers themselves evolved by a Darwinian process.

What's most notable about this paper is the similarity in design reasoning between the authors and the more familiar advocates of intelligent design theory. No appeals to religion or religious texts; no identifying the designer; just logical reasoning from effect to sufficient cause. The authors even applied the "design filter" by considering chance and natural law, including natural selection, before inferring design.

If Darwinists want to go on equating intelligent design with creationism, they will now have to take on the very secular journal Icarus.

I didn't even consider the religious or ethnic basis of this study; it didn't come to mind at all. It is clearly simple stupid numerology, though. Look at the rationale given for all of the conclusions, which consist entirely of mathematical manipulations of arbitrary derived properties of the molecules, to arrive at a claim of prime number significance.

We certainly don't need to invoke panspermia. Nothing in the genetic code *requires* design, and the authors haven't demonstrated otherwise.

I am most amused by the cute parallelism of claiming surprise that the authors of this paper use "design reasoning" similar to that used by American Intelligent Design creationists. They've been slinging this slop for decades; why be impressed that another set of Intelligent Design creationists in Kazakhstan are using the same tired tropes?

I'm also not impressed with the failure of implementation of their logic. OK, they have a 'design filter' that they apply, but so what? Their methods *failed* to recognize a well-known functional association in the genetic code; they did not rule out the operation of natural law before rushing to falsely infer design.

And that last bit…I don't care what journal it was published in. The prestige of a journal does not confer infallibility, and even the best of journals will occasionally publish crap. They will be especially likely to publish garbage when they stretch beyond the expertise of their reviewers. Icarus is a journal of planetary science that publishes primarily on astronomy and geology. This particular paper conveniently falls between the cracks — it's a weird paper full of trivial arithmetical manipulations for arcane purposes with no scientific justification for any of its procedures. I don't know how it got accepted for publication, other than by boring the reviewers with its incomprehensible digit fiddling.

One last thing: don't rush to claim a secular purpose behind this work. It's already been appropriated by freaky strange religious fanatics and lovers of the bible codes. You can't blame shCherbak directly for this weirdo's interpretations, but certainly he isn't far from his temperament.

The facts presented on this site, when combined with those now revealed to us by shCherbak, constitute invincible evidence of the truth of the Judeo-Christian Scriptures, and of the Being and Sovereignty of their Divine Author.

Yeah, numerology. Nothing but wanking over tables.


Hi, I've regularly read your blog, so I am pleased to see our paper discussed here :)

But I would be even more pleased if your criticism were more robust than just referring to old stupid numerology. Of course, if we are working within the framework of SETI, then we should deal with arithmetical language as the commonly accepted method for (at least first) communication. Why not then call the Arecibo message numerological stuff? And, in case you didn't know, the difference between number theory and numerology is the same as between astronomy and astrology. Mentioning numerology, you forget to mention the most important results of our paper: that all arithmetical patterns are of the same type and that they constitute a set which is algebraically closed (see Appendix E in our paper). You are impressed with numerology, but not impressed with the fact that knowing those patterns is equivalent to knowing the mapping of the code itself? Damn good numerology :)

And, by the way, we have no relation to Intelligent Design, we just test the hypothesis of Crick and Orgel. And we discussed the manuscript with several specialists on the genetic code (see acknowledgements in the paper). They provided a much more robust criticism (because they found time to read the paper carefully before drawing conclusions), but none of them found the result to be "old stupid numerology". If it were, it would hardly have been published in Icarus.

Sorry for typos, I have no option to edit the post.

I would be impressed if the number was 42.

Always seems weird nut job religious extremists call people who accept the fact of evolution as Darwinists.

Because it tries to disprove the hypothesis that there are designers (of radio messages) out there.

Your study tried to disprove the hypothesis that there's at least one designer of the genetic code somewhere, and it failed. So far, so good. But:

– Why bother coming up with such an unparsimonious hypothesis in the first place? Read the post again to see how unparsimonious it is. There are several papers out there on how the genetic code probably evolved.

– Your arithmetic juggling seems as random, as irreproducible a set of choices, as numerology, so PZ decided to say so.

If you actually want to discuss this with PZ, keep in mind that he almost never reads the comments on this version of Pharyngula. The full version is over at Freethoughtblogs.com; bizarrely, the spam filter here doesn't let me link to it directly!

And finally... have you no shame to use the Chewbacca defense? :-D

***There are several papers out there on how the genetic code probably evolved.***

I guess there are not just several papers, but several hundreds of papers on that topic. I've read most of them, and we actually take into account the most popular speculations on the code evolution in the statistical test.

***If you actually want to discuss this with PZ***

Actually, I don't feel like discussing it with PZ. He is doing a good job fighting with ID freaks, but unfortunately, he will dismiss anything that sounds to him to be ID-like right away without going into details.

***And finally… have you no shame to use the Chewbacca defense?***

No, because there is no such defense. What we use is Occam's razor rather than the Chewbacca defence. It is much simpler to adopt Crick's and Orgel's hypothesis to explain the ensemble of precision same-style patterns which make up an algebraically defined set, than to refer to unknown mechanisms that could produce such an ensemble based on stochastic models.

***Your arithmetic juggling seems as random***

You just didn't read the paper. The very point is that the arrangements considered have minimum arbitrariness in them. If one would consider all random arrangements, I guess he would find much more patterns ;)

I don't have access to the full paper right now, as my library doesn't allow me to access articles that aren't yet in print, but I have seen the figures and associated descriptions. I have particular issue with this one:

"Fig. 3. Digital symmetry of decimals divisible by 037. Leading zero emphasizes its equal participation in the symmetry. All three-digit decimals with identical digits 111, …, 999 are divisible by 037. The sum of three identical digits gives the quotient of the number divided by 037. Analogous sum for numbers with unique digits gives the central quotient in the column. Digits in these numbers are interconnected with cyclic permutations that are mirror symmetrical in neighbor columns. Addition instead of division provides an efficient way to perform checksums (see Appendix C). The scheme extends to decimals with more than three digits, if they are represented as a + 999×n, where n is the quotient of the number divided by 999 and a is the remainder, to which the same symmetry then applies (for three-digit decimals n = 0). Numbers divisible by 037 and larger than 999 will be shown in this way."

All of this nonsense can be broken down into one statement:

1+1+1=111/37

From this, the relationships from 222 to 999 can be derived by simple multiplication:

2(1+1+1)=2(111/37)

2+2+2=222/37

and so on.

So it isn't in any way surprising that all of the three-digit NNN decimals work this way. Why not the 4 digit ones? 5? What's so significant about the three-digit NNN pattern? This is numerology if I've ever seen it.

Here's a good question: why does this trend only come out in decimal notation? Did the intelligences that seeded Earth with life in the form of the genetic code intend on us developing 10 digits? The number 37 would still be prime in any base, but the NNN decimals would no longer carry the same (arbitrary) pattern.

I also see that you get into the quaternary system. However, you seem to be selectively choosing bases in which your symmetries conveniently appear.

What is known about the relationship between the third letter of the codon and the corresponding amino acid? I am pretty sure that something is going on: broadly, if the third letter is A or G, then the amino acid is more likely to be in a helix in a protein than if it is U or C. My data comes from table 1 in this paper:

http://www.jsbi.org/pdfs/journal1/GIW04/GIW04F019.pdf

The 'H' column in the table shows that Leu is more likely to be in a helix than Phe, Trp is more likely to be in a helix than Cys, and so on. The Cys/Trp and Ile/Met cases are a bit funny, but they still follow the pattern as well as they can. But I don't know what properties an amino acid needs to make it suitable for making helices.

Long-Hui Wang, Juan Liu, Yan-Fu Li, Huai-Bei Zhou. Predicting Protein Secondary Structure by a Support Vector Machine Based on a New Coding Scheme. Genome Informatics 15(2): 181–190 (2004).

2 Neil J

***So it isn’t in any way surprising that all of the three-digit NNN decimals work this way.***

Absolutely. It is just arithmetic and I don't really see what is the problem with that. This is just a description of a particular divisibility criterion that we provide for non-mathematicians. This is a feature of decimals divisible by 037, without any connection to the patterns in the code that we describe.

***Why not the 4 digit ones? 5? ***

I guess that 4- or 5-digit decimals might have other divisibility criterion, we do not consider them because we do not see them in the nucleon sums of the genetic code. Why should we mention them?

***This is numerology if I’ve ever seen it.***

Oh, come on! There is nothing mystic about positional numeral systems and corresponding divisibility criteria. It is the very beginning of ordinary arithmetic.

***Here’s a good question: why does this trend only come out in decimal notation?***

Well, nucleon equalities hold true being written in any numeral system. It just turns out that they acquire conspicuously distinctive notation if written in positional decimal system. As for why - we give few possible explanations in the paper, and I think the most probable one is that standard blocks of alpha amino acids have 74 nucleons, i.e. twice 037. Therefore, the decimal system is just most convenient to use if one is going to embed a small arithmetical message into the code :)

***I also see that you get into the quaternary system. ***

No, we just show, as an extra info, that similar divisibility criterion for digital triplets exist not only in decimal system, but in all systems which meet the requirement (q+1)/3 = Integer, where q is the radix. These are quaternary, septenary, decimal systems and so on.

Sorry for the typo. Not (q+1)/3 but (q-1)/3.
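The corrected criterion is easy to verify numerically. The following is a minimal Python sketch, not anything taken from the paper: in radix q the repdigit triplet NNN equals N × (q² + q + 1) and its digit sum is 3N, so the "quotient equals digit sum" trick works exactly when (q² + q + 1) is divisible by 3, which happens exactly when (q − 1)/3 is an integer. The function name `triplet_divisor` is hypothetical.

```python
def triplet_divisor(q):
    """Return the divisor d for which NNN (base q) / d equals N + N + N,
    or None if no such integer divisor exists in this radix."""
    ones = q * q + q + 1          # value of the digit string "111" in base q
    return ones // 3 if ones % 3 == 0 else None

for q in range(2, 17):
    d = triplet_divisor(q)
    if d is not None:
        assert (q - 1) % 3 == 0   # the (q-1)/3 criterion holds
        # check every repdigit triplet NNN in this radix
        for n in range(1, q):
            nnn = n * (q * q + q + 1)
            assert nnn % d == 0 and nnn // d == 3 * n
        print(q, d)
    else:
        assert (q - 1) % 3 != 0
```

For radices up to 16 this prints the pairs (4, 7), (7, 19), (10, 37), (13, 61) and (16, 91). The decimal 37 is one member of a family, and the corresponding divisor is not always prime (91 = 7 × 13).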

It's kind of sad, but in that entire article the thing that I focused on and reacted to the most was this...

"You may remember that one from the documentary Expelled, where Dawkins admits the possibility of panspermia for Earth, so long as the designers themselves evolved by a Darwinian process."

You fools. Stein asked him for any conceivable way in which a designer might be involved. He followed the premise. Then you quote-mined the hell out of him to make him sound like he thinks of panspermia as a valid hypothesis. Not to mention that "possibility" is a pretty low standard to begin with. It's possible that the sun won't rise in the morning due to some catastrophic gravitational event or something. But is it LIKELY that the sun won't rise in the east tomorrow?! "Possibility" gets you nowhere.

As long as we have religion-sapped brains there will always be an attempt to seek a code for a nonexistent creator. I wonder if there will ever be an attempt to descry a hidden code of a creator, other than human, in a pile of dreck.

I'll delve into this a little deeper when it comes to print, Maxim. However, I'm neither a mathematician nor a biologist (I'm a PhD student in solid-state physics) so I can't definitively say that I'll be able to address the paper in its entirety.

Neil said:

"I’ll delve into this a little deeper when it comes to print, Maxim. However, I’m neither a mathematician nor a biologist (I’m a PhD student in solid-state physics) so I can’t definitively say that I’ll be able to address the paper in its entirety."

I am a mathematician, and over the past 5 years I've been moving into mathematical biology. I hope to write something about the paper tomorrow. In the meantime I will say this: don't let the fact that you are neither a mathematician nor a biologist worry you. If aliens or deities embedded the supposed message in our genetic code, they were not mathematicians or biologists either.

If you look through a pack of cards, you will be able to find coincidences and patterns. Perhaps all four queens occur together. Perhaps the last ten cards all correspond to prime numbers. The more ingenious you are in the way you look, and the more open you are to different kinds of patterns, the more you will see. The more work you put in, the more you can get out. More formally, the greater the information content of the extraction process, the greater the information content of the result.

Suppose someone decides to throw out the black cards, and just look for patterns among the red ones. It might seem they have just made one choice - red vs black. You could formalize this by saying they have put one bit of information into the extraction process. You would be wrong. They have actually made several choices: They've decided to throw out some cards. (Why? Surely it is more natural to look at them all together? And if we're allowed to throw some out, why not add cards according to some scheme? You could cut the red ones in half for example, to make two red cards for each old one.) Secondly, they've decided to use colour as the criterion for choosing which cards to chuck, when cards have other attributes that could just as well have been used.

It is not possible to evaluate things like this rigorously. You may feel that throwing out some cards is a fairly reasonable thing to do, while cutting some in half is highly contrived. You might then assign just a couple of bits to a decision to throw, and quite a lot of bits to a decision to cut. Someone else might disagree with your numbers. At any rate, it is crucial to consider all the things you might reasonably have done but didn't, when evaluating a process for extracting patterns. Otherwise, you might unwittingly put more information into the extraction process than you get out as a 'message'.

Bear these considerations in mind when reading the following list of decisions which are used to extract meaning from the genetic code in the paper. There are two major phases: converting (codon,amino acid) pairs to numbers, then looking for patterns in the numbers.

Primary Phase: Making Numbers

1. Decide to throw out the codons and focus on just the amino acids.

2. Decide to map amino acids to numbers, in particular positive integers. Human mathematicians study many kinds of mathematical objects. Graphs, or groups, or sequences of numbers are other possibilities.

3. Decide to disregard the molecular structure of the amino acids and regard them as just an unordered bunch of atoms. There are many numbers that one could derive from the molecular structure using the elements and/or the types of chemical bonds present, or the underlying graph.

4. Decide to focus on the nucleus of the atoms, and ignore the numbers one might derive from the electron shells. (Surely it is a shame not to get a mention of quantum spin into a story like this. Everybody likes a bit of quantum spin.)

5. Decide to ignore the fundamental constituents of the nucleus (the quarks - at least as far we understand the nucleus) and instead focus on the protons and neutrons.

6. Decide to ignore the obvious positive integer related to an atom, that is, the atomic number, which is the number of protons, and also the number of electrons if the atom is not ionized, and instead decide to include the neutrons, despite the fact that this number varies with the isotope. We can choose the most common isotope, and call the result a 'nucleon sum'.

Secondary Phase: Spotting Patterns

7. There's all sorts of things one can do with numbers. Find their prime factors, interpret them as coefficients of a polynomial and look at the roots, interpret them as coefficients of a continued fraction, add them up, subtract them, multiply or divide, compare them to physical or chemical constants, or to numbers that seem important from mathematics, and so on. But we might decide to divide them into two subsets and add them up to form two subsums. Whatever.

8. We've got 20 numbers and there are over a million ways to divide them up, but some ways seem more natural given the genetic code. But wait! We could make more than 20 numbers (or indeed less since some of the numbers are the same). But let's decide to make a few more. Not all amino acids are coded equally. Some might need special treatment. Met and Trp are special, because they are associated with only one codon. Leu, Ser, and Arg are the only ones where the first two letters of the codon can vary. Ser is extraspecial because its codons are not all connected by single substitutions. Ile is special because it is the only one with 3 codons. And Tyr, Cys and Trp are special because their codon shares the first two letters with a Stop codon. And Stop codons are very special. And so on. Anyway, let's decide to make a few more numbers by splitting the Leu, Ser, and Arg numbers into two, according to the first two letters in their codons.

9. There are about 8 million ways to divide the now 23 numbers into two sets. The codons associated with them suggest various more or less contrived possibilities.
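The counts in steps 8 and 9 are consistent with simply letting each number go into one of two sets. A minimal Python sketch, assuming that is the counting intended (the function names are illustrative only):

```python
def two_set_assignments(n):
    """Ways to place each of n labelled numbers into set A or set B."""
    return 2 ** n

def unordered_nonempty_splits(n):
    """Same idea, but ignoring which set is called A and requiring
    both sets to be non-empty."""
    return 2 ** (n - 1) - 1

print(two_set_assignments(20))        # 1048576  ("over a million")
print(two_set_assignments(23))        # 8388608  ("about 8 million")
print(unordered_nonempty_splits(23))  # 4194303
```

Even under the stricter count, there are millions of candidate splits to search for an equality that happens to be divisible by 37.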

Results

For some of these possibilities, it turns out that both subsums are equal and divisible by 37, and 37 is half the nucleon sum of the B groups, apart from proline.

Interpretation

We apologise for the anti-climax.

Graham, that was beautiful. And I might add, very well written, so well that this non-mathematician could follow it. Kudos.

Since H, C, N, O, and S all have more than one stable isotope (over 1% of C is C13), the 'nucleon sum' of the amino acids is non-integer. So the paper by Makukov et al. doesn't even reach the height of numerology. It's just another one of those notions that's so silly it isn't even wrong.

Graham, thanks for clarifying the point of the paper ;)

Could you also explain please, why it so happens then that the standard genetic code harbors a set of patterns which form an algebraically defined set (which means that the code mapping itself is deduced from that set, and so the pattern set employs informational capacity of the code entirely), but all other known versions of the code do not find even a couple of same-style patterns, not to mention an algebraically defined set? Though I guess what your answer would be: we just looked not enough, yeah ;)

2 AnswersInGenitals.

The point with isotopes is discussed in the paper, you should read before criticizing. Don't you find it natural to consider only most stable and common isotopes if you take Crick's, Orgel's and Marx's hypothesis as the working one?

Hmmmm. Your post was on March 20, 2013. And you had the 21st post. So the numbers (in two digits) are 03 (for the month); 20; 20; 13 and 21. If I remove the repeating number, 20, I get left only with 13 and 21 and 3.

Add them together: 37!!! Coincidence? I think not!

What's more, and this is really exciting, is that the number of words, by line, in your post (at my screen's resolution) is: 9, 15, 16, 14, 18, 15, 11, 1, 15, 16 and 8.

Now if we look at those lines with more than 10 words, and consider the 1 to be a "add it" marker, so add the second digit, we get a total of 40 (that is, 5 + 6 + 4 + 8 + 5 + 1 + 5 + 6 = 40). However, we have 3 lines with less than 10 words, so if we take 3 from 40 we get.....

*****37!!!!****

If we take your name, and remove all repeating letters, we end up with x, i, u and o. These correspond to numbers 24, 9, 21 and 15. ALL ARE DIVISIBLE BY 3!!! what are the chances of that, eh? and the only consonant, x, is the only even number. Also significant. That one must be the key.

So divided by 3, the numbers are 8, 3, 7 and 5. Added together we get 23.

Now look at x - that's the key, 24. The only even number. Added together we get 2+4 = 6. And multiplied it's 2*4 = 8.

So add our total of 23 to the "key" numbers: 23 + 6 + 8 is...

Oh my god. It's 37.

I'm getting TINGLES.

Magpie,

What you did has no meaning, simply because you have no initial hypothesis. And if you think that we jiggle with the numbers and arrangements just as arbitrarily as you did, you are wrong. Read the paper more carefully, and you will find that in each case we apply the same approach, and in each case we get the same result: nucleon equalities which turn out to have distinctive notation in the same numeral system and which are accompanied by the same base transformations.

I'm no mathematician, but you're just like, *cough Joseph cough* from talk.origins. Are you using "complexity science"?