The genome is not a computer program

The author of All-Too-Common Dissent has found a bizarre creationist on the web; this fellow, Randy Stimpson, isn't at all unusual, but he does represent well some common characteristics of creationists in general: arrogance, ignorance, and projection. He writes software, so he thinks we have to interpret the genome as a big program; he knows nothing about biology; and he thinks his expertise in an unrelated field means he knows better than biologists. And he freely admits it!

I am not a geneticist or a molecular biologist. In fact, I only know slightly more about DNA than the average college educated person. However, as a software developer I have a vague idea of how many bytes of code is needed to make complex software programs. And to think that something as complicated as a human being is encoded in only 3 billion base pairs of DNA is astounding.

Wow. I know nothing about engine repair, but if I strolled down to the local garage and tried to tell the mechanics that a car was just like a zebrafish, and you need to throw a few brine shrimp in the gas tank now and then, I don't think I would be well-received. Creationists, however, feel no compunction about expressing comparable inanities.

I actually have some background as a software developer — I wrote some lab automation and image processing software that was marketed by Axon Instruments for several years — and I can tell you as someone with feet in both worlds that the genome is nothing like a program. The hard work of cellular activity is done via the chemistry of molecular interactions in the cytoplasm, and the genome is more like a crudely organized archive of components. It's probably (analogies are always dangerous) better to think of gene products as like small autonomous agents that carry out bits of chemistry in the economy of the cell. There is no central authority, no guiding plan. Order emerges in the interactions of these agents, not by an encoded program within the strands of DNA.

I'd also add that the situation is very similar in multicellular organisms. Cells are also semi-independent automata that interact through a process called development in the absence of any kind of overriding blueprint. There is nothing in your genome that says anything comparable to "make 5 fingers": cells tumble through coarsely predictable patterns of interactions during which that pattern emerges. "5-fingeredness" is not a program, it is not explicitly laid out anywhere in the genome, and it cannot be separated from the contingent chain of events involved in limb formation.

That's a difficult and abstract concept that's hard to get across to students who are seriously studying the subject, let alone ignorant creationists who have no awareness of the biology. This guy, though, knows one thing and one thing only — how to write software — and digs his hole deeper and deeper.

To be more specific, since DNA alphabet consists of 4 nucleobases, we can represent a nucleobase with 2 bits data. This means that 4 base pairs can be represented by a byte of data and approximately 4 million base pairs can be represented by a megabyte of data. This means that the entire human genome can be represented by only 750MB of code. From my experience as a software developer, this would have to be highly efficient code. To suggest that 97% of DNA is junk implies the implausible -- that less than 23MB of DNA is not junk. By comparison, Microsoft Word has a size of 12MB.

The genome is not code, efficient or otherwise. Sure, you can tally up the bits needed to store the sequence in a database, but that is not the same as saying you've got the complete information for an organism, or that you have captured the "code" that can be executed to build it. Rather than realizing that maybe his analogy is faulty because it leads to conclusions he finds unlikely, this creationist is so convinced of the accuracy of his analogy that when he finds it leads to incomprehensible results, he decides that biology and the reality of the genome must be wrong.
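
To be concrete about what that tally means, here's a minimal sketch (mine, not Stimpson's) of the storage arithmetic. It is an inventory, not an executable:

```python
# Back-of-the-envelope storage arithmetic: what it costs to *archive* a
# genome sequence. This says nothing about "information to build an organism".
BASES_PER_GENOME = 3_000_000_000   # roughly 3 billion base pairs
BITS_PER_BASE = 2                  # A, C, G, T -> 2 bits each

total_mb = BASES_PER_GENOME * BITS_PER_BASE / 8 / 1e6
print(f"{total_mb:.0f} MB")        # -> 750 MB, Stimpson's number

# Packing four bases to the byte is ordinary data handling, nothing more:
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def pack(seq):
    """Pack a nucleotide string, 2 bits per base (last byte may be partial)."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | CODE[base]
        out.append(byte)
    return bytes(out)

print(pack("GATTACA").hex())       # -> '8f04'
```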

I think it's more probable that the human DNA which we have discovered so far doesn't contain all the information required to produce humans. I wouldn't be suprised if more DNA, or some other kind of information, is discovered some time in the future.

Many of you may have seen this infamous creationist quote, which is a perfect example of an oblivious ignoramus overlooking the obvious.

One of the most basic laws in the universe is the Second Law of Thermodynamics. This states that as time goes by, entropy in an environment will increase. Evolution argues differently against a law that is accepted EVERYWHERE BY EVERYONE. Evolution says that we started out simple, and over time became more complex. That just isn't possible: UNLESS there is a giant outside source of energy supplying the Earth with huge amounts of energy. If there were such a source, scientists would certainly know about it.

Stimpson hasn't said anything quite that stupid, but it's only because biology and developmental biology are so much more subtle and harder to observe and understand than the existence of a giant thermonuclear furnace burning furiously 93 million miles away. There is no significant source of extra DNA, but there is additional information generated by the activities of cells during ontogeny. This concept, that the starting material is not the complete final product, but that it requires ongoing input from the environment and from continuing negotiation and activity within the starting material to generate novel features, has only been around for about 2300 years, since at least Aristotle, so I guess I shouldn't be surprised that a creationist would be a few millennia behind. The concept is called epigenesis. It's essential to understanding how a genome generates an organism, and you shouldn't try to force your analogy onto biology if you don't understand it.

But wait! Ignorance is no obstacle to a devout creationist, and Mr Stimpson continues his headlong descent into unchecked failure in another post, in which he tries to claim that there is negligible junk DNA.

Now there are 210 know cell types in the human body. I'll assume that each cell type requires at least 1MB of information. These cell types share a lot of common features so I'll assume there is a lot of common information. Just how much of the information is shared between these cell types is a guess. I am going to assume that 90% of the information in each cell type is shared and 10% is unique. This means that 210 cell types require 1MB + 209 * .1MB of information. Rounding this implies that there is at least 22MB of information in the human genome.

None of this makes any sense whatsoever.

Where does Mr Stimpson get this magic number of 1MB of information for a cell type? He seems to have pulled it out of his butt.

What does he mean by "information"? He blithely equates the information in a cell with a measure of the number of nucleotides in its DNA. This is not valid. Cells have developmental histories that are essential elements in describing their state.

This "210 cell types" number is a widely used value that was taken from a 1960s paper that itself was making only a broad guess from descriptions in histology text. I've griped about this oft-used and ultimately bogus number before, but I can't blame Stimpson for using it … but it really needs to be purged from the literature.

I don't know what the hell he's babbling about when he tries to partition subsets of the genome into unique stuff for different cell types. It doesn't work that way! The entire genome is present in every cell (with some narrow exceptions), and genes get reused in multiple functions in multiple cell types. His whole conclusion is a beautiful example of garbage in, garbage out.

Let's see how much deeper into the muck this guy can sink…

But this is just the information needed to construct the different cell types. More information is needed for spatial orientation and to coordinate activity among cells to perform complex functions like vision, motor control, digestion and tissue repair. Since the most efficient algorithms to just sort n objects have an order of nlog(n) I am tempted to guesstimate by multiplying 22MB by log(210) to get a lower bound. But that would be bad applied math and just plain lazy. But then again I am not exactly getting paid to do this (wink).

This is a transparent revelation of his biases. He thinks there needs to be some nuclear authority that specifies higher level activities like tissue repair and vision — there isn't! There is no map. There is no boss. There is no blueprint. Vision is an emergent property of cells and proteins interacting in development, using tools shaped by four billion years of history. Your preconceptions are not data.

It isn't a Stimpson post without some statement that is jaw-droppingly obvious.

I can think of two other approaches that could be taken. For one of them I need some data points. In particular I need size data about genomes of the simplest multicellular life forms that are well studied and believed not to have junk.

You mean, you ought to have some data underlying your speculations? Whoa. Who could have imagined that.

This is exactly what biologists have been doing for the past century: gathering data and building explanations from the evidence. Anyone with any sense would recognize that this is how science proceeds.

Oh, and this whole series of posts has been written because Stimpson doesn't like the idea of junk DNA, another common creationist preconception that they anguish over. Again, it's the evidence that supports the idea of junk DNA, and creationist ignorance does not counter it.

Let's start with something relatively simple for Mr Stimpson. Look up LINEs and SINEs. Long Interspersed Nuclear Elements (LINEs) are pieces of DNA that code for an enzyme that copies RNA (including the RNA for itself) back into the genome. SINEs (Short Interspersed Nuclear Elements) are shorter sequences that don't code for a functional protein, but their RNA is recognized by the LINEs and gets copied back. These sequences do not play a specific role in the economy of the cell, although they do certainly represent a generic drain on cellular metabolism. Mr Stimpson should try to explain the function of these sequences.

Then he can try to explain why his DNA contains 870,000 copies of LINE — taking up 20% of his genome — and 1.5 million copies of SINE. I'd also like to see his software development analogy to these. Back when I was writing software, I don't recall writing small segments of self-modifying, recursive code and sprinkling them throughout my program…or perhaps more accurately, writing software that was one-third auto-loading noise maker and sprinkling a few words of functional code among them.
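
You can even watch that auto-loading noise maker take over in a toy model. This is purely illustrative: the parameters are invented, and none of the real LINE/SINE biochemistry is modeled here.

```python
import random

# Toy model: self-copying elements spreading through a "genome" with no
# function at all. Copy rates and counts are invented for illustration.
random.seed(1)
genome = ["gene"] * 100 + ["LINE"] * 5 + ["SINE"] * 5

for generation in range(40):
    new_copies = [e for e in genome
                  if e in ("LINE", "SINE") and random.random() < 0.10]
    genome.extend(new_copies)      # reverse transcription back into the genome

junk = sum(1 for e in genome if e != "gene")
print(f"{junk} of {len(genome)} elements ({100 * junk / len(genome):.0f}%) are parasites")
```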

Comments

A blowhard is still just a blowhard. As an engineer, I have worked with the sort of folks who think they know it all - about everything. I always wonder where they end up working, because we usually send them on their way. They are really no good as engineers either.

This guy seems to be a classic example. Fortunately, many engineers are scientists first, and if they have an interest in a subject they actually learn something about it.

The sad thing is, computer programming can be used as an analogy to DNA and evolution, but in a much different way than he describes. Someone please rip this apart if it's wrong, but I remember hearing that one of the big problems with Microsoft's operating systems for so long was that they were piggybacked onto old DOS code, and with each new version Microsoft would just try to put patches and overlays onto it rather than start from scratch, leading to weird quirks like the Y2K problem. Now that's a good analogy - evolution and bodies have to work within the constraints that are already there, so you end up with a lot of suboptimal parts and systems, and lots of extraneous parts that aren't useful anymore but are difficult to get rid of.

I'm a software engineer too, and wouldn't dream of correcting biologists, but at least I've kept up with some of the developments in mathematics and computing. Such developments, in the fields of chaos theory, fractals, genetic programming, etc, have demonstrated that emergent phenomena and vast complexity can be produced from the interaction of even simple components.

This guy is taking an algorithmic approach in his software design, and mistakenly assuming that biology works the same way.

Stimpson:

I think it's more probable that the human DNA which we have discovered so far doesn't contain all the information required to produce humans. I wouldn't be suprised if more DNA, or some other kind of information, is discovered some time in the future.

Just skimmed this thread. I was getting dumber for reading it.

Stimpson just proved that bumblebees can't fly.

The best demonstration of the role of the genome in specifying organisms is cloning. We can transfer a nucleus from a somatic cell into an egg and get a complete new clone. Dolly the sheep and all that.

Stimpson is just a Time Cube-class crackpot. Nothing to see here beyond some grins and giggles; move on.

He says that MS Word is 12MB. WTF?? Does he mean the average size of a Word document, or the size of the installed program itself or the installation files?? Assuming either of the latter two, he's waaaaaaaay off. You can now (purchase and) download Office 2007 from the MS website. The Standard package comes with Word, Excel, Powerpoint, and Outlook, and has a download size of 1.5GB - a "portion" will be removed after installation, if the download package is removed. Let's be very generous and say that 0.5GB will be removed, leaving 1GB for the four programs, for an average of 250MB per program. Even assuming that, say, Powerpoint takes up more space than Word, it's still at least an order of magnitude bigger than what he claims.

There, I've just proven him totally and completely wrong on that one point, and I didn't even pull any numbers out of my ass!

How do you REAL scientists find time to learn how to write software? I have been wondering this since I saw Richard Dawkins' "Growing Up In The Universe." I used to live with a programmer and it seemed like a bitch to learn how to program software.

By the way, anyone interested in Richard Dawkins' "Blind Watchmaker" program can Google it and can play with it yourselves. It's quite fun.

Perhaps if he insists on computing analogies he could take a look at Conway's Game of Life. Four simple rules + initial conditions = masses of complex (and often beautiful) structure. Although it's not a perfect analogy for epigenesis, it shows how the development of a 'unit' can depend on the structure of the units around it, and not just its own information (which in this case is a single bit).

By Olaf Davis (not verified) on 24 Feb 2008 #permalink
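
Olaf's point is easy to make concrete: Conway's entire rule set fits in a dozen lines. A minimal sketch, seeded with a glider:

```python
from collections import Counter

def step(live):
    """One generation of Conway's Game of Life on an unbounded grid."""
    neighbours = Counter((x + dx, y + dy)
                         for (x, y) in live
                         for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                         if (dx, dy) != (0, 0))
    # Birth on exactly 3 neighbours; survival on 2 or 3. That's all the rules.
    return {cell for cell, n in neighbours.items()
            if n == 3 or (n == 2 and cell in live)}

glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for _ in range(4):                 # after 4 steps the glider has moved diagonally
    glider = step(glider)
print(sorted(glider))
```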

Stimpson may have been getting some of his information from Richard Dawkins. Do you also think Dawkins is a nut for speaking about the genome in terms of bits and bytes?

http://www.skeptics.com.au/articles/dawkins.htm

DNA carries information in a very computer-like way, and we can measure the genome's capacity in bits too, if we wish. DNA doesn't use a binary code, but a quaternary one. Whereas the unit of information in the computer is a 1 or a 0, the unit in DNA can be T, A, C or G. If I tell you that a particular location in a DNA sequence is a T, how much information is conveyed from me to you? Begin by measuring the prior uncertainty. How many possibilities are open before the message "T" arrives? Four. How many possibilities remain after it has arrived? One. So you might think the information transferred is four bits, but actually it is two. Here's why (assuming that the four letters are equally probable, like the four suits in a pack of cards). Remember that Shannon's metric is concerned with the most economical way of conveying the message. Think of it as the number of yes/no questions that you'd have to ask in order to narrow down to certainty, from an initial uncertainty of four possibilities, assuming that you planned your questions in the most economical way. "Is the mystery letter before D in the alphabet?" No. That narrows it down to T or G, and now we need only one more question to clinch it. So, by this method of measuring, each "letter" of the DNA has an information capacity of 2 bits.

Dawkins goes on to discuss junk DNA and epigenetics as well, but he does hold to the idea that DNA is analogous to computer code. Does this make him a crank?
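
For what it's worth, Dawkins's arithmetic is trivially checkable. The information per symbol of an alphabet of equally probable symbols is just log2 of its size:

```python
import math

# Bits per symbol for equiprobable alphabets: log2(number of possibilities).
for name, size in [("binary", 2), ("DNA (A, C, G, T)", 4), ("52-card deck", 52)]:
    print(f"{name}: {math.log2(size):.2f} bits per symbol")
# DNA: 2.00 bits, exactly as Dawkins says.
```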

These selfish ignoramuses need to suck it up and realize how much harm they do whenever they open their big, flappy mouths and let words come out. :/ My father and I have switched places as I've grown up -- now I'm having to argue with him whether the things he's picked up are true. THANKS, STIMPSON. But you probably got your slap on the back.

I'm gonna go out and write a paper on how I think "Jesus" is an anagram for "cheese crackers".

PZ: I applaud you for breaking all this down, and yet...don't all of us teachers at the secondary school level have a terrible problem? A truth can be stated simply, and yet be impossible for students to understand. That the genome is not the source of all the information needed to build an organism is obviously true to those of us who've studied real living things for any length of time, but is utterly unfamiliar to others, even very well-educated people. Here the problem is not religious bias, but a failure to properly conceptualize the problem due to its subtlety and to the different ways that people use words like 'information' and 'code' in everyday life.

Now, these words are useful to us high school teachers, to put it mildly, in order to introduce the concepts, but almost inevitably their usage reinforces preexisting notions that do not match up well with the biological reality. So I have to admit to being a little stumped. Any suggestions from the peanut gallery?

I spent an agitated evening on Stimpson's blog some months ago... you know, the way one can sometimes get caught up in these online quagmires of stupidity.

I just thought I ought to point out - before the Salem Conjecture starts being thrown about willy-nilly - that Mr. Stimpson's grasp of academic computer science is about as thorough as his grasp of evolutionary biology.

I have a vague idea of how many bytes of code is needed to make complex software programs.

I'm a biologist and I have a pretty good idea of how few misconceptions it takes to make evolution seem impossible.

>How do you REAL scientists find time to learn how to write software? I have been wondering this since I saw Richard Dawkins' "Growing Up In The Universe." I used to live with a programmer and it seemed like a bitch to learn how to program software.

Are you kidding me? Find the time? It's part of the job for many. There are scores of fields in the numerous disciplines of the biological sciences where knowledge of writing computer programs is an advantage and a necessity. This is especially true in ecology, population genetics, conservation biology, and the theoretical models of behavioral ecology that are at the forefront of evolutionary theory.

M$ doesn't make the software to do the specialized tasks we need, so we do it ourselves - from writing programs to perform specialized statistical analyses (e.g., bootstraps and randomization analyses) to home range calculations of free-ranging animals to genetic algorithms and dynamic programming to sophisticated image analysis (e.g., GIS applications).

There are many, many professional biologists who know their way around C, Pascal and Fortran.
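
(For the curious: the bootstrap mentioned above is only a few lines. A minimal sketch with made-up numbers, estimating the uncertainty of a mean by resampling with replacement:)

```python
import random

random.seed(0)
data = [4.1, 5.3, 2.2, 6.7, 3.9, 5.0, 4.4, 3.1]    # made-up measurements

boot_means = []
for _ in range(10_000):
    resample = [random.choice(data) for _ in data]  # sample with replacement
    boot_means.append(sum(resample) / len(resample))

boot_means.sort()
lo, hi = boot_means[250], boot_means[9_750]         # central 95% of bootstrap means
print(f"mean = {sum(data) / len(data):.2f}, 95% CI roughly ({lo:.2f}, {hi:.2f})")
```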

PZ,

I agree with you that the genome and computer programs are two very different things. How difficult would it be to model the evolutionary process using a computer program?

If it is so difficult for us to write a program that models random mutation and natural selection then how could the earliest forms of life do this just by chance?

You can compare the genome as a data set to the data stored by a computer, and that's legit -- we know how much storage space you need to pack away the sequence. It is not in any way comparable to software.

It is certainly meaningful to say that the human genome contains X megabytes of data, but that misses the point entirely. One dataset may be larger than another, but that doesn't mean it is more complex.

"Complexity" is a very slippery thing, which is enormously difficult to define in a rigid manner. For example, many fractals are fantastically intricate, and yet can be generated by a few lines of code. Therefore, are they very complex or very simple?

Another issue is the "cleverness" (for want of a better word) of how the system utilises available resources. For example, these beautiful animations were all created using only 512 bytes of code, whilst far cruder animations may be a million times larger.

The human genome may be simple in terms of information theory, but in terms of "cleverness" it is second to none (with the possible exception of some viral genomes, which are so compact as to almost defy belief). The laws of physics allow for amazing subtlety, and evolution has taken full advantage of this fact.

His whole conclusion is a beautiful example of garbage in, garbage out.

And yet you say that programming concepts don't apply in biology!

By Pierce R. Butler (not verified) on 24 Feb 2008 #permalink

My jaw hit the floor when he jumped from `coordinate activity among N objects' to `algorithms to sort N objects'. There's no need to sort the objects you're coordinating even if you *do* have a central coordinating agency, and what on earth he considers might `sort' cells (using a non-parallelized sort!) I have not the least idea.

I take it Stimpson has never even played with some of the classic toys of computing: Game of Life and Mandelbrot generators. I thought everyone wrote themselves one or both of those, just for fun, at some point (at least, I did). There's some fantastically complex behaviour in there, but it can be produced by a few kilobytes of code (ignoring all the overhead the linker has to add to make it play properly with the o/s).

I'm a biologist and I have a pretty good idea of how few misconceptions it takes to make evolution seem impossible.

Well, as a graduate student, I say you're totally wrong. To get that level of ignorance, you'd need at least 37 misconceptions, and we all know that 37 is much more than the technical definition of "few." This is like saying that someone could be unable to save a document file that's under 1 MB, and we all know that's absurd.

I will never understand how these kooks make writing gibberish look so easy. I always have trouble. :)

You guys don't take prisoners, do you? Somebody says something you don't like and it's time to fling vitriol and dung. I can hear baa! baa! as the flock follows its shepherd.

I remember doing similar sums in about 1980 working out that the entire ACGT encoding of a set of human chromosomes would fit on 1 or 2 (I don't remember!) standard 1600bpi magnetic tapes (it was that long ago, before CDs) and I found that an interesting number.

A single egg cell can produce a full-size organism, so this amount of DNA plus existing cellular machinery plus environment is all an organism needs. And given that all primary protein sequences are specified by DNA, and (almost) all of a cell's machinery is based on proteinaceous enzymes, that tells you something about the amount of information encoding the bulk of the primary biochemical machinery.

Personally, I find myself amazed that so little information can specify organisms with terrific regularity and complexity. Where's your sense of wonder? Isn't it even amazing that our developmental systems are so finely tuned that we end up with two arms (with the requisite number of fingers) of similar length even though they are separated by huge distances relative to our cellular architecture?

Isn't it terrific too that we have about 6 feet of DNA in each cell, but that this can reliably untangle itself, separate, and duplicate during mitosis?

It's a mistake to lose one's sense of wonder. And being gobsmacked by nature's complexity doesn't make one either an idiot or a closet theist.

By P Delta Effect (not verified) on 24 Feb 2008 #permalink

Look at the bootstrapping process of a computer. Small programs have "just enough" code to launch the next program, which has slightly more capability. A DNA strand doesn't magically turn into a person- I can't throw DNA drops onto the pavement, water them, and grow a human. There are some necessary bootstrap conditions. In essence, DNA is not so much a "program" as it is a piece of information. This is a false analogy. Unfortunately, this person has used an invalid comparison to reach a false conclusion.

Well, I'm not a geneticist or a molecular biologist or a software engineer, but I AM a Mom, and I have baked a cake. And I've never put all the ingredients together and had it turn into a cat or a dog or anything but a cake. Bears turning into whales? Fish sprouting wings and flying up into the trees? Oranges giving birth to people?

Betty Crocker says "no."

hyperdeath: The human genome may be simple in terms of information theory, but in terms of "cleverness" it is second to none (with the possible exception of some viral genomes, which are so compact as to almost defy belief).

Is this a rhetorical flourish, or has it actually been worked out by explicit study that human genetics exceeds that of all other metazoans in some specific measurement of efficiency?

Even compared to E. coli? to kitties? to squid?

By Pierce R. Butler (not verified) on 24 Feb 2008 #permalink

Not much of a programmer, either. The fastest sort algorithm is order N. The spaghetti sort algorithm. Can't be implemented in common hardware, but it is order N.

Take the numbers to sort, and cut lengths of spaghetti corresponding to the numbers. Grab the spaghetti in your fist and smack it hard against a flat surface. Stand up the stack and pull out the stick that is longest (it sticks out the highest). Measure it, and repeat. Order N sort. It was described in the mathematical recreations column of SciAm many years ago, as the "spaghetti analogue gadget". It was noted that, because the analogue steps are rather slow, you need a very large set of numbers before this can out-perform an N.logN algorithm on modern (at the time) computers. I believe they remarked that the "flat surface" would be about the size of the moon, but work with enough numbers, and N beats N.logN.

This isn't just semantic sniping and gleeful nitpicking. Biology is an excellent example of analogue behaviour, so don't believe that the world of digital algorithms and von Neumann machines accurately parallels the behaviour of a biological system.

PZ says:

You can compare the genome as a data set to the data stored by a computer, and that's legit -- we know how much storage space you need to pack away the sequence. It is not in any way comparable to software.

From Chapter 5 of The Blind Watchmaker:

It is raining DNA outside. On the bank of the Oxford canal at the bottom of my garden is a large willow tree, and it is pumping downy seeds into the air. ... The whole performance, cotton wool, catkins, tree and all, is in aid of one thing and one thing only, the spreading of DNA around the countryside. Not just any DNA, but DNA whose coded characters spell out specific instructions for building willow trees that will shed a new generation of downy seeds. Those fluffy specks are, literally, spreading instructions for making themselves. They are there because their ancestors succeeded in doing the same. It is raining instructions out there; it's raining programs; it's raining tree-growing, fluff-spreading, algorithms. That is not a metaphor, it is the plain truth. It couldn't be any plainer if it were raining floppy discs.

If you're going to mock the guy for equating DNA with software then you're going to have to call Dawkins on it as well.

Maybe the term "computer code" is a source of confusion for Derek James (not so much for Stimpson). The term is usually used to mean the executable instructions that the programmer provides to the CPU. This is what Stimpson is talking about.

The word "code" can also mean a method of storing data in binary format, as in, for example, the "American Standard Code for Information Interchange" (i.e., ASCII). This -- not ASCII specifically, but the encoding and storage of data in the form of binary digits -- is what Dawkins was talking about.

As the genome encodes data (mostly protein sequences), it's perfectly valid to compare DNA to the binary storage of data on computer media. To compare DNA to the number of bytes required to encode algorithms, on the other hand, is bogus.

By noncarborundum (not verified) on 24 Feb 2008 #permalink

Okay, I posted before I saw the follup from D.J. with the "It's raining ... algorithms" quote. Bummer.

By noncarborundum (not verified) on 24 Feb 2008 #permalink

Can someone give him the onion test? amoeba test? fugu test?

By natural cynic (not verified) on 24 Feb 2008 #permalink

Oops, he did take the onion test - and flunked by handwaving:

I could speculate that perhaps the DNA of onions also serves a purpose for the animals that eat it -- after all, it is a food source

same with nutrition. Still needs to take the fugu test.

By natural cynic (not verified) on 24 Feb 2008 #permalink

Now that's a good analogy - evolution and bodies have to work within the constraints that are already there, so you end up with a lot of suboptimal parts and systems, and lots of extraneous parts that aren't useful anymore but are difficult to get rid of.

Shorter version: Sure, God created the entire universe in only six days. But He didn't have an installed base.

By noncarborundum (not verified) on 24 Feb 2008 #permalink

Stipulated: Randy Stimpson is a stupid, clueless, pathetic, Creationist idiot.

However, the question that he raises is far more subtle and nuanced than PZ Myers herein allows.

I speak as someone with lesser Biology credentials than PZ (I have only about 25 publications and conference presentations in the field), but enough to be negotiating for a Research Scientist position in Biological Networks at Caltech.

On the other hand, I have 42 years of computer/software experience, significantly more than does PZ.

I strongly feel - and have since before I began my Ph.D. dissertation research in 1975 (in what's now considered Nanotechnology, Artificial Life, Systems Biology, and Metabolomics) - that there is a profound relationship between Genome/Proteome/Physiome and Source code / interpreted or compiled object code / effected change in embedded system or robot behavior or client-server interaction.

In my dissertation, I sometimes referred to "genocode" versus "phenocode." Several chapters of that dissertation have now been published in refereed venues.

The question: "what is the channel capacity of Evolution by Natural Selection" and the related question: "What is the Shannon information in an organism's genome" is a very hard question, which we have discussed in this blog and elsewhere. I have a draft paper of some 100-page length sitting on a NECSI wiki, triggered by a what I took to be a good question from an annoying Intelligent Design troll; said wiki paper draft online thanks to the dedicated work of the admirable Blake Stacy, for about a year, which I have not had a chance to complete, due to little distractions such as life-threatening medical condition, 9 days in hospital, and 6 weeks away from the classroom teaching that I love.

I think that there is common ground between the naive "DNA = computer" myth and PZ's very thoughtful description above, which I quite enjoy:

"... the genome is nothing like a program. The hard work of cellular activity is done via the chemistry of molecular interactions in the cytoplasm, and the genome is more like a crudely organized archive of components. It's probably (analogies are always dangerous) better to think of gene products as like small autonomous agents that carry out bits of chemistry in the economy of the cell. There is no central authority, no guiding plan. Order emerges in the interactions of these agents, not by an encoded program within the strands of DNA."

"I'd also add that the situation is very similar in multicellular organisms. Cells are also semi-independent automata that interact through a process called development in the absence of any kind of overriding blueprint. There is nothing in your genome that says anything comparable to 'make 5 fingers': cells tumble through coarsely predictable patterns of interactions during which that pattern emerges. '5-fingeredness' is not a program, it is not explicitly laid out anywhere in the genome, and it cannot be separated from the contingent chain of events involved in limb formation."

I should like to point out that Artificial Intelligence (my M.S. in 1975 was for work on the borderline between AI and Cybernetics), Agent-based software, and Quantum Computing have brought "program" into a new paradigm, as much as genomic and post-genomic research and data have brought DNA/RNA/Protein into such a new paradigm that the very word "gene" is difficult to properly define at any level of education.

As more evidence that this guy has his head up his ass, I give you something he wrote in March of last year on Entropy & Evolution. It shows that he doesn't know much about either, contradicts his later confused notions, and displays his ignorance of genetic algorithms and the biology of aging:

Now since I am a software developer, mutation (development) and selection (testing) of complex systems is an everyday activity for me. So there are similarities between what I do for a living and the concept of evolution. This difference is that DNA is considerably more complex than software. Yet no one develops software by random mutation and testing alone. Instead of random mutation the software development process employs intelligent design. I don't believe that random mutation has any place in the software development process -- so why should I believe in evolution?

Off to the dustbin, non-Intelligent Designer.

By natural cynic (not verified) on 24 Feb 2008 #permalink

It's not unreasonable to compare DNA code with computer code for purposes of simple analogy (Or at least it seems so to me having had some experience in the latter and none in the former). However certain differences are glaring. The most obvious one is the fact that nearly[1] all code is human designed and thus tells a sort of 'story'. You can look at various bits of it and realize what they are for. Find the modules involved and so forth. It's obviously designed. It certainly doesn't have heaps of nonsensical scribbles which don't accomplish anything at all. It doesn't have a sort routine repurposed as a screen refresh algorithm by way of being a tic-tac-toe playing AI and a pseudorandom number generator.

Further, a major difference is the platform itself. Computers are, by and large, synchronous, discrete and deterministic. They execute algorithms and operate with precisely quantized data. DNA 'executes' (and the term is a poor one) in an environment that couldn't be more different. The most glaring difference is that there is no CPU analogue during the development of an organism. No central authority. Instead all control is emergent.

About the size of Word (12 MB ?!), well, I'm assuming he measured the size of the executable itself. This is rather silly, seeing as he didn't take into consideration the untold megabytes of shared libraries linked to it, or an incredibly complex operating system which is required to turn the abstract request for services into actual operations on the hardware which, in turn, has its own abstractions and internal software and so on. A lot more information goes into getting that Word screen with the flashing cursor than a piddly 12 MB.

[1] But there are exceptions. For instance there is a pathological programming language named Malbolge that's deliberately so perverse it's practically impossible to write even a Hello World program by hand. One has been generated, however, using a genetic algorithm. No-one knows how it works. So it looks designed (because it does something) but isn't. Much like some other things I could name. You can find more on this on Good Math/Bad Math. I'd link to it but then the spam filter might eat me.

I'm surprised that nobody's pointed out this horrendous leap in logic already:

Since the most efficient algorithms to just sort n objects have an order of nlog(n) I am tempted to guesstimate by multiplying 22MB by log(210) to get a lower bound.

When we say that comparison sort algorithms are "order of nlog(n)", we're talking about the time necessary to execute them, not the total size of the sorting instructions themselves. When we go from sorting an array with 1,000 elements to one with 1,000,000 elements, that doesn't mean we need a larger program to handle it!

Besides that, there are (non-comparison) sort algorithms that are faster than n log(n), such as the spaghetti sort described by #24, and many of them can be implemented in software. Apparently Stimpson has never heard of bucket sort, counting sort, pigeonhole sort, etc.
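
For the record, one of those linear-time sorts is only a few lines. A minimal counting sort: O(n + k) for n items with integer keys up to k, and no comparisons anywhere:

```python
def counting_sort(items, max_value):
    """Sort small non-negative integers by tallying, not comparing."""
    counts = [0] * (max_value + 1)
    for x in items:
        counts[x] += 1
    result = []
    for value, n in enumerate(counts):
        result.extend([value] * n)
    return result

print(counting_sort([3, 1, 4, 1, 5, 9, 2, 6], max_value=9))
# -> [1, 1, 2, 3, 4, 5, 6, 9]
```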

These cell types share a lot of common features so I'll assume there is a lot of common information. Just how much of the information is shared between these cell types is a guess. I am going to assume that 90% of the information in each cell type is shared and 10% is unique.

I work in a lab that's trying to pick out programs* of gene expression which enable the development of different types of projection neurons in the forebrain. From our data, only about 0.5% of the genes expressed in one type of projection neuron are expressed uniquely to that type. The vast majority of genes expressed in neurons of any kind are shared, because all types of neurons need to do many of the same things.

And of course, even the 0.5% of genes expressed "uniquely" in different forebrain projection neuron types are not expressed only in neurons -- many of the same genetic programs used to pattern the brain are also used in the development of other organs and systems. A gene that I'm particularly interested in is involved in generating two different forebrain cell types, as well as T cells in the immune system.

(*Of course, I'm not using "programs" here the way Stimpson is. When we talk about programs in our lab, we just mean combinations of genes that result in the development of a certain cell type. So far we have described three genes that are critical in combination for the formation of corticospinal motor neurons, which sit in layer 5 of the cerebral cortex and send an axon to the spinal cord.)

God, what an imbecile. Not only does he not know anything about biology, his approach to programming is not particularly bright.

Complex programs require a lot of code, but that's because there is a lot of functionality built into them; each part of the code is usually pretty simple - just the right sequence of variable declarations, for-loops, if-statements etc.

And if you know how to code correctly (something I certainly don't take for granted that the moron can), you are able to reuse the same code in several places, by referring to it (either as functions or as objects, depending on your programming language).

Anyway, it doesn't matter; as PZ says, program code and the genome are nothing alike.

But what do I know? It's not like I make a living in software development.... oh, wait...

Yet no one develops software by random mutation and testing alone.

Add genetic programming to the list of computer science topics he's clueless about.
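
Since it keeps coming up, here's a minimal sketch in the spirit of Dawkins's "weasel" demonstration: random mutation plus cumulative selection, and nothing else. The parameters are arbitrary; only the principle matters.

```python
import random
import string

random.seed(42)
TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "

def mutate(s, rate=0.05):
    """Copy a string, each character having a small chance of random change."""
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in s)

def fitness(s):
    return sum(a == b for a, b in zip(s, TARGET))

parent = "".join(random.choice(ALPHABET) for _ in TARGET)
generations = 0
while parent != TARGET:
    brood = [parent] + [mutate(parent) for _ in range(100)]
    parent = max(brood, key=fitness)   # selection keeps the best each round
    generations += 1

print(f"reached the target in {generations} generations")
```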

#24: "Biology is an excellent example of analogue behaviour." Hear, hear (and written like a true Brit, with silent "u"s twice over). Biological dynamics are indeed analog, only metaphorically comparable to digital systems. The same goes for psychological dynamics, which computers model only as poetry. For example, here's a dynamic with a difference: digital memory is forever and is nearly perfect; animal memories fade with time and vary in accuracy. In humans it's well studied that most of one's memories aren't of the original event as such but are memories of memories - that is, one encodes the event and then remembers the encoding, more or less accurately (more less than more) over some of one's life. Digital memory is only a metaphor for analog memory.

#22: "Betty Crocker says no." But she should agree that DNA works as a recipe rather than as a blueprint. That is, DNA specifies a procedure which usually works given ingredients and a mostly reliable environment - the bun will come out of the oven 8^) This is very different from an architect's or engineer's blueprint, which specifies every detail of the final product. This difference may suffice to explain the fact that much more DNA is commonly found in plants than in animals. Plant DNA has to specify in advance the many chemical/protein components which are the plant's innate and only way of responding to its world (saps, toxins, et al), while animal DNA provides a recipe for a brain which responds behaviorally to the world in less-specified ways based on moving about.

there is something like an irony involved in the fact that most creationist morons are right-wingers, staunch republicans, and yet their incomprehension of biological development shows that at heart they really believe in the communist central planning model.

"all order must come from on high!
larger systems need larger, more centralized governments, with more laws and regulations!
how can you say that the u.s. economy moves 13 trillion dollars every year without being able to point to a commissar of plumbing fixtures? impossible!
in fact, i calculate that there are at least 210 separate commissars for centralized production needed to coordinate the separate spheres of production--plus a supreme soviet to organize all of them!"

the idea of autonomous entrepreneurial agents pursuing their own agendas, each with a fragmentary and incomplete knowledge at their disposal, and yet creating a vast system of incredible complexity? baby, that's the american way.

and long before that, it was the evolutionary way.

celebrate it: evolution is american. creationism is communist.

By kid bitzer (not verified) on 24 Feb 2008 #permalink

Carlie wrote:

The sad thing is, computer programming can be used as an analogy to DNA and evolution, but in a much different way than he describes.

Exactly, it is an analogy, nothing more.

Historically, there seems to be a strong temptation to model biological and physical processes in terms of contemporary technology. For Newton - and Paley - it was clockwork. Later it was telephone exchanges and now it is computers.

But an analogy only holds to the extent that the two cases being compared are similar. Yes, the willow tree seed plainly contains something like 'information' or a 'set of instructions' or a 'recipe' or 'software' for making another willow tree; but being "something like" does not mean 'the same as'.

To put an analogy into perspective, it is necessary to be just as aware of the differences as the similarities. To argue that it might as well be raining floppy disks as seeds is to miss the essential point that it is seeds the tree produces, not floppy disks.

DNA, viewed as a storage medium, is very different from magnetic or optical disks. The way 'information' gets into the DNA, the way that 'information' is used to make another willow tree, indeed the very nature of what we are calling 'information' in a genome is far from being the same as what is whizzing around inside a computer.

Computer geeks are so dazzled and enthralled by the toys they play with that they almost completely ignore what is different about biological systems. Yes, living things can be modelled in terms of computers to some extent but the key lies in the differences. Computing can provide some insights but, ultimately, living things must be understood as something different, something that stands in its own right and something that must be understood in its own terms.

By Ian H Spedding FCD (not verified) on 24 Feb 2008 #permalink

Christopher (#24), or Ian (#36):

Surely looking at the spaghetti and taking the longest piece isn't an O(1) operation? I imagine picking the longest from a pasta-covered wall the size of the moon would take significantly longer than doing so with a mere handful.

By Olaf Davis (not verified) on 24 Feb 2008 #permalink

"That just isn't possible: UNLESS there is a giant outside source of energy supplying the Earth with huge amounts of energy."

How the fuck can someone with enough logical sense to code write something like this AND totally miss it? Oh yeah, willful ignorance.

Ok, scratch that. I misunderstood the algorithm; of course it's O(n).

By Olaf Davis (not verified) on 24 Feb 2008 #permalink

Any suggestions from the peanut gallery?

I'm not sure for which part you want a suggestion. The idea that the external environment plays a part in how the "code" turns out can be seen even in doing a jigsaw. The typical jigsaw assumes a flat, desktop-type location (in a gravity field). Without one, the pieces tip over, fall down or tumble around rather than remaining held together in tight formation with limitations on their assemblage.

For computer programs, it's rather important that the code find itself in the correct operating system environment - or it won't run at all. It gains from external sub-routines at its interfaces (monitor, keyboard etc). Even compiler options on a local machine, as well as cross-compilers, can significantly affect the executable program "life-form" one gets from the same basic piece of code. The presence of a virus on the compiling or executing machine can cause even more havoc.

Maybe if he thought of DNA as code tied directly to the timing of an inherently multi-threaded, chaotic CPU called reality, the difficulties in simulating it would be more apparent.

As for code that copies and rewrites itself as it goes along, a good model could be entries for Core War. If not educational for arrogant software engineers, it's fun...

Yet no one develops software by random mutation and testing alone. Instead of random mutation the software development process employs intelligent design. I don't believe that random mutation has any place in the software development process -- so why should I believe in evolution?

Is it just me or did he answer his own question?

He says no intelligent designer would employ random mutation and testing alone. Yet random mutation and testing - in the sense of being tested by the environment for fitness to survive - is what is observed. Thus, there is no intelligent designer and he should "believe in evolution".

By Ian H Spedding FCD (not verified) on 24 Feb 2008 #permalink

kid bitzer #41 wrote:

celebrate it: evolution is american. creationism is communist.

Not so much communism, as Monarchy. Divine Monarchy. Remember, this is pretty much the same crowd which insists that the concept of constitutional democracy would be impossible if there wasn't a King of the Universe and Lord of Lords to whom we owe our absolute obedience -- and He orders us to create a constitutional democracy, to demonstrate our recognition that He created us all equally beneath his unquestionable authority.

Apparently Cranes only succeed in forming structures created from the bottom-up if they were first levitated into place and commanded to do so by the Sky-Hooks.

Holydust, I actually appreciate that these people are opening their mouths. I invariably get more information about biology and how to think about it properly whenever they do. This is not the first time that I've heard the 'DNA is the software of life' analogy before.

It is unfortunate that they don't seem to get as much out of this as I do. :)

Brando, actually the ball of energy was someone else. In fact, Stimpson is aware of the sun. He has his own view on why the entropy argument is valid. Which is depressing. With luck however, Mark will pick it up.

"Biology is an excellent example of analogue behaviour" Hear, hear (and written like a true Brit, with silent "u"'s twice over) Biological dynamics are indeed analog, only metaphorically comparable to digital systems.

No, some biological dynamics like inheritance are digital. This is the way out of the blending inheritance dilemma.

Regarding those LINEs and SINEs: I'm not all that up on "junk DNA," though I gather that term itself is a problem...? Anyway, it's worth noting that thanks to the "cleverness" of evolution, as cited by hyperdeath in #15, I wouldn't be shocked to learn it's found at least some minimal use for their presence.

thank you for this

>> It's probably (analogies are always dangerous) better to think of gene products as like small autonomous agents that carry out bits of chemistry in the economy of the cell. There is no central authority, no guiding plan. Order emerges in the interactions of these agents, not by an encoded program within the strands of DNA. <<

I would suggest that is very much the process behind all of the biological complexity we see. It is a humbling concept to be sure, and very clearly stated. We ourselves and our own self awareness can be seen as having arisen from the interaction of just these "autonomous agents" and not from a "supreme Being" with a plan for our "immortal soul".
I would suggest it is this growing understanding of biology that is fueling the backlash of all fundamentalists around the world. That, and the ease of communication - especially digital communication, the internet - making it easy for anyone with computer access to say anything to anyone anywhere.

By uncle frogy (not verified) on 24 Feb 2008 #permalink

Even if DNA can be said to be analogous to computer code (and I don't think it can), the comparison of size is invalid.

First, an executable (like Microsoft Word) is not just composed of program code, but also data. Thus, Stimpson is not actually comparing program sizes, at all.

Second, computer programs are generally written and compiled to optimize performance and maintainability, not space. Loop unrolling and function inlining are two examples of optimizations that increase program size to achieve better performance. Programmers may introduce layers of abstraction to achieve better maintainability of the code at the cost of both performance and program size.

Third, and probably most important, the target architecture plays as much a part in the program size as the program itself does. Some architectures have lots of instructions that make it possible to express complex operations in very little code, while others will require more code to perform the same operations. Whether or not DNA is comparable to computer code, it is absolutely certain that the mechanisms of gene expression are not at all like a von Neumann architecture, and certainly nothing at all like a modern Intel CPU.

I may be biased by my own occupation and predilections (I am a professional software developer and read evolutionary biology for fun), and by my own sources of information, e.g. (or perhaps even i.e.) Dawkins, who claims that the correspondence between DNA and computer code is literal. Obviously, Dawkins is fallible like the rest of us, but I do think he's someone whose words we should consider.

I also don't think that the originally cited creationist is necessarily stupid (from this alone, anyway), so much as seduced by what Dawkins calls "bad poetic science" (see Unweaving the Rainbow), or perhaps taking the analogy too literally. I personally think that (computers being general computation devices) it is entirely reasonable to suggest that DNA may be explained in terms of what computers can do. That does not make it reasonable, or even sensical, to suggest that it works as our computers do.

Someone with knowledge of software development and biology might suggest (or not: I don't know biology) that it is a good explanation, but that counting bytes is hopelessly naïve. Not only do we have the "How much overlap?" question, with the creationist suggesting 10% unique gene expression as a wild but "generous" guess, and a commenter citing 0.5% with claims of actual supporting data; this also overlooks the possibility (which I gather is not merely hypothetical) that the same stretches of genome may be used to code more than one thing depending on where in a particular sequence the "read head" is placed. This offers the possibility of more compact coding by an order of magnitude (and has subsequently been done, to verify the possibility, with software). We might also consider the genome a compacted or compressed length of code (it does not, after all, have to "execute" very quickly; the human DNA program has nine months to bootstrap itself). We also haven't spoken of the intrinsic instruction set, as the four bit-analogue values in DNA have intrinsic properties, unlike the computer's data that are merely read.

It seems to me that there are some more calculations to be done before we can decide either whether the computation analogue truly needs to be thrown out the door, or how much (decompressed, linearised) information our genome can (from this point of view) be considered to encode.
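
The "read head" point is easy to sketch. The codon table below is a tiny invented subset of the real one, purely for illustration; the same string of bases yields different products in different reading frames:

```python
# Toy translator: shift the reading frame and the "meaning" changes.
CODONS = {  # tiny, incomplete subset of the real codon table
    "ATG": "Met", "GCA": "Ala", "TGG": "Trp", "CAT": "His",
    "GGC": "Gly", "ATT": "Ile", "TTA": "Leu", "GCT": "Ala",
}

seq = "ATGGCATTAGCT"
for frame in (0, 1, 2):
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    protein = [CODONS.get(c, "?") for c in codons]
    print(f"frame {frame}: {codons} -> {protein}")
```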

I don't think the computer program analogy is that far off.

But the problem lies in the type of computer that executes it. Nature (Chemistry/Physics) is what executes the DNA code and our attempts to just simulate a small part of that computer (like: http://folding.stanford.edu/) show just how complex the machinery is that interprets the code.

The human genome is elegant in its reduced size. One need look no further than the immune system to see how one can encode complexity from small amounts of coding information. The point is that the coding potential is not linear with the number of base pairs, due to alternative splicing of the information. Who knows how complex the transcriptome and proteome are, given only 3 billion base pairs.
I would speculate that the human genome has lost an incredible amount of information because of our heterotrophic lifestyle. Our genome no longer encodes the information for making essential amino acids; we obtain them from our diet instead.
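
That nonlinearity is easy to cartoon with invented exons: keep the first and last, optionally splice in any subset of the middle, and the number of possible transcripts grows exponentially with exon count. (Real splicing is more constrained than this; the sketch only shows the combinatorics.)

```python
from itertools import combinations

exons = ["E1", "E2", "E3", "E4", "E5"]     # invented exons for one toy gene
middle = exons[1:-1]

transcripts = []
for r in range(len(middle) + 1):
    for kept in combinations(middle, r):   # any subset of middle exons
        transcripts.append([exons[0], *kept, exons[-1]])

print(f"{len(exons)} exons -> {len(transcripts)} possible transcripts")
for t in transcripts[:4]:
    print("-".join(t))
```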

Our creationist programmer here fell into an age-old trap: assuming that the complexity of a piece of code is independent of the instruction set of the machine it runs on. Setting Turing machines and other idealizations aside, it is totally meaningless to cite the number of bytes used for machine-language encoding without considering the characteristics of the computing platform itself. In short, for this guy to have any shred of credibility he would have to take into account the chemical machinery that allows the code to be "run". As it stands, even as an exercise in elementary school maths, his opinion is misguided to the point of absurdity.

Just realized dak above had already posted my exact same idea. This is what happens to slow readers/even slower typers.
Well thought, dak ;D

The human genome is elegant in its reduced size.

But so is any other.

I would speculate that the human genome has lost an incredible amount of information because of our heterotrophic life style. Our genome no longer encodes information for essential amino acids that we obtain from our diet.

All vertebrates are incapable of making lysine, for example...

By David Marjanović, OM (not verified) on 24 Feb 2008 #permalink

You guys should look into analog computers. They use laws of nature to build or measure things in ways that digital computers have long forgotten.

Seriously. He is both wrong and right. Wrong in thinking that it's that **simple**, but not far off otherwise. Fact is, for something like Word, probably 50MB of it is **functional** stuff needed to take a document, insert markers and codes, then print it; the other 1.45GB is all fracking GUI interfaces and fancy junk needed to make it *easy* to do that. Or, to put it another way, the genome can do with almost nothing what it takes a scientist and an entire bloody lab to replicate, even if all they are trying to do is insert a few sequences. It's the interface that makes stuff understandable to humans that takes up all the room, not the functional code. Another example: the "CLI", command line version of POVRay is only 1.53MB, and other than some recent design changes to move around a few features, add 1-2 things and remove a few others that didn't work, the *core* is still not much bigger than that, but the current version is 4.37MB, because of all the extra stuff needed to make it a GUI, so people can use it easily. The GUI is "only" a text editor. Want to add in the *developmental* code that determines its default behaviors? Then it's 18.4MB. Want a GUI modeler, instead of typing it all in with text? Then add on another 38.3MB, for a total of 56.7MB. Since this is just one program, and Office is like 4-5, multiply it by 5 and you still manage to come out ahead of MS Office, but then OpenOffice, which does everything MS Office does, is *only* 440MB after it's done installing.

Frankly, I can't fault the guy for trying. For being an idiot, yes, but not for trying. Most of the internet runs on *small* chunks of code, at least once you strip off all the GUI and human elements; they all interact in ways that are often very non-linear and asynchronous, and they manage to get everything from A to B quite well, despite none of them talking to each other beyond passing the discrete bits of information they each have to deal with as it arrives. Take all the code from every place in just the first 10 servers I can reach from my house, then try, without also duplicating the architecture, to figure out just how all of it works together. Well, you might be able to, but then we know what 100% of all those programs do, are meant to do, should do and will do, given the right conditions. We can't say that for the complex set of semi-independent agents in a cell. That we might one day be able to still doesn't mean jack, because we already run, in some environments, code written by mutation, which works because we killed off all the ones that didn't, not because we a) know 100% exactly how it works, or b) specifically *designed* it to do that job. Unless you want to claim that "forcing something to evolve into X, instead of Y" is the same thing as design. Guiding maybe, but not design, and that doesn't help their case at all, since they still can't prove guidance, intent, or anything else needed to suggest something is doing that to biological life forms.

But no, PZ, I don't think he is off base in the analogy, just idiotically off base in damn near every other assumption and projection he makes from it, starting with the stupid assumption that you can reverse engineer the operation of anything without the context of what it ran on in the first place, and when you don't even know *if* it uses analogs to the instruction sets you are familiar with. And we are talking about binary instruction sets here, not human-readable "if x then do y" type code. His mistake isn't the equivalent of knowing about boats and talking with undue authority on cars, but more like knowing a lot about sailboats and presuming that means he knows how jet aircraft work, because "it's just two sails tacked parallel to the ground, tacked onto basically a submarine." Sort of, in an insanely vague and stupid way, but you ain't going to get very far building a jet that way, even if you got *part* of it sort of right. ;)

Back when I was writing software, I don't recall writing small segments of self-modifying, recursive code and sprinkling them throughout my program...or perhaps more accurately, writing software that was one-third auto-loading noise maker and sprinkling a few words of functional code among them.

well I guess you just didn't have enough memory and cpu time available then ^^

I think there is some confusion over the definition of "code". If we think in terms of formal languages, DNA has not, to my knowledge, been proven rigorously to be such. But there could be less formal definitions that Dawkins may be thinking of, perhaps even metaphorical ones. Nonetheless, PZ is certainly correct to state that cell machinery bears little resemblance to computer code in the sense that the author he castigates is using it. The mechanical capabilities of the cell haven't even been shown to be computationally universal, to my knowledge.

I'm happy to think of the genome as a program. At the high estimate, 100 million bits (25,000 genes x (1,000 bp of coding sequence + 1,000 bp of regulatory sequence per gene) x 2 bits per bp), or about 12MB, in the human genome.

So a fairly small amount of code is enough to generate a person. The small number is clearly enough--it is what humans develop with. The program isn't written in a bloated computer language. It's more like hand-tuned (or genetic algorithm-tuned, ha) assembly code, full of GOTO statements and with enough cross-connected subroutines to make the block diagram look as knotted as a ball of thread.

The size doesn't seem too small in relation to what code can do, either. Look at the Mandelbrot set: the rule that generates it (z -> z*z + c) takes about 7 bytes to write down, and the result is incredibly complex. So clearly a small program can produce a complex result.
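To see the point in action, here is a minimal sketch (plain Python; the grid size and iteration cap are arbitrary choices): the entire "genome" of the image is the one-line rule z = z*z + c, and everything else is just plumbing to print it.

    # Minimal Mandelbrot sketch: the whole "program" is z -> z*z + c.
    for row in range(24):
        line = ""
        for col in range(72):
            c = complex(-2.1 + col * 0.042, -1.2 + row * 0.1)
            z = 0
            for _ in range(30):
                z = z * z + c
                if abs(z) > 2:
                    line += " "   # escaped: outside the set
                    break
            else:
                line += "#"       # stayed bounded: inside (approximately)
        print(line)

Twenty-odd lines of plumbing around a seven-character rule, and the output is the famous infinitely detailed boundary.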

PZ considers epigenesis important and a reason to reject the computer program hypothesis as insufficient. Epigenesis is clearly important but I don't see it as a reason to reject the computer program analogy. The epigenetic information is an expression of the genomic program.

Also, calling the genome a library of components seems too static to me. "Library of subroutines" or "library of services" captures the sense of what is going on better, with different subsets of routines active at any time.

It is interesting to compare computer programs with living organisms, but as this creationist shows, it is easy to be misled (or to mislead) by the analogy. I think any complete description of cellular activity and development will use the concepts used to describe computer programs.

I'm gonna go out and write a paper on how I think "Jesus" is an anagram for "cheese crackers".

What a friend we have in Cheeses!

Okay! I'm so excited by this thread! This is what I've been yapping about for months here, and it finally comes up.
The word "evolution" is used as a metaphor for all types of things: computer program developement, technological progress and product improvement are three.
But that does not by any means imply that the methods of technological "evolution" are the methods of natural evolution!
Yet creationists always make that mistake! Two desktop computers do not mate and make a laptop computer!

But here's the beauty part: the scientific method which allows this technological improvement is the same method which has ferreted out a few of the mysteries about our origin and the origin of the world around us. But then, all of a sudden, they refuse to accept it.
So I guess scientific progress is only progress if it disproves science!

Man, are they idiots.

Complexity, information, and design.

Three words which launched a thousand blogs of argument, are used in a thousand poor analogies, and raise a thousand poor religious and scientific hypotheses.

Can we not insist on a new protocol which requires each of these words to be fully defined (e.g. in a footnote) before it is used in a formal document? Especially peer reviewed scientific papers - if only to set a good example for those pushing particular ideologies.

Good luck defining 'complexity' by the way.

By DiscoveredJoys (not verified) on 24 Feb 2008 #permalink

"...assuming that the complexity of a piece of code is independent of the instruction set of the machine it runs on."

This wouldn't be a mistake; it's one of the most remarkable results from the theory of Kolmogorov complexity: the descriptive complexity of any object is invariant up to a fixed additive constant for any computationally universal device.

@#5
I just checked with Office 2003: WINWORD.EXE is close to 12MB.

This is ridiculous of course, since it is just the main executable, completely leaving aside necessary Office DLL's, operating system DLL's and whatever is needed to run Word.

Seems ignorance reaches beyond genomics and well into IT.

#22 - I have baked a cake. And I've never put all the ingredients together and had it turn into a cat or a dog or anything but a cake.

Did you forget to wait 3 billion years?

You can compare the genome as a data set to the data stored by a computer, and that's legit -- we know how much storage space you need to pack away the sequence. It is not in any way comparable to software.

You could compare it to "compressed data", where the process of ontogeny corresponds to software decompression - a small genome could result in a very complex organism. But even that analogy will go only so far, since, as you pointed out, there is a great deal of complexity in pre-existing states and in the environment, as well as other factors to consider.
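An L-system is the classic toy model of that decompression: a few bytes of "genome" (a seed plus rewrite rules) unfold into thousands of branching instructions. A minimal sketch, with rules invented for illustration rather than taken from any real organism:

    # A tiny L-system: a one-character seed plus two rewrite rules
    # "decompresses" into an exponentially larger branching description.
    rules = {"X": "F[+X][-X]FX", "F": "FF"}   # illustrative, not a real plant's

    state = "X"   # the entire "compressed" starting state
    for generation in range(6):
        state = "".join(rules.get(ch, ch) for ch in state)

    print(len(state))   # a few thousand drawing instructions after 6 rewrites

And the interesting part of the analogy is what it leaves out: the "meaning" of F, [ and ] lives in the turtle-graphics interpreter, not in the genome, which is exactly the point about pre-existing states.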

The whole nurture aspect itself destroys Stimpson's simplistic nonsense. Without proper stimulation, the brain (the closest thing we have to a computer) basically withers, leaving the human unable to see or to speak.

Indeed, the whole developmental process is much more like evolution than like any kind of software telling the hardware what to do. It isn't accidental, either, that development is like evolution, messy, stochastic, with summation "replacing" exactness--because development comes from evolution, and only relatively flexible and changeable developmental processes have survived the evolution of the brain.

As is typical, the creationist dolt looks at the closest designed analogy to human development and mistakes the exactness and efficiency of the designed process (even though software development isn't the most exact or efficient engineering solution, it's far closer to that than any developmental process is) for the vastly different evolved process. With any honest comparison, the mind reels at how really different the self-organization of development and evolution is from the rationally designed machines and processes made by engineers.

The trouble is that these idiots would have to get out and learn biology if they were going to understand it, rather than to mistake it for what they're doing. And unfortunately, evolution came nowhere near close to ensuring intelligent and honest responses from humans. Thus, Stimpson.

Glen D
http://tinyurl.com/2kxyc7

More from Randy Simpleton:

It only takes 10 pennies to disprove the 'laughable' theory of evolution. [1]

He also believes in an 'intelligent designer' who on occasion makes a mistake (3 arms instead of two... oops!). [2]

He's not a Christian... but he believes god exists because the alternative (evolution) is too far fetched. [3]

[1] http://randystimpson.blogspot.com/2006/05/10-penny-experiment.html

[2] http://randystimpson.blogspot.com/2006/05/boy-with-three-arms.html

[3] http://randystimpson.blogspot.com/2007/03/am-i-christian.html

And being gobsmacked by nature's complexity doesn't make one either an idiot or a closet theist.
Mr Effect

Nobody minds if they're gobsmacked. It's when they're Godsmacked that it gets intolerable.

If this guy was worth his salt as a software developer, he'd know how much 750 fricking MB is. And what you can do with procedural programming (which would be a far better analogy for DNA, methinks. Then again, I'm just a coder myself, no biologist :)). Just look at the demoscene, for example farbrausch's Debris ( http://pouet.net/prod.php?which=30244 ), which is only 180KB (yes, K, not M).

By AnonCoward (not verified) on 24 Feb 2008 #permalink

Oh, and on another note: it should be considered that all "genetic algorithms" we currently use to create artificial systems differ from biological ones in one critical fashion. We design the RNA, if you will, that "runs" the code in its DNA. In other words, only one half of the system is undergoing any mutations. As far as I know, no one has even tried to replicate the true complexity of a system that has interdependent coding, where the machinery that interprets the instruction set can mutate side by side with the instructions. That is one reason why any attempt to make analogies about how to understand DNA/RNA is gibberish, as well as probably one key reason why it's more compact than anything we come up with. Evolved artificial code is already showing the ability to find unique, often humanly incomprehensible, solutions that take *less* code than we would design, but half that system is *still* human-written, and subject to our inefficient solutions. If you could come up with a very small and simple environment, with very little overhead, then allow *both* the interpreter and the code to evolve in it, the resulting "cells" would probably end up even smaller than what we see now, where only the half that gets "run" evolves.

But the problem of how to make such an environment and get it to do what you intend... is just not something I can quite wrap my mind around in terms of how to even start.
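For flavor only, here is a toy version of the idea with all of the hard parts assumed away: both the "interpreter" (a table mapping symbols to operations) and the "code" (a string of symbols) mutate, and selection sees only the output. The op pool, the target and the sizes are arbitrary inventions for illustration.

    import random

    OPS = [lambda x: x + 1, lambda x: x - 1, lambda x: x * 2, lambda x: x]
    TARGET = 42   # an arbitrary "environment" to be fit to

    def run(table, program):
        acc = 0
        for sym in program:
            acc = OPS[table[sym]](acc)   # the table decides what a symbol *means*
        return acc

    def mutate(table, program):
        table, program = table[:], program[:]
        if random.random() < 0.5:   # mutate the interpreter...
            table[random.randrange(4)] = random.randrange(len(OPS))
        else:                       # ...or the code; selection can't tell which
            program[random.randrange(len(program))] = random.randrange(4)
        return table, program

    table = [random.randrange(len(OPS)) for _ in range(4)]
    program = [random.randrange(4) for _ in range(12)]
    best = abs(run(table, program) - TARGET)
    for _ in range(5000):
        t2, p2 = mutate(table, program)
        err = abs(run(t2, p2) - TARGET)
        if err <= best:             # selection acts on the phenotype only
            table, program, best = t2, p2, err

    print(run(table, program))      # typically converges on 42

By the end, neither half is "the" design: the same program string means something different under a different table, and only the pair was ever selected.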

You guys don't take prisoners, do you?

When you are dealing with people who "peer review" by burning at the stake, or breaking on the wheel, you can't be too careful.

I think you may have thrown out the baby with the bath water here. There is actually a powerful anti-creationist argument in here.

To start with, the genome is like a library of computer code. There is information about each genetic and when it will be transcribed. The style of programming is obscure, but there are software systems, for example certain expert systems and real-time systems, that use this trigger-and-execute style of programming. On the other hand, it is a style of programming that no one in their right mind would ever use in a large project, but it is nonetheless exactly the style of programming one would use for an evolving system in which there is no particular goal or purpose save for survival and propagation.

In other words, it is a watch that clearly was not designed by a watchmaker. From the artifact we can deduce the lack of a creator.

Oops, change that "each genetic" to "each genetic unit". I'm low on units today.

I'm a computer programmer, and I find many aspects of biology to be positively fascinating: massively parallel brains, all sorts of hormonal and neural feedback systems, and convergent evolution all have analogues in the software development world. Heck, there are even bad analogues. The way selenocysteine is shoehorned into the genetic code is a hack that should make any good programmer cringe.

So knowing that engineers should know about and appreciate all these phenomena, I'm disappointed and, frankly, embarrassed that so many of us take on creationist worldviews and libertarian-conservative politics. We deal with logic all day; we should know better!

By Daniel Colascione (not verified) on 24 Feb 2008 #permalink

To follow up on my post #75 and what others have said, I would say that if life appears to be designed for anything at all, it was "designed" to evolve through mutation and natural selection.

From sexual reproduction in eukaryotes, to the interdependency of various functions and controls, the "design" exists so that it can change with changing environments and circumstances. So I guess the only clear "purpose" of the "designer" is that life will evolve without the intervention of agency.

Glen D
http://tinyurl.com/2kxyc7

Kaleberg: perhaps I missed what you consider an "anti-creationist" theme to the article... You may even be right. However, I think a lot of people are merely railing against the threat of misinformation (of any kind) that radiates off his "guesstimations" and his "I'm no expert but I'm going to write about this topic anyway" attitude... In my humble opinion, it's simply dangerous to say things without facts to back them up, and so it scares me when I read something that other people are going to read and likely take at face value. :( He doesn't sound so sure himself. So why is he writing these articles? The laymen don't care one way or another.

It's why I have to keep explaining to my Dad about the fossil record, suboptimal traits, etc... dispelling myths caused by misinformation somewhere down the line.

Sastra (#22)
Next time, make two cakes.
Put sugar in one and salt in the other.
Bake them and cut them.
Leave them on the table for the kids.
Say nothing.
Observe which one gets eaten.
Now you have simulated Natural Selection.

Guy's obviously a fruitloop - Word is 12MB? So he can't even count. But can we please go knock on his door and say "Well, I'm no programmer but I once read an article in Scientific American and I'm pretty sure you spelled elsif wrong. Here, I'll fix it for you." tap tap tap "Oh, and all those curly braces are untidy. Let's fix those too."

Computing is applicable to biology--but not by directly comparing what we commonly consider a "computer" to be with how cells work. To approach a computing system that behaves in a similar way to the way biology behaves, you would need a computer with very different properties like:

- massive parallelism (instead of centralized sequential step-by-step execution). Every atom is similar to a little processor of its own, executing the laws of physics. And you would need a staggering number of them to simulate the biology even in a single cell.

- no artificial separation of memory and processor (CPU). An atom both has state and activity, self-contained.

- three dimensional "memory" space rather than single dimensional like a typical computer (every location would need at least 6 immediate neighbors, and probably many more since space is finer grained than the size of atoms--and some physical forces have an effect that extends beyond the immediate neighborhood of a particular location). In a sense, "memory" locations are tied together increasingly weakly with distance from each other in 3D space for certain operations (e.g. gravity, electric charge).

- the equivalent of "bits" (atomic state) are fuzzy since atoms jiggle. They don't have a unique digital value, rather, a probability function.

Build a "computer" that behaves like this, follows the same rules as our physical universe, and you'll have a substrate for simulating a biological system. There are computers that simulate physics by modeling such behavior (slowly) on a small scale, but our Von Neumann architecture computers themselves don't behave that way, and arn't very powerful by comparison.

I've done my best to trudge through both Pharyngula's post and the comments but there's rather a lot to get through so please forgive me if I've missed something.

There are two very simple points that seem to have been missed.

Firstly: MSWord is code. DNA is data.

Secondly: data contains no information.

Code describes (with an arbitrary degree of redundancy) a context (a method or process). Data describes (with a different, arbitrary degree of redundancy) *nothing*.

Information is the combination of data and context. The context is what is supplied by the code (or, more rigorously, the specification).

For example, what is 3.14?

For those who think 'approximately pi!', you may or may not be right; there isn't enough information to make a call. 3.14 is my bank account's PIN (encrypted, of course), but it's also an approximation of pi. Data is meaningless as it lacks context. With context, it's information.

So, the *compressed* (he's disregarded all redundancy already) *information* contained in the human genome would fit into 27Mb, less than Microsoft Word's *uncompressed* *code*.

Head 'splodes with multiple category errors!

*sigh*

To give a concrete example of this error: the code required to analyse *a* chess game is many orders of magnitude smaller than the data required to describe *every* chess game, therefore, god.

Finally, it's not edifying to see Mr. Stimpson's argument from credulity to be countered by Mr. Pharyngula's argument from authority.

P.S. As you may have guessed, CS is my thing. I have some Etruscan Armoured Fisting Greaves warming by the fire, should abasement be required.

So, hobes, you're saying that DNA has no context?

Or are you just blithering?

If anyone's trying to be an authority based on pedantry and overly narrow meaning, it's you. Learn how to think and write instead of showing what a dumbass you are.

Glen D
http://tinyurl.com/2kxyc7

But DNA is a program, in the sense that any sequence of symbols that causes a black box to do something dependent on the input sequence can be seen as a program.
However, anything the ID moron tries to conclude from it is of course wrong.

If DNA is not analogous to a computer program, then how do you explain my personal Y2K bug?

Seriously, the morning after 31/12/1999 I awoke all woozy, with a headache, an upset stomach, and limited memories of the night before; all I could think about was eating greasy food, popping Tylenol, and where all those empties came from.

About the size of MS Word:

He might be talking about the MS Word VIEWER, which is 11.7MB:
http://www.microsoft.com/downloads/details.aspx?familyid=95e24c87-8732-…
But I think his point is that even a simple thing, such as a word processor, takes at least 12MB, and therefore the human genome should be vastly larger. Since the analogy is faulty, it doesn't matter much anyway.

By BicycleRepairMan (not verified) on 24 Feb 2008 #permalink

T_U_T, a "program" is usually considered to be an "imperative" or sequential list of commands. DNA is not like this.

Instead, it is more like "declarative" programming, like a simple HTML page, which is not a step-by-step list of commands (though it can be extended with script which is "imperative"--that isn't my focus here).

Different sections of a web page define what the shape of those sections will look like. Then there can be implicit interactions between displayed sections which emerge from the size and shape of other sections--and their interaction with the environment (e.g. the size of the web browser window).

These emergent properties are not directly programmed into the original declarative "code". Another similarity with HTML is that there are hierarchical relationships between these sections. However, importantly, unlike HTML, in DNA this hierarchy (protein A activates protein B) is much messier and implicit, rather than being an explicit part of the structure of the DNA itself. There is no simple tree structure of hierarchy in DNA like HTML has.

#53: "some biological dynamics like inheritance are digital" - touché, yes. For transmission genetics ... and the OP is conflating transmission and developmental genetics. I'm coming up empty for any other examples of digital biological processes (neuron firing is a bit more complex overall than "all-or-none").

This Randy Stimpson blog reminds me of Perry Marshall's so called "Atheist's Riddle". It's the same old sh** straight from the recycling bin.

#96 Different sections of a web page define what the shape of those sections will look like. Then there can be implicit interactions between displayed sections which emerge from the size and shape of other sections--and their interaction with the environment (e.g. the size of the web browser window).

This is just the kind of nonsense religion that Darwinists ejaculate constantly. Just because one section is bigger, another evolves smaller. This is the reasoning that they use to posit that black boys have relatively small brains on account of their relatively long penises.

By Francisco (not verified) on 24 Feb 2008 #permalink

As a computer science student, I must apologize for this Stimpson moron. We're not all stupid.

PZ has pointed out, correctly, that the genome is not a computer program. That isn't entirely the problem with Stimpson's thinking, as computation is a useful way to model and understand phenomena. Stimpson appears, however, to only think about computation in very narrow terms.

First, he thinks that the genome works like the kinds of programs he writes, which are, most likely, structured to suit human thinking. Not everybody does that.

Second, he's probably the kind of programmer who writes huge, bloated programs. If he thinks that 750MB is not a lot of code, he has obviously never heard of the demoscene. Demoscene coders do amazing things in less than a kilobyte of code. Give them a whole 96 kilobytes, and they can make an entire 3d first-person shooter. Give them hundreds of megabytes, and they'd probably make a procedurally-generated simulation of planet Earth.

His third problem is that he is unaware that the validity of the evolutionary toolkit has been proven many times in his own area of expertise. My favorite example is Tierra, a simulation in which mutation and selection have produced better code than human beings. (That's not the best write-up on Tierra, but it was the best I could find at the moment.)

As with most creationist idiots, Stimpson is willfully ignorant.

By Josh in California (not verified) on 24 Feb 2008 #permalink

My mileage rose dramatically when I took your advice last year to throw a few brine shrimp in the gas tank of the car.

You were joking, then?

#99, you completely miss the point. The point is that DNA is not like a typical computer program. It has different properties and those properties are important to its behavior. HTML is still an analogy, but it is a better one than a computer program.

For example, robustness under change: you can duplicate or remove sections, words, characters etc. of an HTML page and still end up, in many cases, with a valid page. Depending on the level of hierarchy at which the change occurs, you might end up with a page that looks radically different, or you might end up with just an altered spelling, font size or color change on some displayed word. Unlike an imperative list of commands, which is quite fragile, there are relatively few spots in an HTML page where a change would cause large areas of the page to fail catastrophically.
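The fragility half of this claim is easy to check informally. A rough sketch: apply a single-character "point mutation" to a small Python function and see whether it still compiles. (The source string and mutation alphabet are arbitrary choices; a browser's HTML parser, by contrast, is designed to render almost any mutant.)

    import random

    SOURCE = "def area(w, h):\n    total = w * h\n    return total\n"

    random.seed(1)
    survived = 0
    for trial in range(1000):
        chars = list(SOURCE)
        chars[random.randrange(len(chars))] = random.choice("abc123:= ")  # point mutation
        try:
            compile("".join(chars), "<mutant>", "exec")   # does it still parse?
            survived += 1
        except SyntaxError:
            pass

    print(survived / 1000)   # a fair fraction fail even to parse;
                             # far more would misbehave if actually run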

Most of these comments are a bit over the top.

Firstly, despite various scathing comments from people who can't be bothered to check, it is a fact that C:\Program Files\Microsoft Office\Office11\Winword.exe has a size of 11.7MB. Moreover, it relies on many other dll's, so in some sense the size is much larger, and of the same order of magnitude as the coding part of the human or mouse genomes. Of course it is hard to compare what a mouse does with what MS Word does, but certainly it is a routine task to write word processing software, whereas no one has succeeded in writing software that matches a mouse in its ability to process visual information, navigate across rough terrain, interact socially and so on. There is an overwhelming mismatch in apparent power between computer software and biological organisms, especially when you remember that organisms have to assemble themselves.

Of course, none of this is anything more than the very beginnings of the story, and none of it can count against the enormous weight of evidence for the basic truth of the theory of evolution, but nonetheless it is extremely striking and provocative and surely points to some very interesting things waiting to be discovered. To ignore this one would have to be defiantly incurious, or so obsessed with dissing creationists as to be blind to all other considerations.

Of course, if I say "it is interesting that system A is much more powerful than system B", you can respond by giving me a laundry list of ways in which system A is different from system B. But that's a pretty weak answer unless you can explain why these differences make A much more powerful. Especially when enormous numbers of highly paid and highly intelligent people are doing their best to improve system B, and still not managing to match system A.

the genome is more like a crudely organized archive of components

Yeah, and as already noted here I believe, a lot of the unfolding information is gained by interactions with the environment. You could say that this is what happens with a program in a computational environment, but here it is "the computer" that interacts (and grows btw).

Stimpson also makes this mistake in the reverse order: he forgets that his "software" also describes the equivalent of the hardware in his computer analogy, i.e. where the gates are and how they connect, in order to execute the very code that produced them.

He seems to have pulled it out of his butt.

Actually he makes an estimate based on bacteria, and on the claim that they don't contain much non-functional code. But it is a moot point anyway, as described in the post. Larry Moran at Sandwalk is currently running a series of posts describing the already known non-functional DNA in the genome.

If you're going to mock the guy for equating DNA with software then you're going to have to call Dawkins on it as well.

I haven't read The Blind Watchmaker, but your two quotes are contradictory. In the younger text Dawkins explains how the genome carries information "in a very computer-like way", in compliance with the view of PZ's post and regardless of any algorithmic instances. And we can't very well criticize a person for changing his or her mind.

By Torbjörn Larsson, OM (not verified) on 24 Feb 2008 #permalink

A major reasoning problem among software developers (and I make my living as a software developer) is the habit of considering data sets and instructions to be necessarily separate entities. In nature, however, instructions and data are often intertwined in ways inconveniently difficult to understand. See Seth Lloyd's Programming the Universe to gain some insight into how such amalgamation can occur. Lloyd writes about information theory and quantum computing, but the concepts he discusses are much more applicable to biological (and other natural) processes than the human-designed digital computing approaches with which we are comfortably familiar.

By ancientTechie (not verified) on 24 Feb 2008 #permalink

"But I think his point that even a simple thing, such as a word processor, takes at least 12MB...."

That may be true of current word processors, but there were perfectly good word processors that ran on 8088 machines with 640KB of RAM max. And VisiCalc, the first spreadsheet program, ran on 48KB 6502s. So his whole comparison with Microsoft Word is silly.

Just a minor quibble re (thwaite #41): "an architect's or engineer's blueprint, which specifies every detail of the final product"

An architect's drawing relies on associated "libraries" of information - in the 1920s it could be a simple description, relying on the builder's craft skills to do things right, by the 1980s in UK practice the drawing would refer to a detailed bill of quantities and specification which in turn would refer to standards. Everything in that drawing would be referred to a huge amount of outside documentation.

How difficult would it be to model the evolutionary process using a computer program?

To model aspects, not at all. See for example Dawkins's WEASEL program. To make realistic models encompassing the already-known theory, very. See for example the ev program to get a feel for the inherent complexity.
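For anyone who hasn't seen it, the whole WEASEL idea fits in a few lines. A minimal sketch; the population size and mutation rate here are arbitrary choices, not Dawkins's exact parameters:

    import random

    TARGET = "METHINKS IT IS LIKE A WEASEL"
    ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

    def mutate(parent, rate=0.05):
        return "".join(random.choice(ALPHABET) if random.random() < rate else ch
                       for ch in parent)

    def score(s):
        return sum(a == b for a, b in zip(s, TARGET))

    parent = "".join(random.choice(ALPHABET) for _ in range(len(TARGET)))
    generation = 0
    while parent != TARGET:
        generation += 1
        # breed 100 mutants, keep the fittest: cumulative selection, not chance
        parent = max((mutate(parent) for _ in range(100)), key=score)

    print(generation)   # typically a few hundred generations at most

The point, as Dawkins stressed, is the gulf between single-step chance (effectively never) and cumulative selection (fast).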

If it is so difficult for us to write a program that models random mutation and natural selection then how could the earliest forms of life do this just by chance?

Why do you think evolution (as in your simplified version of "random mutation and natural selection") applies to abiogenesis (at least, that is how I interpret "earliest forms of life")? And if you accept the determinism in natural selection, why do you think abiogenesis happened "by chance"?

While abiogenesis obviously differs from evolution in nearly all its aspects, it is feasible and perhaps reasonable to think that selection, in the form of chemical selection at the start, is one driving force. The route from selection on non-biologically produced systems to biologically reproduced ones is another question where we don't seem to know much.

But since then is our ignorance an argument for or against natural processes?

By Torbjörn Larsson, OM (not verified) on 24 Feb 2008 #permalink

Firstly, despite various scathing comments from people who can't be bothered to check, it is a fact that C:\Program Files\Microsoft Office\Office11\Winword.exe has a size of 11.7MB.

This was already covered in #74.

...no one has succeeded in writing software that matches a mouse in its ability to process visual information, navigate across rough terrain, interact socially and so on.

However, various aspects of other organisms like the nematode Caenorhabditis have been successfully modeled. Baby steps.

Of course, if I say "it is interesting that system A is much more powerful than system B", you can respond by giving me a laundry list of ways in which system A is different from system B.

That is not what Stimpson said, though. He said if A is more 'powerful' than B, A must have more information hidden somewhere, ignoring the differences between A and B. Please pay attention.

Neil says:

> Especially when enormous numbers of highly paid and highly intelligent people are doing their best to improve system B, and still not managing to match system A.

You fail to appreciate the scale of what our computers are presently capable of vs. what cells have available to them in terms of processing power. The processing power required to precisely simulate the exact atomic behavior of a teaspoon of water is many orders of magnitude greater than every computer available on earth combined.

As we are nowhere near matching that level of computing power yet, your bar is set impossibly, unreasonably high. Take a look at the 1000s of computers Koza is using to evolve human competitive designs: http://www.genetic-programming.org/ http://www.genetic-programming.com/humancompetitive.html The automatically designed results we are currently capable of with today's processing power are on the order of complexity of a single protein. Not a cell, not an organism.

In spite of our primitive computers, genetic algorithms routinely design useful, complex, patentable, irreducibly complex systems that were not designed by human beings--automatically. The more processing power we throw at such problems, the more complex the designs become.

What cells are capable of may look magical--but then what today's computers are capable of would look magical to someone from a couple hundred years ago. What computers will be capable of in 50 years would look magical to us today.

He has his own view on why the entropy argument is valid. Which is depressing.

Ouch! Yes, depressing.

Let's see:
1. Stimpson doesn't know that the Sun produces entropy. In fact any energy-producing process will, as heat cannot all be continuously converted into mechanical energy (Carnot).

2. Stimpson doesn't know that the Earth increases in entropy. It eventually has to, as the observable universe it dumps its entropy into increases in entropy over time.

3. So Stimpson misunderstands the meaning of the Sun argument. It points out that the Earth is an open system, and that entropy can be held lower in one part of such a system indefinitely. Compare with a fridge, which can dump heat, and so entropy, into a ventilated space.

4. Stimpson confuses thermodynamic entropy with information entropy. He links a source that explains the difference.

By Torbjörn Larsson, OM (not verified) on 24 Feb 2008 #permalink

I have noticed this tendency in engineers to be blithely reductionist about topics they do not understand.

That said, I think the genome-program analogy is somewhat valid, but the analogy is not nearly as strong as Stimpson supposes. The genome is like a program in that it contains specific responses to specified inputs. I don't really know if you can legitimately make it more specific than that.

A huge source of human complexity that hasn't been discussed (and I doubt that Stimpson even suspects its existence) is actually at the proteome level rather than at the genome level. Let's not forget what the genome's primary purpose is on a basic level: it contains instructions for making many proteins with highly specific shapes. The great economy and superficial simplicity of the genome is largely made possible because much of the complexity is at the protein folding level, not at the genomic level.

Much of the information that Stimpson claims does not fit in the genome is not explicitly coded in the genome at all; rather, it is implicit in the biochemical effects that occur when a specific sequence of amino acids is solubilized in water.

You'd think a computer engineer would be more familiar with recursion.

During my failed education in computer engineering, I wrote three pages of code to solve the Magic Square. The answer was like one line. I did not understand recursion, which was the point of the exercise.
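The "one line" was probably something in this spirit: let the library do the searching. (A hypothetical reconstruction of the exercise, not the actual assignment.)

    from itertools import permutations

    # Brute-force every 3x3 arrangement of 1..9 and keep the magic ones:
    # rows, columns and both diagonals must each sum to 15.
    squares = [p for p in permutations(range(1, 10))
               if all(p[i] + p[i+1] + p[i+2] == 15 for i in (0, 3, 6))    # rows
               and all(p[i] + p[i+3] + p[i+6] == 15 for i in (0, 1, 2))   # columns
               and p[0] + p[4] + p[8] == 15 and p[2] + p[4] + p[6] == 15] # diagonals

    print(len(squares))   # 8: one magic square and its rotations/reflections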

Sort of reminds me of a client who knew a bit about digital photography who was upset with the professional photographer we had hired because he wanted to alter the shade of blue in one part of a photo.

It wasn't a digital photo, it was a traditional photo being blown up for posters. The photographer could "push" the blue when developing it, but that made other things come out purplish or something like that.

The client was adamant that our photographer didn't know what he was doing, because the client had used Macpaint or something.

He understood a digital process and was trying to force his understanding of that onto a chemical process. It just doesn't work.

Obviously, before I start defending a computational view of DNA from my position as a computer programmer who's only really read the pop science stuff, I'd like to say, in no uncertain terms that The Crank in question is definitely a crank with a very narrow view of what computation (and information for that matter) is.

However, I think PZ's got a narrow view of what computation is too. There's a saying that "data is just dumb code, and code is just smart data", and DNA is emphatically smart data, far smarter than any code a human programmer is likely to write any time soon. Generally, us programmer types tend to steer clear of writing self-modifying code, if only because it's a maintenance nightmare - it's astonishing how quickly that sort of code can get too complicated to wrap your head around. The demoscene types might be able to do amazing things in 96K, but I doubt they'd be even 10 times more amazing in 960K (and, if you think about it, a multiplier of 10 is a _very_ conservative estimate of how much more amazing such a program could/should be) because the complexity explodes exponentially. As for writing code that modifies the machine that the modified code will then run on...

There's a famous paper called 'Reflections on Trusting Trust' in which Ken Thompson describes a hack which involved rewriting the C compiler to detect that it was compiling the 'login' program and insert a backdoor login which had superuser rights. They then altered the compiler source code to detect that it was compiling the compiler and insert the compiler alterations even if they weren't in the source. Once they'd compiled the compiler with the altered source code, they then excised that code from the compiler sources and compiled the compiler again, and were left with a binary that had all the backdoor code in place, but no way of detecting it in the source.

Whenever people complain that the human genome is 'too short' to encode something as complex as a human being, I think of this story. A strand of human DNA doesn't describe an algorithm for building a human because it doesn't need to: a huge part of that algorithm is already embodied (literally) in the human in whose uterus the new human is developing.

architect's drawing... could be a simple description, relying on the builder's craft skills... (#107)
Interesting - so, the popular imagination of what a blueprint provides is as inadequate as is the popular press about what DNA provides (and in the same way). So real blueprints differ from recipes in degree of specified detail, not in kind?
I still expect, for example, that the corpus callosum (the band of neural tissue which is the main connection between the two brain hemispheres) results from neural ontogenesis, not from any explicit specification as such within the genome. Ditto for chins from bone ontogenesis (apparently the byproduct of two intersecting growth fields in the jaw), etc. And naive engineers are mostly surprised by that.

Hobes:


Firstly: MSWord is code. DNA is data.
Secondly: data contains no information.

You think you know about computing? Look up how von Neumann changed computing.

Code is data. It is interpreted by other programs or programs expressed in silicon. Code can write data that is in turn treated as code.

There's nothing magic about the bits in memory. Can you tell whether some series of 1s and zeros is data or code?

I weep.
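The von Neumann point fits in three lines of Python (a trivial sketch): the same bytes are inert data until something chooses to execute them.

    payload = "print('hello from some bytes')"

    print(len(payload))   # treated as data: just a string with a length
    exec(payload)         # treated as code: the very same bytes now run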

It might be a good time to bring up the point again, since most seem to be ignoring it: a fundamental difference between formal computational models (of which most programming languages are instances) and cellular machinery is that extant cellular machinery has not been shown to be computationally universal. Individual molecular components of the cell have been used to construct universal devices, but the cell as it is doesn't have such an attribute, AFAIK.

Nor am I aware of a proof that the base pairs of nucleotides constitute a formal language. There may be a less concrete notion of "code" where such a thing qualifies, but it would probably be stripped of any CS theoretic significance.

As a computer scientist myself, with a background in evolutionary algorithms...

You could make some basic analogies between DNA and computer data. It is fairly possible to write out the nucleotides; they are, however, letters, and have no actual functionality. This is about where the comparisons end. You can declare them both roughly "data". The processing, however, is more like a self-assembling recipe dancing with the natural world. The brain is made in this way, and yet your brain isn't stored in DNA. The structure isn't either, just the basic recipe of how you would make the entire individual from scratch. It's a fairly impressive set of points and I wish PZ would harp on it more.

Stimpson, however, is wrong at every turn. Not just about biology but about computer science. Also, he doesn't know a damned thing about complexity. I've managed to evolve a six-line program (four bytes a line) which defeated me repeatedly at Texas Hold'em, managing to raise anything with tens or higher, pocket pairs as low as 6s, and pocket 2s. For other solutions to other problems, figuring out how they worked took a good amount of know-how and, on more than one occasion, calculus.

I'm personally surprised that it would take 24 megs to store all the non-junk DNA (you could use the fact that some sequences code for the same thing to compress the data more). That seems a bit much.

You must consider that there's no organization to any of this. No layer after layer of encapsulation, no nothing. It would be like writing thousands of fragments of functional programming code in assembly to constantly execute on an analog computer and having it all work out pretty well.

There's a reason why evolution codes far more tightly than we ever could, it's a hell of a lot smarter than us.

The minimal time for a comparison sorting algorithm is O(n log n), though admittedly there are plenty of non-comparison sorts such as radix or spaghetti or hash... but the bigger question is WHY THE HELL IS HE SORTING! Why is he treating DNA as a program, then sorting it like data, and then applying TIME complexity to balloon the size of the DATA itself?

He's not just stupid in biology, he's exhibiting fractal wrongness. Sure, his entire world view is stupid. But, if you zoom in on just this argument you find it is just as wrong as the entire world view. If you zoom in on just this statement it's just as wrong as the entire worldview. He's wrong at every possible resolution.

The biggest failing of genetic algorithms is that those who program them are largely programmers, and for some reason the data/code needs to be vaguely coherent (typically the data is the code, which kind of screws up the power of the genotype/phenotype boundary).

What bugs me isn't the thought. I'm in favour of thoughts and questions - even if they're not correct.

It's that if the person thinking about something can't imagine doing it themselves, or figure out how it works, then it must be impossible or designed by a 'god'.

To me, that's just arrogance.

> cellular machinery has not been shown to be computationally universal.

Unless being a universal Turing machine had selective value, I don't see why we should expect that to have arisen.
DNA "code" would only require the expressive power to grow a diversity of protein-based bodies, and to extend the ability to evolve such bodies as well.

Similarly, when Koza evolves electronic circuits using Genetic Algorithms, the "genome" being expressed only has the power to assemble electronic circuit variants, for example: http://www.genetic-programming.com/hc/cubic.html There is no need for the code to be more expressive than the phenotype being built, and it would be wasteful to have more expressive power than that.

There is a strong practical reason for limiting the expressiveness of the "code" in a GA: by restricting the search space there are relatively fewer variants that need to be searched (though the remaining space is still enormous). There is a tradeoff: too small a search space and you limit the "creativity" of possible solutions; too large a search space and you may never find anything useful. We should expect the expressive power of DNA to have been tuned by evolution to provide an effective balance of variation versus limiting the search space to a useful size.

"Unless being a universal Turing machine had selective value, I don't see why we should expect that to have arisen."

Part of my point is that it hasn't arisen. When we talk about DNA in terms of "code" and "computations", we're being imprecise in our use of terminology. It may be a strong metaphor in some circumstances, but it might tempt us to draw conclusions that are premature.

And even in terms of metaphors, one could make just as strong a case for blueprints, recipes, etc. as metaphors for DNA. All have their strengths and shortcomings.

However badly Stimpson has done it---and that's very badly indeed---I think that comparing the genome to a computer program is a good thing, if done well.

As a computer scientist with experience with some very bizarre computers and algorithms, I agree with Dawkins that the genome IS (mostly) a computer program. What it's (mostly) doing is computation. No, it doesn't look like a FORTRAN program on a PC, but that shouldn't be a surprise.

Computer programs can be/use bottom-up algorithms with regularities that emerge from bottom-up activity, with no top-down control structure. They can be supermassively parallel, depending on there being more computing machinery than data--in which case the usual big-O analysis doesn't even apply. (E.g., sorting in log n time, which the usual analysis forbids, because that analysis assumes you have lots more data than processors.) Computer programs can be nondeterministic, reactive, self-analyzing and self-modifying, and can work mostly bottom-up. They can also rely on environmental constraints, with crucial relationships represented nowhere in the actual system--just in a designer's head, if you're lucky.

I'm not a biologist and am a computer scientist, but I know a lot more about genetics and development than most computer scientists, and the more I learn about genetics the more I think that the core of biology is a kind of computer science. Not the kind of computer science that most CS folks are intimately familiar with, but the kind I've been exposed to studying massive parallelism, reflective programming (roughly, disciplined forms of self-modification), etc. Programs may also be nondeterministic, deliver approximate results within a range of acceptable values, etc.

If you have some exposure to computer science and this makes no sense to you, consider a typical modern operating system. It's doing rudimentary versions of many of the things I've mentioned above when it asks if you want to install a driver for a piece of hardware. It's reacting with its environment (you), and modifying itself at a very low level (just above the hardware) on the basis of its interpretation of high- and low-level input. (User feedback and various compatibility checks for the driver.) It's also operating in pseudo- (or real) parallel---at the same time it's talking to you about modifying itself, it may be doing various other things. The overall control structure is event-driven (via device and timer interrupts) and emergent, not top-down. The code may also rely on "environmental constraints" (such as the maximum rate at which underlying hardware can deliver interrupts, or the maximum frequency that can be transduced by an input transducer).

As an operating systems and programming languages guy, when I look at Hox genes and whatnot, I don't see anything fundamentally unfamiliar. If that's not a program, I don't know what it is, and I certainly don't know how it's NOT a program.

People often make a big deal about biological things being largely analog, but that's usually a red herring. There's usually no important difference between analog and limited-resolution digital, if you're talking about stuff that's designed to be approximate, noise-tolerant, and resilient. Digital stuff usually bottoms out as analog stuff at some level, and it's mostly a matter of convenience that we usually keep things strictly digital when we can, rather than letting the weirdness of analog creep up to higher levels and having to tolerate it.

So sure, genetics and development don't always look a lot like CS101, or even a whole undergraduate education in mainstream computer science. That doesn't mean that they're not fundamentally computational or programmatic. It just means that there are practical reasons for most human-programmed programs on computers circa 2000 to look different from evolved programs on chemical computers. Evolution is free to use supermassive chemical parallelism, self-modification, etc., because it doesn't have to run it on a PC or document the code and explain it to a manager.

And even so, clearly programmatic stuff does often pop right out, and I don't understand people's resistance to recognizing its programmatic nature by talking about developmental "programs," etc. As long as you recognize that a program can be reactive, nondeterministic, etc., that shouldn't be a problem.

(Many of the things that people assume "computer programs don't do" are actually things that most operating systems do every day, in some rudimentary form. Many others are done somewhere, on some machine, every day.)

I'd also hope that given a proper understanding of computer science, a computer sciency approach could be helpful. (For example, when there is obvious programmatic stuff going on, one obvious question should be what synchronization mechanisms are used---and are they used pervasively, or is there a whole slew of different kludgey ones for different aspects of the program? Which programming constructs and implementation mechanisms has evolution systematically reused?)

I do see a problem that many people who hear talk about biological "programs" may jump to invalid conclusions based on the very limited kinds of programs they've seen. The solution to that (IMHO) is not to say that it's "not a program" but to clearly explain what kinds of program it is or isn't. There are more kinds of computer science than are dreamt of in most folks' philosophies. (Including most computer scientists'.)

Also:

"There is a strong practical reason for limiting the expressiveness of the "code" in a GA: by restricting the search space there are relatively fewer variants that need to be searched (though the remaining space is still )."

I don't find this argument very convincing. If anything, greater expressive complexity (e.g., strong typing, context constraints, etc.) would be better at constricting the search space than a free-form alphabet, which is what DNA appears to be, to a rough approximation.

> If anything greater expressive complexity (e.g., strong typing, context constraints, etc.) would be better at constricting search space than a free form alphabet, as DNA appears to be to a rough approximation.

Note that "context constraints" already exist via gene suppression etc. but you appear to be focused on the genotype "alphabet" as the search space and that is not my point. The specific "alphabet" variants are less important than the range of phenotype variations that alphabet is able to produce. (There can be different alphabets the produce the same range of phenotype variation).

Specific phenotype variants are what is being selected, not the variations in the DNA itself. It is the phenotype variants that make up the search space we care about, and the "alphabet"'s ability to explore useful variations within that phenotype space.

One could imagine, for example, a DNA code based on 4 bases whose codons translated to many more amino acids than the roughly 20 (coded by 3-base codons) that we actually see. If these additional speculative amino acids were useless for building bodies (of no functional value in the phenotype), then the code would be less useful, because it would produce a substantially larger search space of possible phenotype variants which must be tested before finding an improvement.

A larger search space with fewer useful variants means it will take longer to locate a useful variant. Such a search space is wasteful.

On the other hand, let's say we use only a 2-base code with only a couple of amino acids to work with in our phenotype. Then our search space of possible body variants is very small, and we may not have enough variation to find creative solutions, since such solutions may require amino acids that aren't being specified by our reduced code. We will search our smaller space of possible bodies quickly, but there won't be many options for selection to act on.
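The arithmetic behind this tradeoff is easy to tabulate (a sketch; the hypothetical codes are inventions for comparison with the real 4-base, 3-letter codon code):

    # Codon capacity for hypothetical genetic codes: an alphabet of b bases
    # read in codons of length L gives b**L distinct codons.
    for bases, length in [(2, 3), (4, 2), (4, 3), (4, 4)]:
        print(bases, "bases, codon length", length, "->", bases ** length, "codons")

    # 2 bases, length 3:    8 codons -- too few to name ~20 amino acids
    # 4 bases, length 2:   16 codons -- still too few
    # 4 bases, length 3:   64 codons -- room for ~20 amino acids plus redundancy
    # 4 bases, length 4:  256 codons -- a much larger space to search, for no gain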

"Specific phenotype variants are what is being selected, not the variations in the DNA itself. It is the phenotype variants that make up the search space we care about, and the "alphabet"'s ability to explore useful variations with that phenotype space."

This works if we assume the length of the alphabet itself has non-trivial effects on its expressive capabilities. There is already a well established optimization theoretic result, via Fogel and Ghozeil 1997, that has shown that no representation of a bijective mapping offers unique capabilities. What I'm talking about is qualitative differences in syntactic structure, and my argument is that more explicit structure would help rather than hinder confines of the search space.

"One could imagine, for example, a DNA code based on 4 base pairs that translated to many more amino acids than the typical 3 basepairs (with some 20 amino acids) that we actually see. If these additional speculative amino acids were useless to building bodies (of no functional value in the phenotype), then the code is less useful because it is producing a substantially larger search space of possible phenotype variants which must be tested before finding an improvement."

Considering that it is pretty well established that most of our genome is non-functional and contains large amounts of redundancy, it doesn't appear that the alphabet is doing all that great of a job. It's why I argue that perhaps thinking in terms of computation, at least from a formal and non-metaphorical perspective, invites premature conclusions.

Pharny,

I take offense at this post. I graduated from the tenth-best engineering university in the country (best in my state, I might add), and I consider myself a good computer engineer and an even better computer programmer. The individual to whom you refer is quite obviously an incompetent retard, and I am upset that you portrayed him in such a way as even to imply that he had the slightest clue what he was talking about. My training in the inner workings of computers and software systems tells me that he quite obviously does not. While he, hypothetically, may be aware of the definition of the word "computer", I seriously doubt his commitment to the integrity of the field.

Therefore, I respectfully request that you respect my field of expertise by restraining yourself from referring to those ignoramuses, who claim to be associated with it but are clearly not, who attempt to abuse the ingenious gains in the inroads of software systems to promote the ideals of dark-ages bullshit, as experts. Please, do not insult my, or my friends' and coworkers', intelligence further.

Thank you.

degree--; // This line of code updates the status of a computer scientist with no understanding of complexity theory or encoding.

By Ryan Cunningham (not verified) on 24 Feb 2008 #permalink

> ...has shown that no representation of a bijective mapping offers unique capabilities.

You'll have to explain what is being mapped to what that you are saying is one-to-one. DNA to phenotype is not one-to-one, for example, because multiple codons code for the same amino acids. http://en.wikipedia.org/wiki/Genetic_code

>my argument is that more explicit structure would help rather than hinder confines of the search space.

If by "more explicit structure" you mean adding more unique states in the genotype, then to the degree that this generates more variation in the phenotype, the search space has increased in size. However, the genotype can increase in states without creating additional phenotype variants.

If by "more explicit structure" you mean more constraints on what unique states are allowed in the genotype, then sure, that would also tend to reduce the range of variation in the phenotype.

> Considering that it is pretty well established that most of our genome is non-functional and contains large amounts of redundancy, it doesn't appear that the alphabet is doing all that great of a job.

Why is redundancy or non-function a sign of doing a poor job? Redundancy (gene duplication) is important to the formation of new gene variants because it allows new variants to be tried without breaking a critical existing gene.

Non-functional genes provide raw material for future variants. Rather than being a sign of a poor job, these things are both useful and an expected side-effect of the process. Genetic Algorithms produce similar constructs (duplicates and broken structures).

Note that if such DNA is truly non-functional, then it isn't affecting the phenotype, so it would be indistinguishable from a phenotype generated from DNA without such a sequence, other than, say, the metabolic load of having to maintain and replicate the additional information.

> It's why I argue that perhaps thinking in terms of computation, at least from a formal and non-metaphorical perspective, invites premature conclusions.

To me this feels like saying, "from my experience with long division and multiplication tables, use of mathematics for signal processing invites premature conclusions". Umm. Only if you restrict yourself to long division rather than bringing in Fourier.

I think Paul W has it quite right. What constitutes "computation" is much, much broader than the narrow special-case example of a von Neumann PC that most people are familiar with... just as mathematics is much broader than long division.

Since this is my field, I should probably say something substantial. There are two very simple counter-arguments:

a) Writing the word "human" encodes all of the same variability in one word. If I tell you a single bit of value 1 represents "human", I can represent "750MB" with a single bit. The number of symbols required to encode anything depends on how it is interpreted (a toy sketch of this point appears after point b). Unless you define the system with mathematical rigor (i.e., actually supply context), you're just babbling incoherently every time you use the words "complexity" or "information" or "bit". This is a universal idea at the core of complexity theory, information theory, and formal languages. Someone with a degree in computer science should know this (unless they slept through the courses actually relevant to this topic).

b) This is just a pure argument from incredulity and ignorance. "It seems impossible to me => God. QED"
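
To make point (a) concrete, here is a minimal Python sketch (a toy illustration; the sequence and the alternate readings are invented, not anything from the genome). The 750MB figure is an artifact of one encoding choice, and the same bytes mean different things under different interpreters:

BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}

def pack(seq):
    """Pack a DNA string at 2 bits per base, 4 bases per byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i+4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)

packed = pack("GATTACAG")
print(len(packed))                    # 2 bytes for 8 bases, per the 4-per-byte arithmetic
print(list(packed))                   # [143, 18] read as small integers
print(int.from_bytes(packed, "big"))  # 36626 read as one number: same bits, different meaning

The byte count says nothing about what the bits mean; that depends entirely on the machinery doing the reading, which is the point being made above.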

By Ryan Cunningham (not verified) on 24 Feb 2008 #permalink

"Stimpson hasn't said anything quite that stupid"

Oh yes he has.

He's just claimed that Gawd has to make a miraculous intervention every time anything reproduces. A bigger intervention than loaves and fishes, or parting bodies of water. Much bigger. For every single birth (at least among the metazoans).

And that is pretty awesomely stupid.

Now I'm no geneticist, and I haven't done any hardcore coding in years, but I can spot stupid a mile away, and this stupid glows in the dark.

A lot of good things have been said here in the comments, and I'm going to add a few things. For what it may or may not be worth, I've been fascinated with the overlaps between computation and biology for most of my life, and I've got a lot of relatively well-educated opinions... I'm just going to toss out some quick points, since this topic is a huge can of worms:

- this Stimpson guy is obviously comparing apples to oranges (or apples to orangutans) all over the place, so some of his premises and all of his conclusions are bogus
- PZ, in complaining that someone who has some rudimentary experience in biology is making bogus arguments about it, is taking his rudimentary experience in computer programming and making bogus arguments about that. This is really, really unfortunate.
- biological systems are not computers in the mainstream sense of a computer: a microprocessor-based system similar to a von Neumann machine that you program in C or Java or some other human-designed language
- many biological systems *are* computers in the sense that they are systems that operate on digital data in order to do something.
- one of the failures of the 60s/70s-era "artificial intelligence" efforts was that people became too attached to the Church-Turing thesis and its implications: that treating all computations as equivalent to universal Turing machines and equivalent systems was the best and only way to think about anything that does any sort of computation
- the philosophers got hold of the Church-Turing stuff and went even more kooky, making assertions like "the brain is a Turing Machine and the tape is the sensory input and the brain states are the Turing-machine states." These people annoy the crap out of me.
- some things that aren't useful to model as Turing machines include: quantum systems, brains and artificial neural networks, immune systems, and the "central dogma of biology"-- in particular, things that have some analog component to them
- mathematics and computer science and information theory and physics have some excellent tools to investigate things that aren't computers in the Church-Turing sense, and the fact that some dumbass creationists wave these around as part of their nonsense no more invalidates them than their waving around of DNA invalidates using DNA for biology
- useful DNA is a digital code. Declaring that it's not is stupid. In fact, the 3-base-pairs-per-amino-acid part of the DNA code is well-understood and is called a code by biologists. Whether LINEs and such are code or not is sort of a meaningless question, in the same vein as asking whether a computer program that has errors in it is actually a code... it's a code for nonsense, just like encoding gibberish with a secret decoder ring gives you correctly-encoded gibberish. So what? Similarly, the distinction between program and data, while common in programming PCs, is frequently frowned upon by good computer scientists: a program is data, most data is a program. This is formalized by the lambda calculus, Gödel numbers, the LISP programming language, and stuff of that sort. Making some argument that genetic material isn't like programming PCs to do image processing on microscope images is very similar to that stupid creationist arguing that human beings and oranges don't superficially look like each other.
- it's at least a useful analogy to say that DNA provides a program, and the zygote is a computer, and when you run the program on the computer with the right environment (which might include some external inputs) it constitutes most of the informational specification that produces the adult organism. Having the digital DNA component is necessary, but certainly not sufficient, to produce the adult. Most of the stuff that's not (nuclear and mitochondrial) DNA is framework that doesn't carry much information, in the information-theoretic sense.
- there are parts of developmental biology that are analog rather than digital. There are, however, also parts of the laptop in front of me that are analog rather than digital. Many analog systems in computers are specifically designed to mitigate the analog nature, such as things that take some voltage between the "on" and "off" state and slam it into whichever one it's closer to. There are lots of analogous things in developmental biology. Transcription rate is inherently pretty much analog. Gene Regulatory Networks, however, have a whole lot of components that are locked into being digital in various ways. Applying digital computer design to understand the digital parts of these systems is a very useful thing to do in many cases, but it's important to remember that some parts are analog.
- this isn't a big deal: if you have a computer control system for a car, or a rocket, or a thermostat, or a refrigerator, or a stereo, there are parts that are digital and parts that are analog, and both engineers and biologists have good ways to deal with these. There is a real problem in the programming world in that a lot of arrogant computer programmers pretend that they know how to handle the quirks of analog or other weird systems when they don't... but that doesn't mean that it's impossible to, and it particularly doesn't mean that it's appropriate to bash on anyone who tries.
- many things people expect from computers may not apply to any given biological system, so making far-reaching naive claims like Stimpson does is often wrong.
- some particular aspects of developmental regulation that are different from computers: they are asynchronous, in that there is no system clock. They are, in many ways, massively parallel rather than sequential. They often store things in varied, disorganized ways, rather than in something analogous to "random access memory" devices. They are more like a lot of tiny special-purpose computers running in parallel than one big one.
- however there are ways that they are computers in the information-theoretic sense: they use a digital storage medium, one that's topologically rather like a bunch of tapes, to store information, and they have "tape heads" that run along the tapes and do things with them. The gene regulatory networks that control development and other cell physiology have a lot in common with computer design: there are boolean logic gates, latches, various analog-to-digital mechanisms to avoid cells being half in one and half in another state, mechanisms to control sequential developmental events by using the parallel analog system to implement a sequential digital system, and so forth. (A toy sketch of such a boolean network follows this list.)
- invoking Mandelbrot sets and so forth is a cop-out. A Mandelbrot set looks cool to tripping hippies, but in some sense it doesn't have any more information in it than the one-page computer program that creates it. In particular, a small "evolutionary" change to that program/genotype will not readily produce a meaningful small-step change in the picture/phenotype. Most changes in the genome do not produce dramatic phenotype changes... that sort of thing was selected out long ago, because anything that fragile would cause most mutations to be fatal.
- the information theoretic arguments that suggest that there is a complex, robust, interesting, deep, and not-yet-understood mapping from the genome to the developing and adult organism are pretty good. It is interesting to investigate how the genome, fairly small in terms of bits of information, stores (most of) the information needed to produce a brain, for example, which we have no idea how to represent meaningfully in a similarly sized bunch of data in a computer; that suggests there is a lot about how it works that we don't really understand yet. The fact that stupid creationists make the same arguments about this that they do about lack of transitional fossils doesn't mean it should be dismissed as "not biology," just that we should continue to make fun of them or deride them for misapplying or misrepresenting complicated technical details in a naive, ignorant, or deliberately misleading way.
- many systems actually do try to model genotypes, phenotypes, and evolution with explicit computer models. Tom Ray's original Tierra project used actual PC machine language as its "genome." I thought this was rather naive, but it produced interesting proof-of-concept results. Avida tried to make a more relevant representation, but it was still a major simplification. Karl Sims used a LISP variant for his genomes. These have all produced interesting simplified lab experiments. The "DNA computers" and other chemical self-replicating systems, chemical universal Turing machine implementations, and so forth are also interesting simplifications of this sort of thing, and are examples of the blurry lines on the computational continuum between a laptop and an embryo.
- if biologists and mathematical computer scientists work together instead of bashing on each other, together they can do good things, because they can help to point out each other's blind spots and narrow thinking. If they prefer to accuse each other of being creationists, then they're wasting time.
- I think the comments above show that a lot of very knowledgeable computer experts are interested enough in the topics in this blog to read it. That, unlike the post, shows there is a lot of opportunity for applying a synergy of biological and computational techniques to understand the complex systems that drive life, including evolution, development, neurobiology, the immune system, physiology, and doubtless many other things I can't think of. Appreciating the talents that people with different backgrounds bring to these discussions is necessary for the synergy, while bashing on them is really, really counterproductive. Neurobiology has adopted mathematical modeling and electrical engineering techniques. Neuromorphic engineering has adopted neurobiology. Biophysics incorporates biology, physics, and engineering techniques. Vision researchers in computer vision and eye physiology have a healthy cross-talk, at least sometimes. Saying things like "you stupid computer scientists and your stupid math can't explain the magical complexities of the genetic wonders that we high-and-mighty biologists understand" is similarly close-minded to creationists saying "you rationalist biologists will never explain god's mysteries because you refuse to see the soul in the brain and the guiding hand of god in the design of organisms."
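
To make the boolean-network point concrete, here is a toy sketch in Python (the gene names and rules are invented for illustration; a real network would be inferred from experiments):

import random

# Each gene's next state is a logic function of the current gene products.
rules = {
    "geneA": lambda s: not s["repressorX"],          # NOT gate
    "geneB": lambda s: s["geneA"] and s["signalY"],  # AND gate
    "repressorX": lambda s: s["geneB"],              # feedback: B induces A's repressor
    "signalY": lambda s: s["signalY"],               # external input, held constant
}

state = {"geneA": False, "geneB": False, "repressorX": False, "signalY": True}

# Asynchronous update: one randomly chosen gene is re-evaluated at a time,
# which is closer to cell chemistry than a synchronous clocked sweep.
random.seed(1)
for step in range(12):
    gene = random.choice(list(rules))
    state[gene] = rules[gene](state)
    print(step, gene, {g: int(v) for g, v in state.items()})

The negative feedback loop (A turns on B, B turns on A's repressor) gives the kind of self-limiting behavior discussed above, with no clock and no central controller.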

I've been thinking about this, and I think that the title of this entry is very wrong. The genome IS a computer program, but the inputs and outputs are so complex and subtle that it isn't a program in any sort of sense that you could compare to conventional programs. It's a program in the larger sense: it is stored information that responds to particular inputs with particular outputs.

PZ, I think your thinking about this is a little too close-minded and a little too biased towards the macrobiology point of view. Obviously, Stimpson is wrong in too many ways to count, but this is the first time in ~2 months of reading your blog that I think you aren't exactly on the money, either.

If the genome isn't a program, aren't you denying the entire premise of the field of synthetic biology?

Just a little constructive criticism from a chemist.

I'm also a computer programmer and one thing I've learnt is that the more complicated the code, the shitter the programmer.

So this argument is basically saying that life is too complicated to not have been designed by a really, really bad designer. Or a committee.

By Ashley Moore (not verified) on 24 Feb 2008 #permalink

Hi PZ,

Your blog is amazingly popular. Thanks to your critique my blog has actually received more hits in one day than in its entire lifetime. I am a little bit disappointed that not many people have read my favorites which are Don't Hurt Yourself, Politically Incorrect Statego Rules and Cockamamie Optimism. It's too bad that your blog wasn't making a critique of my business website instead.

I would like to point out that you have made some assumptions about me and what I think which are not true; however, I don't want to waste time elaborating on those. Instead I would like to drill down to the core of where we disagree. Your line of reasoning has led you to believe that:

There is nothing in your genome that says anything comparable to "make 5 fingers": cells tumble through coarsely predictable patterns of interactions during which that pattern emerges. "5-fingeredness" is not a program, it is not explicitly laid out anywhere in the genome, and it cannot be separated from the contingent chain of events involved in limb formation.

You also said that:

He thinks there needs to be some nuclear authority that specifies higher level activities like tissue repair and vision -- there isn't! There is no map. There is no boss. There is no blueprint. Vision is an emergent property of cells and proteins interacting in development ...

It's true that I believe that DNA contains information that specifies limb formation and higher level activities like tissue repair and vision. I am a little surprised that you don't. I am wondering what percentage of biologists or people reading this blog would agree with you on those points.

Ashley Moore said:

So this argument is basically saying that life is too complicated to not have been designed by a really, really bad designer. Or a committee.

Is it just me or do most arguments from design come back to that if examined with any degree of rigor? Surely a designer that is miraculous, omnipotent but grotesquely incompetent is not quite what they had in mind - unless they're suggesting Dubya as a potential designer?

By Lilly de Lure (not verified) on 25 Feb 2008 #permalink

@ #134 B McManus
I think you have hit the nail on the head.
Most people have too narrow a view of what a 'computer program' is. Even many computer science graduates seem to subscribe to the idea that it is simply a set of imperative statements when most programs and especially new and large programs running on modern operating systems are much more complicated than that. The environment in which the program runs is critical in much the same way that the environment in which DNA is 'run' is critical. Change part of the environment and the program will do something different, sometimes radically so. In interactive, networked, event driven, real time programs there is also no master controller in charge. There are many controllers, the programmers of which might entertain the delusion that their program is in charge, but that's a similar delusion to that of the creationists and is caused by a narrow and parochial view of programming.
@#22 Saying that you can't throw some DNA on the ground and grow a human is just as true of programs. You can't throw a program on the ground and connect electricity to it and grow a web server either. Both must be wrapped in exactly the correct container and provided with the correct nutrients, correct environment, run time libraries, and so on.
Disclosure: I'm a long time computer programmer (since 1969), physics graduate, electronics and electrical engineer and avid reader of Dawkins' books.

A genome is something like a program, but one that functions simultaneously in both analog and digital modes. And if that doesn't seem significant to you, consider...

Damn, I'm having trouble digging up the reference. But in one of those software evolution experiments, researchers applied many generations of selection to FPGAs to get ones recognizing a particular pattern. When they examined the winning programs, they seemed at first impossibly broken. There were far too few gates in the active part of the circuit, and many gates were in bizarre feedback-loop configurations unconnected to the inputs.

But the programs worked. Strangely enough, though, they didn't work if copied to another FPGA. Discounting the interference of noodly appendages, it appears that cross-talk between the gates enabled a more powerful program with fewer apparent bits. Albeit, one which depended not only on the digital program but on undocumented on-chip physical variations and interactions.

So, anybody unimpressed with the potential complexity of a three-billion-base genome (with many non-coding portions performing a regulatory function, and much other apparent junk filling a structural role) should consider the difficulty of programming a computer in which bytes 0xfec4d030-3b0 are capable of folding over next to bytes 0x00cdfb90-e10 and up- or down-regulating their signals into that no-man's land of voltages neither consistently 1 nor consistently 0.

I'd originally intended to write about how the great IBM (mainframe) programmer Harlan Mills often compared OS/360 JCL to a cow -- a system so horrifically complicated through a long history of kludges that treating it as *designed* could lead to insanity. Perhaps another day.

Nah, you guys who argue that a genome is a computer program aren't persuading me at all. You're missing the whole point, that the genome is incomplete and does not specify or regulate much of anything without major contributions from its environment. If there is a computation engine in the cell, it's the cytoplasm.

You're missing the whole point, that the genome is incomplete and does not specify or regulate much of anything without major contributions from its environment.

Neither does a computer program, though?

A computer program contains the operators that work on data from the environment. The genome does not. The genome is part of the data that the cytoplasm manipulates.

I got involved in an argument with a bizarre creationist (ID "cheebo") on the Amazon.com forums. He kept talking about how scientists compared DNA to writing and Bill Gates compared it to a computer program. We tried to explain that those were only analogies, but he didn't listen. He insisted that this proved DNA was a literal written text, put there by the hand of ALMIGHTY GAWD!

We quickly realized that he did not understand the concept of an analogy! He had no idea how to distinguish between reality and metaphor, analogy, or outright fiction. Much like the creationist who recently came here and seems to have mistaken Harry Turtledove's science fiction for actual history, this person was in serious need of remedial elementary English classes.

Now, I knew creationists tend toward stupid, but this level of utter idiocy shocked me. How do you survive in the real world without being able to tell the difference between fact and fantasy?

And perhaps this is the real reason so many christians were up in arms about Harry Potter and the DaVinci Code. They couldn't tell those books were fiction, because they didn't know the difference!

By phantomreader42 (not verified) on 25 Feb 2008 #permalink

A computer program contains the operators that work on data from the environment.

Some of them, but its functioning rests on the operators provided by the hardware.

The genome does not.

Then what are "genetic circuits"?

A genetic circuit is an excellent example of what I'm talking about! They are utterly dependent on cell signaling and signal transduction for their function. In those diagrams, all those little lines connecting the modules are the important parts -- and that connectivity illustrates patterns of interaction between genes.

PZ,

I don't understand exactly what code/data distinction you're making or why you think it's crucial.

Of course the cytoplasm is crucial computational machinery. How is that a problem?

In general, code only makes sense relative to an underlying machine that "interprets" or "executes" it, and much of the meaning of the program is implicit in the underlying machine implementation.

One thing that's funky is that a gene is both an "instruction" and (typically) a hardware switch to execute that instruction. (Saying, roughly: if A and B and C but not D or E, then do F.)

That's not actually particularly weird and is easily describable in computer science terms. The genome is mostly a "production system" (set of rules with preconditions for them to fire), rather like Emil Post's system that predates Turing machines, or like the rule-based inference engines often used in AI work (on top of von Neumann machines).

DNA only makes sense as a switch or instruction in a cytoplasmic environment that makes it act as one. But that's typical of production systems or any other kind of code---instructions only make sense as instructions in the context of machinery that they influence to "do" the instruction. (The instruction and the interpreting hardware, together, determine the action taken.) In a different context, they might be "data" or just garbage.

To me, the genome looks a lot like a dataflow program executing on a massively parallel dataflow machine; that's a machine where a program is represented as a dependency network, and program actions happen when their inputs are available. (The output(s) of one operation may trigger other computations that take those outputs as their inputs.) There's no synchronous clock or centralized controller, but that doesn't keep it from being a program executing in a computer.

The fact that most people have never seen a dataflow program executing on an asynchronous massively parallel dataflow machine doesn't mean it isn't exactly a program running in a computer. (If it did, some of my computer architect colleagues would be very surprised to find out that what they do "isn't computer science.")
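
For anyone who hasn't seen one, here is a minimal sketch of the dataflow idea (the tokens and operations are invented placeholders, not real biochemistry): operations sit in a dependency network and fire whenever all their inputs exist, with no program counter deciding the order.

# Each operation lists the input tokens it needs and the output it produces.
ops = [
    ("make_mRNA",    {"gene_on", "polymerase"}, "mRNA"),
    ("make_protein", {"mRNA", "ribosome"},      "protein"),
    ("activate",     {"protein", "kinase"},     "active_protein"),
]

available = {"gene_on", "polymerase", "ribosome", "kinase"}

fired = set()
progress = True
while progress:
    progress = False
    for name, inputs, output in ops:
        if name not in fired and inputs <= available:
            print("fire:", name, "->", output)   # fires as soon as its inputs exist
            available.add(output)
            fired.add(name)
            progress = True

Run it and the three operations fire in dependency order without any explicit sequencing; remove "ribosome" from the initial set and everything downstream of it simply never fires.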

the genome is incomplete and does not specify or regulate much of anything without major contributions from its environment.

Any program is incomplete for the same reasons. It does not specify or control anything without major contributions from its hardware.

If there is a computation engine in the cell, it's the cytoplasm.

This is a complete cop-out. Nobody, not even the ID moron, said that DNA is the computation engine (aka the processor).
Of course the processing takes place somewhere in the hardware. In a PC it happens to be the CPU and the GPU.
In the cell it happens to be the cytoplasm.

A computer program contains the operators that work on data from the environment. The genome does not. The genome is part of the data that the cytoplasm manipulates.

DUUH

Any program is just data that tells the processor what to do next.

phantomreader42 wrote:

We quickly realized that he did not understand the concept of an analogy! He had no idea how to distinguish between reality and metaphor, analogy, or outright fiction.

I suspect that metaphor/analogy blindness is typical of fundy Christians. The Bible blurs apparent metaphor with supposed factual statements in a very confusing way.

Maybe coming at DNA as a computer program is the wrong way of looking at it. The fact that one can represent a base as a 2-bit chunk of data does not mean that suddenly it is a program. Not all patterns of 1s and 0s on my computer are programs.

Taking developmental factors into account, it seems that it would be better to view DNA as the input tape of a two-tape Turing machine. The Turing machine (the developmental process) reads in part of the input tape (DNA sequence), then determines the needed output (chemical signal) to write to the output tape (the organism or structure), sets its new state, and moves the input head to the next section of the input.

In this view, the bulk of the actual functional "code" exists within the development environment for the organism while the DNA acts as a simple string of characters that the environment uses as switches to determine output and set its state.

This model is quite capable of showing why a seemingly complex being like a human is generated by only "800 MB of data". 800 MB of switching instructions for a larger program is capable of doing a ridiculous amount of work. A few bytes of data in MS Word (input from clicking on save) is all the program needs to write all of my data from active memory into a formatted file on the hard drive (which also involves memory addressing routines and hardware interrupts).
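
Here is that two-tape picture as a toy Python transducer (the states, the codons chosen, and the output signals are invented for illustration): the transition table, standing in for the developmental machinery, lives in the machine, and the DNA is only the input tape.

transitions = {
    # (state, codon) -> (next state, signal written to the output tape)
    ("seeking", "ATG"): ("reading", "start-signal"),
    ("reading", "GGC"): ("reading", "grow-signal"),
    ("reading", "TAA"): ("seeking", "stop-signal"),
}

def develop(dna):
    state, output_tape = "seeking", []
    for i in range(0, len(dna) - 2, 3):   # move the input head one codon at a time
        codon = dna[i:i+3]
        state, signal = transitions.get((state, codon), (state, None))
        if signal:
            output_tape.append(signal)
    return output_tape

print(develop("ATGGGCGGCTAA"))
# ['start-signal', 'grow-signal', 'grow-signal', 'stop-signal']

The same input tape fed to a machine with a different transition table (a cow's machinery rather than a human's, say) writes a different output tape, which is the point of the cow/human example discussed below.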

A computer program contains the operators that work on data from the environment. The genome does not. The genome is part of the data that the cytoplasm manipulates.

And this too is not unheard of in computer programming. There are programs that can operate on and modify their own code. There are asynchronous computer designs that do not depend on a universal clock and sequential execution. What the cell does with DNA is essentially a form of computation: converting triplets into amino acids into proteins that enable or disable other parts of the sequence. The analogy is not exact, but it is pretty good as long as you do not try to extend it too far. Especially when you consider viruses: what are they except pure "code"? They do nothing on their own, except bind to cells and inject their DNA so as to "reprogram" the cell to produce more of the virus. That alone is a very compelling argument for the analogy.

But even so, whether you accept the analogy or not, Stimpson is still wrong in his analysis on just the computer science side of the argument. The size of a program has little or no relation to the complexity of what it can produce. Take his MS Word example: I think to judge its complexity you would have to compare its size to the total of all the Word documents produced using it. Look at the Mandelbrot set, already mentioned: a very small program can produce an image of unimaginable complexity. Look at IFS image encoding, which essentially "encodes" an image as a program rather than as just compressed data. These can produce incredibly small representations of complex images.

Here are a couple of brief Wikipedia articles that may be helpful.

An article on dataflow architectures:

http://en.wikipedia.org/wiki/Dataflow_architecture

an article on production systems:

http://en.wikipedia.org/wiki/Production_system

The latter article is oriented toward AI systems running on serial computers, but imagine massively parallel execution (sketched in toy form after this list) where:

1. the cytoplasm is the "working memory",
2. most genes are rules,
3. gene transcription is rule firing,
4. intermediate products from previous rule firings enable or disable transcription, and
5. there's no serial bottleneck---any rule may fire when its preconditions are met. (That's actually closer to the original idea of production systems, and to a lot of their current uses for modeling.)
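
Here is that mapping in toy form (an illustration only; the gene and molecule names are invented):

# "Working memory" is the set of molecules in the cytoplasm; each "gene" is a
# rule with preconditions (activators present, repressors absent); firing a
# rule is transcription, which adds the gene's products to the cytoplasm.
genes = {
    "geneA": ({"signal1"},              {"repressor1"}, {"proteinA"}),
    "geneB": ({"proteinA"},             set(),          {"proteinB", "repressor1"}),
    "geneC": ({"proteinA", "proteinB"}, set(),          {"proteinC"}),
}

cytoplasm = {"signal1"}

changed = True
while changed:
    changed = False
    # No serial bottleneck: every rule whose preconditions hold may fire.
    for g, (need, block, products) in genes.items():
        enabled = need <= cytoplasm and not (block & cytoplasm)
        if enabled and not products <= cytoplasm:
            print("transcribe", g, "->", sorted(products))
            cytoplasm |= products
            changed = True

print(sorted(cytoplasm))

Note the feedback: geneB's products include a repressor of geneA, so firing one rule changes which rules can fire next, exactly the enabling and disabling described in point 4.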

Ah, Mr Stimpson, you've graced us with your presence.

You said: I think it's more probable that the human DNA which we have discovered so far doesn't contain all the information required to produce humans.

Your problem is that you don't understand the nature of DNA, and have no clue as to how it works. There are valid analogies to be made with electronic computer systems, and with programming code, and you made all the invalid ones. As PZ said, you've managed to prove that the bumblebee can't fly.

It's true that I believe that DNA contains information that specifies limb formation and higher level activities like tissue repair and vision. I am a little surprised that you don't. I am wondering what percentage of biologists or people reading this blog would agree with you on those points.

Most. Because they aren't "specified". That is your fundamental misunderstanding.

When you said: I wouldn't be surprised if more DNA, or some other kind of information, is discovered some time in the future, you made yourself look ridiculous, because you are so completely unaware of the "other information" that we've already found, and which PZ *teaches* and *experiments with* and *writes about* on this blog.

This particular blog is full of posts which talk about the process of "building an organism" in great detail. You might try reading some of them.

We aren't gritting on you for being unaware of how DNA works. We are gritting on you for mistaking your ignorance for profound insight. Ignorance is nothing to be proud of.

#136:

It's true that I believe that DNA contains information that specifies limb formation and higher level activities like tissue repair and vision. I am a little surprised that you don't. I am wondering what percentage of biologists or people reading this blog would agree with you on those points.

This isn't a matter of belief.

DNA contains information that specifies proteins, and sometimes non-coding modifiers like microRNAs. The interactions of these proteins with each other and with DNA, along with inputs from the environment, result in the formation of limbs and in tissue repair and vision, but these higher-level outcomes are not hard-coded in the DNA, and it is simply incorrect to say that they are.

Derek James wrote:

Dawkins ... does hold to the idea that DNA is analogous to computer code. Does this make him a crank?

No. It just means that metaphors and analogies are very inefficient and error-prone ways to communicate information. Sometimes they work beautifully and you get the point; other times you can totally miss it and assume the wrong connections. Dawkins's metaphor doesn't include anything about what Randy Stimpson was assuming: that DNA really does contain typical computer-style information. As others have pointed out here, the information style of DNA (and its interpretive systems) is exotic: fractals, L-systems, self-modifying code, and nothing like how humans write and process computer programs.

A genetic circuit is an excellent example of what I'm talking about! They are utterly dependent on cell signaling and signal transduction for their function.

And computer programs are utterly dependent on the computer's electronic "signal transduction" for their function!

Maybe I just don't get it, but this still sounds like a bit of a strawman argument against the computer analogy, as T_U_T points out in #148. Although there may be other reasons why the analogy is poor.

I feel this quote sums it up:

"The genetic code does not, and cannot, specify the nature and position of every capillary in the body or every neuron in the brain. What it can do is describe the underlying fractal pattern which creates them."

The code for the pattern that creates it is far simpler than the whole of the item.

Draw a line 1 inch, turn left, draw a line 1 inch, turn right, (x) times... (a made-up, very simple example) and then repeat a million times. The object created would seem far more detailed than the code used to create it. So it is with genes.
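
That intuition is exactly what an L-system formalizes. A minimal Python sketch (the rewrite rule here is arbitrary, in the spirit of the made-up example above):

# The entire "genome" is one nine-character rewrite rule.
rules = {"F": "F+F-F-F+F"}

def grow(axiom, generations):
    s = axiom
    for _ in range(generations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s

for n in range(5):
    print(n, len(grow("F", n)))
# lengths: 1, 9, 49, 249, 1249 -- the drawing instructions grow
# exponentially while the rule that generates them never changes

Interpret F as "draw a line" and + and - as turns, and a few bytes of rule unfold into a curve of any desired intricacy, which is the sense in which a pattern can be far more detailed than the code that specifies it.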

Wonder what it's like to be a creationist... to always be so sure of yourself, even though you're so wrong. To stand on the back of technological innovation and preach of its failure, to rely on the minds of others while insulting them and feeling better for it.

Their children's children will be punished, while the world marches ahead without us.

It's true that I believe that DNA contains information that specifies limb formation and higher level activities like tissue repair and vision. I am a little surprised that you don't.

Of course he doesn't, because the basics of limb formation have been figured out. You can read it all in textbooks today.

What happens in vertebrate limb formation is that a gene involved in all outgrowths of the body wall gets active, and then the same genes that make the head-to-tail axis make the shoulder/hip-to-fingers/toes axis. There is no separate gene for limbs!

I forgot how the places on the body where the limbs grow are specified, but I suppose it's simply just behind the gill slits and just in front of the anus, respectively.

Go read it up. Here on this blog you can find several posts that are very good introductions.

By David Marjanović, OM (not verified) on 25 Feb 2008 #permalink

#46 - well, it really isn't O(n) any more than bucket sort is O(n) in the general case.

The 'algorithm' is in physical space, but requires a physically impossible O(1) step repeated N times, i.e. magically select the tallest piece of spaghetti. The answer 'eyeball it' is O(n), for a total of O(n^2). The answer 'build a physical machine to do it' reduces to bucket sort due to tolerance issues. The answer 'build a perfect machine to do it' either kills the premise, or leads to the class of N-CPU algorithms (which CS agrees are O(n), but which are outside the scope of the problem).

I hope this hasn't just degenerated into an argument about whether or not DNA is like a computer programme. (And completely ignoring the errors made by Randy in the first place, and pointed out by PZ, which render his back-of-a-napkin calculation fallacious and incorrect.)

Well, it isn't a very worthwhile conclusion that DNA is like a computer programme (if you're dealing with an entire organism), because however far you take the analogy, it still falls short. The genome is a lot more complicated than simply the DNA that carries it, and if you want to describe its functions as being like software, then at the very least you're going to have to concede that it is unlike any software which currently exists. (So, what is the point, other than as a descriptive analogy?)

Also, it ignores the fact that the DNA component of the genome - taken alone - is a relatively inert molecule (which is part of the reason that it is such a useful way to store a genome - it is a relatively stable biochemical). The DNA is a data input for the cell, not some kind of magic hat for producing new cells; without transcriptional and translational machinery it isn't even really information (other than in the most general sense) - it is just a complex molecule.

But, it simply isn't possible to isolate the DNA as a set of instructions, and to say that transcriptional and translational machinery is hardware. The transcriptional and translational machinery is sometimes hardware, but also sometimes instructions (and sometimes, both). Equally, DNA and RNA can be instructions, but can also be functional. The analogy simply doesn't hold beyond the superficial, because the reality is much more complex.

Indeed, many aspects of biochemistry and cell behaviour are emergent functions - as PZ pointed out - which don't even exist in any meaningful sense within the genome. Critical responses and feedback loops may often only be understood as properties of the system, and can't be found within this supposed operating system. So the analogy fails again, because it assumes that the genome will provide all of the information required for building a cell, which is clearly not the case.

Possibly, in protobiotic systems, this analogy is more useful, because the input/output elements of the system may be separable. In any extant organism, this certainly isn't the case.

As we found upon sequencing the first genomes, the genome - by itself - actually tells us very little about the organism. And this is not simply because we lacked the knowledge to interpret it. There is very little relationship between the complexity of the genome and the complexity of the organism, for instance. (Why, potatoes, why?) Even a primitive post-genomic approach tells us very little, because things like transcriptomics can be misleading, as can proteomics and glycomics. It is only via a synthesis of all of these disciplines, and knowing that each element of cell biology can provide information and function, that we have really started to understand how the genome works.

Whilst software has an abstract existence (independent of the hardware required to run it), the genome simply does not (even though DNA does).

So, is the genome like software? Yes. Is a car like an aeroplane? Yes.

By Bernard Bumner (not verified) on 25 Feb 2008 #permalink

Whether or not a cell, or the brain, "is a computer" depends entirely on definitions, of course.

For our purposes, that is to say, with respect to what Stimpson brought up, the cell is not a computer, and the genome is not computer code. Bringing up parallel processors and the like is beside the point, because by all appearances, Randy was writing about familiar linear processing binary computers.

Of course we can compare cellular information processing and the processes occurring on a CPU, but the former are rather more messy and unformalized, with co-option and dual (or more) functions being quite common. We didn't even know about gene silencing by RNA interference until about 10 years ago. There are "layers upon layers" of information interactions in cells and in bodies, except that they're not so much "layers" as tangles of complexity which we have difficulty sorting into hierarchies, but often do for our own understanding.

It's almost as if evolution produced the cell, and designers the computers.

The fact is that Randy's actually not the first person to think that the number of genes humans have seems insufficient for the complexity of humans and their ability to interact with the environment; almost all early estimates of the number of human genes ranged above the number now known. On the other hand, Randy's aiming at a strawman, since it's well-accepted that a substantial minority of "junk DNA" is involved in regulation (and such a role was guessed at long ago, but, unlike creationists, biologists don't usually make claims without evidence).

Yet even without the regulatory functions of the "junk DNA," the complexity of information interactions within the body would make the relevance of Randy's calculations questionable. Randy seems to be thinking that the information that we get out is the same amount as the information that went in, which is ridiculous. There is an enormous amount of interactivity between the developing human and its environment, plus "developmental programs" take advantage of self-ordering processes that human-made computers have barely begun to notice.

The whole thing comes down to a radically simplified set of instructions for making a human, mainly because the IDists got one thing right: evolving a new gene is not very easy to do. That's why we don't have that many more genes than the fruit flies (that, and we had a common ancestor with not too many genes). Thanks, creationists: once again you've pointed to fulfilled evolutionary prediction (prediction possible once we have enough biochemical evidence, that is), and to a self-ordering kind of development that would be expected from evolution, versus the top-down ordering more readily gained from designers and computer programmers.

Glen D
http://tinyurl.com/2kxyc7

Critical responses and feedback loops may often only be understood as properties of the system, and can't be found within this supposed operating system. So the analogy fails again, because it assumes that the genome will provide all of the information required for building a cell, which is clearly not the case.

So? The operating system doesn't provide all of the information required to build a new computer. (If the analogy is so bad I don't see why strawmen have to be erected to disprove it.)

There is very little relationship between the complexity of the genome and complexity of the organism, for instance.

Define 'complexity of the genome' and 'complexity of the organism'.

I haven't read any of the other comments yet (disclaimer in case someone has already brought this up), but this clown should learn a little about iterative fractal algorithms.

While not a perfect analogy to the relationship between genotype and phenotype, it at least highlights the fact that there is no universal lower bound on the size of the algorithm, or of its seed values, needed to produce a set of a given size and apparent complexity.

Also, he seems to make no distinction between the development of the organism and how it functions in its adult form. Is he talking about how much "code" is required to _grow_ the organism or how much "code" is required to _run_ it? I'm fairly certain the reason this is not clear is that he hasn't even considered the distinction.

Speaking as an analyst/developer, I'd say this guy has pretty poor analytical skills. Probably the kind of programmer who hacked his way to competence and now relies on tried-and-tested patterns, rather than insight, to get the job done. I've worked with a lot of the breed.

By Farren Hayden (not verified) on 25 Feb 2008 #permalink

Perhaps this clarification will help reconcile me to all the computer people who object to my dismissal.

We can talk about the cell as an analog to a computer program; there's an interesting literature treating development as a trajectory followed by a swarm of state machines. I can buy that.

What I specifically object to here is the strange isolation of DNA as "the program," an excessively reductionist view. It isn't a program, and it can't contain one. The information there is inadequate -- we tend to ignore all the information that is also inherited from the cytoplasm, and the fact that there is epigenetic modification of the genome in the history of a developmental lineage. That's also part of the "program," and it's just as indispensable as the nucleotide sequence.

Another thing to consider: DNA came last, protein/RNA/metabolic intermediates came first. The predecessors to the progenote did all of the things we characterize as the product of a program, without a genome.

Atlas should shrug

Seems misconceptions are hard to get rid of, and even the great PZ himself is not immune... So.

Again.

No real program is completely self-sufficient. All rely on libraries, BIOS or OS functions, and hardware configuration; lots of them need additional information to work properly.
So, this is NO valid reason to disqualify DNA as a program.

Of course, DNA is wildly dissimilar to any human programming language, but that does not mean it is not a program (in the most abstract sense).

Och, and also... many (early) computer functions were once performed by single-function hardware, so what...

I met this guy over in the Brights forums and communicated with him by email. He had done some numerology on the size of humans and worked out that, if all the axons had a constant conduction speed, then by timing signals from the brain to the toes he could set up a harmonic, and he claimed that this was the basis for consciousness. He wasn't a creationist, just someone with a lot of practical engineering-type knowledge who is also very numerate. So instead of doing the hard work of learning about biology, people like him think they can solve it all using maths and logic.

I tried to point out to him basic facts, like axons come in different diameters and this affects their conduction velocity, also the small problem of there being synapses in the way, especially in the afferent side of the loop. But he was not interested in grubby facts interfering with this beautiful idea, if Nature didn't work that way, then so much the worse for Nature. I had to tell him the correspondence was at an end if he was not open to reality.

By Peter Ashby (not verified) on 25 Feb 2008 #permalink

T_U_T:

They're not saying to dismiss it as a way of simplifying the concept, they're saying not to assume the VERY VERY VERY VERY VERY simplified version of it (calling it a program) is exactly accurate to what is actually happening.

It's a description of the underlying fractal pattern, not a "this does this" if/then statement. Saying "It's a program" is accurate, but saying a monkey's behavior can be simplified to "IF hungry, GET food, THEN poop" is a horrible understatement.

PZ wrote:

What I specifically object to here is the strange isolation of DNA as "the program," an excessively reductionist view.

Just how excessively reductionist is it?

Years ago, back in the 1970s I think, someone on TV said it might be possible to use cow's eggs instead of human eggs (and I think cow's wombs also) to clone human beings. (Has that been disproved?) If that were possible then the essential difference between a human being and a cow could be said to be "the DNA program." No? (That's assuming that it's only the DNA that matters when they do the "Nuclear Transfer" into the cow's egg.)

I suppose that question reveals a lot of my ignorance. But you seem to think the "DNA is like a computer program" idea is even more wrong than I had previously thought.

And what about Craig Venter? He sometimes talks like that DNA=program metaphor is real:
http://www.edge.org/3rd_culture/venter.dimbleby07/venter.dimbleby07_ind…

Just how much can he hope to accomplish by re-writing an organism's DNA?

The only VERY VERY VERY VERY VERY simplified thing around is some people's concept of what counts as a program.
Gosh, people, you really seem to think that it is a program only if it looks like a program for your PC. Go and learn something geeky: http://en.wikipedia.org/wiki/Esoteric_programming_language

What's up with the accusations of vitriol?

Pointing out that someone has repeatedly said something stupid (thereby drawing the inference of "idiot") is not vitriolic.

#170
In one sense, DNA is important in providing the information to the development environment to switch on the production of certain proteins and to send certain chemical cues (effectively changing state by way of transition functions, in the finite state machine view). However, the DNA is not the finite state machine itself. The DNA does not read in any "symbols" and change a "state". It is the DNA that is read for the "symbols". The DNA is simply the input, not the program.

The example of using cow wombs and eggs with human DNA, however, would break down because the environments are not the same: they would not respond with the same state/transition sets to the same input, and would thus produce different results, most likely a crash due to invalid input on a state (to drag the metaphor out).

Someone correct me if I'm horribly off on this, I've got the CS part in my brains pretty solid, but the biology isn't my forte.

All very interesting comments (why no comments on MY blog???), but a crucial concept has been totally overlooked (I think) - Stimpson has declared that the amount of "information" that he has tried to figure out in the human genome is not enough to 'code' for something as 'complex' as a human.

To make such a claim, does one not have to have:
1. figured out how 'complex' a human is

2. known how much 'information' would be required - in a biological system - to 'code' for such complexity?

Stimpson is only now beginning to take the baby steps required to even attempt to address these issues, yet he had, some months ago, CONCLUDED that there just is not enough 'information' in a genome to do it.

A fairly common tactic in creationdom - reminds me a bit of Warren Bergerson, whom some of you may remember from the ARN forum. He ran around for YEARS claiming to have disproved Darwinism using actuarial math, and finally admitted to having never done any actual calculations; he just 'knew' that evolution couldn't be right, so he just 'knew' that the math would support his claim.

PZ: #165 is much better, but I still think it's rather curmudgeonly. It is possible to take the DNA-as-program in a very naive direction, but it is also possible to take it in an entirely reasonable direction.

In an effort to be less of an "ignorant computer scientist," I'm sitting in on a class with Eric Davidson. It's all about cis-regulatory elements and more generally gene regulatory networks in development and evolution (we just started on the evolution side last week! cool stuff!) He said, explicitly, in class "the information processing interpretation is fact, not metaphor." He also gave the reference to http://authors.library.caltech.edu/1164/ for details of analogies between GRNs and logic systems.

Although the "DNA is the program" attitude may be overstated, there are aspects of DNA that are qualitatively different from other information if one applies information science to cellular mechanisms: the DNA is non-volatile, and copied digitally with high fidelity in normal cell operations (copy failures are very interesting in studying evolution, of course, but they're relatively rare in the day-to-day operations of cells). There are dangers in taking this analogy too far, since the "computer" is not much like a Windows PC, so bringing up the size of microsoft word is a red herring for any number of reasons. But doing statistics on the information content of the DNA is quite valid, unless the claim is that there is a whole lot of "hidden" information in the cytoplasm or RNA inherited from the mother or the environment... and while there is important information there, I would argue that it's a lot more noisy and low-fidelity, and that the developmental systems driven by the DNA are actually rather robust in handling imprecisions in those domains.

The closest analogy in computer engineering I can think of is actually that the DNA is something like a specification for an asynchronous, digital VLSI chip, that, in the right environment, will produce the necessary stuff to bootstrap an adult organism that includes what's needed to make new copies of the chip. Of course, you need the appropriate supporting stuff for the bootstrapping process, just like having computer software on a CD isn't very useful if you don't have a computer. But to a theoretical computer scientist, an asynchronous VLSI chip or a cell's chemical kinetics is just another type of computer that acts on a string of digital information, and just knowing that, without the details, allows for some specific mathematical analysis.
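
As a concrete example of "doing statistics on the information content of the DNA," here is a minimal sketch (the sequences are made up; a real analysis would use longer windows and higher-order models):

import math
from collections import Counter

def entropy_per_base(seq):
    """Zeroth-order Shannon entropy in bits per symbol.

    Ignores correlations between neighboring bases, so it only bounds the
    per-base information content under this very simple model.
    """
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

print(entropy_per_base("ACGTACGTACGT"))  # 2.0 bits: all four bases equally likely
print(entropy_per_base("AATAAATAAATA"))  # ~0.81 bits: repetitive sequence carries less

The naive 2-bits-per-base bookkeeping is the ceiling; repetitive or skewed sequence falls below it, which is one reason raw genome size is a poor proxy for "information."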

#174
Ah so this is really just the argument from irreducible complexity (argument from incredulity v2.0) dressed up all pretty in computer science metaphor.

"Boy, humans sure to look complicated and DNA would just be too simple to make a person if there was Junk DNA. I can't think of any way that a small amount of DNA could ever make a whole person... therefor goddidit!"

Couple of notes to feed into the frenzy. 1. People are trying to build more efficient processors that use reversible code. The idea is that every time you switch the state of something it uses energy, so it's often actually less costly to reverse the set of steps you already took, back to an earlier state, than to make a large number of new state changes. Apparently people have written working programs in simulation with this system. I find it incomprehensible. 2. Processors already exist that are reprogrammable, so that the internal logic of the processor can be rewritten to do something else by the software *running* on the processor. While this is, so far, only used at startup to turn a machine of type X into type Y, so it can run something only made to work on Y, when you consider what you could do with it... it's a tad scary. One could literally use a series of instructions to rewrite other instructions, such that the next instruction will do something completely different than it did 10 instructions earlier. Both of these give me nightmares when I contemplate trying to figure out what the heck a program written to do such things is *supposed* to do, never mind how. lol

Anyway, we have gone over all the stuff, so I leave you with the following, for entertainment. Note, the program is the one I mentioned earlier, which in its prior form was only 1.53MB (and could still be, without the GUI). Since the animation files that define the "start", "end" and "step by" for these images are more or less identical, they don't include them (they would be only about 4 lines, specifying the .pov file to use and the number of frames to animate).

Short code contest #5 (animations):

http://local.wasp.uwa.edu.au/~pbourke/exhibition/scc5/final.html

Got to love recursive macros, though some of them were kind of lame this time around.

Your preconceptions are not data.

You're slaying me with these perfect nuggets of quotables. This comes close to topping "Your ignorance is not evidence."

I understand PZ's point, that the information in DNA is expressed through and requires the cell (cytoplasm, nucleoplasm, etc). But the cell is self-assembling--put a human nucleus in a mouse cell, let it divide a dozen times (maybe a hundred times), and now you have a completely human cell. Cloned animals have epigenetically derived defects, but I expect their Nth-generation offspring will be normal.

Let me push the argument further taking a bacterial cell as the model. In principle, you could express proteins in vitro and combine them with lipids, small molecules, and DNA and reconstitute a cell. It wouldn't be quite right, but get it close enough that it can divide, let it do so a bunch of times and then the cell will be completely normal.

But which proteins would you express and how would you figure out how to combine them? In principle, you could predict from the DNA sequence the set expressed in a particular environment, relative expression levels, and where they go--membrane, cytoplasm, etc.

True enough, DNA without its cellular environment is not a cell, and in biological systems the DNA is always associated with its cell, but the epigenetic information is mainly derived from the DNA and secondary to it. For the biologist this distinction is meaningless--practically we can't yet predict epigenetic context from DNA or recreate it from scratch. Biologists describe the epigenetic state, observe it and assay for it.

PZ, having made the effort to read more of the comments, and coming from a position of relative ignorance in biology, I'm a little confused about your take on which is program and which is computer if the analogy is admitted.

At one point you say "if there is a computation engine in the cell, it's the cytoplasm".

In a Turing machine, the computation engine isn't the program. The program is the stored data the computation engine reads its instructions from and writes to. The program is data; it's not the computing mechanism.

So it seems to me that the cytoplasm is the computer and the DNA the program if the analogy is admitted, not vice-versa. Or am I missing something?
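
For what it's worth, the separation is easy to see in code. A minimal Turing machine sketch in Python: the engine below is fixed machinery, and the "program" is just a table of data handed to it (the flipper table is a made-up example):

# Minimal Turing machine sketch: the engine is fixed machinery,
# while the "program" is pure data (a state-transition table).
def run(program, tape, state="start"):
    pos = 0
    while state != "halt":
        symbol = tape.get(pos, "_")               # "_" means a blank cell
        state, write, move = program[(state, symbol)]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    return tape

# A made-up example program: flip 0s and 1s until a blank is reached.
flipper = {
    ("start", "0"): ("start", "1", "R"),
    ("start", "1"): ("start", "0", "R"),
    ("start", "_"): ("halt", "_", "R"),
}

tape = dict(enumerate("10110"))
print(run(flipper, tape))   # {0: '0', 1: '1', 2: '0', 3: '0', 4: '1', 5: '_'}

Hand the same engine a different table and it computes something else entirely; the engine never changes.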

By Farren Hayden (not verified) on 25 Feb 2008 #permalink

Farren, you have to be careful not to confuse the fact that we can describe the regulatory interactions in the cell as algorithms with the claim that they actually function as algorithms in any meaningful way. IOW, don't confuse the metaphor with reality. The metaphors for life, and consciousness in particular, reflect changing technology over time. In the computer age the wetware between our ears becomes a massively parallel analog computer and DNA becomes a program. They both give a flavour of what is going on, but as explanations they are effectively useless.

By Peter Ashby (not verified) on 25 Feb 2008 #permalink

Farren Hayden asked:

So it seems to me that the cytoplasm is the computer and the DNA the program if the analogy is admitted, not vice-versa. Or am I missing something?

Yes, from my reading of PZ and HumanisticJones, it seems you are missing this: DNA isn't as flexible as a computer program; it makes too limited an input to be like one. Some of the people commenting preferred to think of the DNA as input into a program rather than the program. HumanisticJones, rightly or wrongly, thinks that if you put a human genome into a cow's egg and womb the "system would crash", perhaps like trying to read a jpeg image file in a primitive text processor.

the very first google hit proves you wrong ;)

My computer and programs cannot read that .ps file. It tried to open it with CorelDraw.

What is it?

My computer and programs cannot read that .ps file. It tried to open it with CorelDraw.

What is it?

Postscript.

It's usually used for professional printing. If you have a postscript printer, you may be able to send the file to the printer without opening it.

a way to make dna to work exactly like the turing machine

Okay, the DNA is flexible enough. I was wrong. But the rest of the natural cellular systems aren't a Turing machine -- even though it now appears you could make one with the parts there.

.ps is an old format called "postscript". It's basically like a PDF or HTML, but was used about 10 years or so ago on laser printers. Unlike PDF and HTML, it **can** function as a computer programming language, and some people had fun generating fractals and other complex things by feeding the .ps file to a printer, which then "ran" the program to produce the final image. It's a lot more flexible than many other methods that are used, but the printer needed to do it required a lot more smarts than most of them have, even today (or especially today, maybe). Without a plugin, etc., to view it, you probably can't on a PC. Linux still has stuff floating around for it, and uses it for stuff, so people with that probably can read them.

PZ,

Your latest clarification leaves me more puzzled than ever.

A typical program can't act as a program without various auxiliary machinery, code, and data. Most "application programs" assume the existence of an underlying operating system, dynamically linked libraries of code and data, and instructions that may be implemented in hardware, some of which are usually implemented in software (by traps into the operating system code). A computing environment is full of endosymbionts that are arguably "part of the program" and arguably not.

It seems to me that by your criteria, almost all typical everyday computer programs are not actually programs. That can't be right.

It can be an interesting exercise to try to define "the whole program." For example, a typical application program isn't "a whole program" in a certain sense---it's just a subroutine that's loaded into the operating system, and the OS and all the apps are one big program... but so what? Exactly where you draw the boundaries around a "program" is largely a matter of analytical convenience. For some purposes you need to analyze a program together with its libraries and os functions, and even with concurrently running applications on other computers, but for others you don't.

Thinking that you don't have to analyze programs in context---or that there's a single boundary of "the program"---is an indicator that you have an excessively reductionist view of computer programs.

The more biology I learn, the more it looks familiar in fundamental computer science terms, even if it doesn't look like the most familiar manifestations of computer science.

Consider plasmid exchange. That reminds me of loading device drivers into OS's, and plugging plug-ins into apps like browsers. (Just to pick something most people here are familiar with.) Sure, there are differences there, but everyday programs do wacky things that modify themselves, adapt to their environments, etc. The admittedly numerous differences don't seem relevant to whether one thing or another "is a program."

As for the late evolution of DNA... again I'm having a hard time finding a fundamental criterion there.

I'm not really sure what you're getting at, talking about the nongenetic ancestors prior to DNA, but

I suspect that the prior nongenetic autocatalytic goo can be usefully viewed as a prior program used in bootstrapping the later one. (Both evolutionarily and developmentally.) Just because it's not genetic doesn't mean it isn't a largely computational mechanism. (A partly analog and partly digital system for regulating and perpetuating itself, with catalysts acting as switches and reasonably called instructions.)

Whether to call that goo an instance of a program is one question, and whether the DNA "program" that hijacked it is actually a program is another. The fact that the latter may still depend on the former shouldn't disqualify it as a program. (Even if we decide not to call the former a "program.")

A .ps file *is* a computer program. The image is what you get when you execute it - IIRC it's a dialect of Prolog.

The 'Argument from Microsoft' is probably one of the dumbest things I've seen. However, I'm almost certainly missing something here: leaving aside Stimpson's inane insanity, I don't see how DNA isn't in effect a component within a computation engine.

As far as I understand it, DNA is a complex molecule that when exposed to the right environment 'causes' a sequence of repeatable events governed by knowable rules. I appreciate that it's not always, shall we say 'personally involved' in every decision - but there is a chain of events that ultimately stem from what is or isn't in the DNA.

A vastly complex chain of events governed by an insanely complex set of rules.

So far so good?

It's not the same as source code or even machine code - it's not all loops and if-then-else - but it is an 'instruction'-carrying medium. It doesn't contain everything - some of the 'instruction' is in the rules, some in the environment - but the DNA is the point that makes the difference; it's the thing that is varied.

It's the sort of system that no intelligence would or could ever create. It's the sort of system that could *only* ever result from a complex, iterative, self-modifying and externally selected system.

At least, that's my understanding of what DNA is/does - am I spectacularly wrong?

By Idlethought (not verified) on 25 Feb 2008 #permalink

#191: Postscript is approximately a variant of FORTH, it's not much like Prolog at all.

#190: good stuff; I think I mentioned plasmids in a draft of an earlier post, but didn't. Your mention got me thinking, though: it's known to be possible to stick mouse DNA into plasmids to have E. coli make mouse proteins, and that sort of stuff. This seems very analogous to running the "program" or at least "subroutine" on "different hardware": the cytoplasm, ribosomes, environment, and other cellular machinery are different in E. coli than in a mouse, but it's close enough to work for some things. So at least some parts of the "DNA program" clearly don't depend so much on the exact cytoplasmic "hardware" and environmental conditions. Of course, even if you could stick the whole mouse genome into an E. coli, you wouldn't get a mouse... but the stuff that's missing seems more "hardware" than "software," in that it's mostly mechanism rather than digital data.


There are two very simple points that seem to have been missed. Firstly: MS Word is code. DNA is data. Secondly: data contains no information.
Code describes (with an arbitrary degree of redundancy) a context (a method or process). Data describes (with a different, arbitrary degree of redundancy) *nothing*.

To somebody who has never programmed outside an imperative, functional language, I can see how you might think this is true, but it's not.

A generative awk script like:

ps -wuax |awk '/nfsd/ { printf("kill -9 %s\n",$2) }'|sh

proves that data can be both data and code.
Without the shell invocation it merely produces output ("data") like this:

kill -9 325
kill -9 327
kill -9 328
kill -9 329
kill -9 330
kill -9 331
kill -9 332
kill -9 6106

The sh pipeline at the end runs the "data" above through the shell as executable statements, turning "inactive" text into running code.
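
The same duality is just as easy to show in Python; a minimal sketch (the generated commands are illustrative strings only, not something to run against live processes):

# Build "data" (a string of statements), then execute it as code.
lines = [f"print('kill -9 {pid}')" for pid in (325, 327, 328)]
script = "\n".join(lines)   # at this point it is inert text
exec(script)                # now the very same bytes run as a program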

PZ, I maintain that your concept of program isn't broad enough. We're disagreeing about the definition of program, not about any biology. Sadly, so many scientific discussions turn into linguistic pissing matches. This is why I like math.

Your argument also seems to be logically flawed. Now you are saying that the genome doesn't explicitly state all cellular function; THEREFORE, the genome can't be a program. This strikes me as a pretty clear non sequitur. It isn't necessary for the genome to intimately control every cellular process in order to be a program; it's only necessary that the genome encode certain information that enables it to respond to specific inputs with specific outputs.

This isn't a perfect analogy, but I'll throw it out there anyway: I'm not claiming that the genome is an operating system; I'm claiming that it's a program.

You failed to address one of my key points. If you deny that the genome is a program, then aren't you denying one of the fundamental ideas of synthetic biology?

This is a more germane version of Istrail & Davidson, and the one I was actually looking for in an earlier post:

http://www.cs.brown.edu/research/pubs/pdfs/2007/Istrail-2007-RGC.pdf

abstract:

The definitive feature of the many thousand cis-regulatory control modules in an animal genome is their information processing capability. These modules are "wired" together in large networks that control major processes such as development; they constitute "genomic computers." Each control module receives multiple inputs in the form of the incident transcription factors which bind to them. The functions they execute upon these inputs can be reduced to basic AND, OR and NOT logic functions, which are also the unit logic functions of electronic computers. Here we consider the operating principles of the genomic computer, the product of evolution, in comparison to those of electronic computers. For example, in the genomic computer intra-machine communication occurs by means of diffusion (of transcription factors), while in electronic computers it occurs by electron transit along pre-organized wires. There follow fundamental differences in design principle in respect to the meaning of time, speed, multiplicity of processors, memory, robustness of computation and hardware and software. The genomic computer controls spatial gene expression in the development of the body plan, and its appearance in remote evolutionary time must be considered to have been a founding requirement for animal grade life.
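
For readers without the jargon, one cis-regulatory module of the kind the abstract describes boils down to a small Boolean function. A toy sketch in Python, with invented factor names and invented logic:

# Toy cis-regulatory module as a logic gate: inputs are transcription
# factors bound to the module, output is whether the gene is transcribed.
def module_fires(activator_a, activator_b, repressor):
    # fire if either activator is bound AND the repressor is absent
    return (activator_a or activator_b) and not repressor

for a in (False, True):
    for b in (False, True):
        for r in (False, True):
            print(a, b, r, "->", module_fires(a, b, r))

Wire a few thousand of these together, each output feeding other modules' inputs, and you have the "genomic computer" of the abstract.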

My cookie recipe for chinese chews is less than one hundred bytes. Yet the end product is highly complex. Obviously, by this guy's logic, there is some information hidden in the cookie somewhere I don't know about. ;) Yet I know they are not fortune cookies.

By Brian Macker (not verified) on 25 Feb 2008 #permalink

Re: Paco's FPGA example - they evolved a 'program' to do speech recognition on, IIRC, "Yes" and "No", or something like that. The result had a very high success rate, but only on the one chip, and only in a certain temperature range. Ya know, just like how cells have an optimal temperature range? :)

But yeah, it did what no programmer could have done in the space available. I think they did further experiments with multiple FPGAs and got a more general program, but it took a lot more generations. Really reaching for memories here though.

Looks like I've been scooped in spirit, but I'll say it anyway. Gary Marcus gave me what felt like a revelation when he talked about genes as IF-THEN rules, what's known in CS/AI as a production rule system, like ACME or Prolog. I'd still had a picture of genes just coding proteins, statically, even though at a different level I knew that had to be wrong. Marcus emphasized every gene having promoter regions controlling its production, based on what molecules bound to the promoter. E.g. IF lactose present AND NOT glucose present THEN make lactase. When I think of the cell as 30,000 different production rules, doing different things naturally based on local conditions, the complexity -- and potential for differentiation -- becomes a lot clearer. 210 cell types and navigation in a 3D coordinate system laid down by chemical gradients? Trivial!

http://bostonreview.net/BR28.6/marcus.html
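
For anyone who doesn't know the CS jargon: a production system is a bag of IF-THEN rules plus a working memory of facts; any rule whose conditions match fires and adds new facts, which can trigger further rules. Marcus's lactase example fits in a few lines of Python (molecule names and rules simplified for illustration):

# Minimal forward-chaining production system sketch.
# Each rule: (facts required, fact that must be absent, fact produced).
rules = [
    ({"lactose"}, "glucose", "lactase"),        # IF lactose AND NOT glucose THEN lactase
    ({"lactase", "lactose"}, None, "glucose"),  # lactase digests lactose into glucose
]

facts = {"lactose"}
changed = True
while changed:                                  # keep firing until nothing new happens
    changed = False
    for needed, absent, product in rules:
        if needed <= facts and absent not in facts and product not in facts:
            facts.add(product)
            changed = True
print(facts)   # {'lactose', 'lactase', 'glucose'}

Note that once glucose appears, the first rule stops matching, a crude version of the feedback Marcus describes.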

PaulW:

Consider plasmid exchange. That reminds me of loading device drivers into OS's, and plugging plug-ins into apps like browsers. (Just to pick something most people here are familiar with.) Sure, there are differences there, but everyday programs do wacky things that modify themselves, adapt to their environments, etc. The admittedly numerous differences don't seem relevant to whether one thing or another "is a program."

You are making the mistake of thinking that an analogy is an explanation. Analogies help comprehension as they relate unfamiliar things/processes to familiar ones. DamienR.S. at #199 is a good case in point. However, if you then stop at that point you will never understand the system on its own terms. The idea is to use the analogy to bootstrap your mind into the logic of the system, once there you must proceed further until you can shuck off the analogy because you understand the system properly. The analogy is then useful only to help others bootstrap their way in.

PZ's point is that the analogy is useful as an aid to understanding, but it does not confer proper understanding. Remember that there is truth in the old adage that a little bit of knowledge is a dangerous thing. So stopping at the top of the bootstrap makes you think you understand the system, but you don't. That is the difference between scientists who actually study this stuff and the rest of the world. PZ doesn't need the analogy any more and having seen the problems that large analogies can produce probably hesitates to use them in teaching.

By Peter Ashby (not verified) on 25 Feb 2008 #permalink

PZ's point is that the analogy is useful as an aid to understanding, but it does not confer proper understanding.

That may be so, but saying that the genome can't be a program since it doesn't "specify or regulate much of anything without major contributions from its environment" is still wrong, because real computer programs don't do that either.

That is the difference between scientists who actually study this stuff and the rest of the world.

Oh for chrissakes. I guess no one who disagrees with PZ on this is a "real" scientist then.

Hmm. Interesting points. Let's get *real* simple with computers. Mind you, I don't know what the really, truly old stuff, like 8088s, did, but here is the Apple I:

Start at 0000, jump to some place above 8000, look for a tape drive by flipping switches to turn on the drive, then check for incoming data from there. As this is received, decode it into useful information. Jump to the first instruction in this data.

Apple II:

Start at 0000, jump to some place above 8000, turn on the drive motor, read, step, read, decode, step, read, etc. until you have a big block of GCR data (Group code recording, while PCs use... I don't remember). Feed this code through a process to get the "real" data and move it, byte by byte, to where you want it. If its a boot program, jump to the first instruction.

Now, let's be real clear here. BIOS isn't relevant. It was the OS which contained the code to take stuff stored in any other sector of the drive, read it, convert it, then store it. If you changed that code, it *lost* the ability to read any disks. Every OS, no matter which one, *had to* have this code in it. It was critical, since there was no "BIOS" to call to simply ask it to read sector 4 and put it at XXXX in memory. Extended memory required flipping switches, some of which changed which bank of memory was being executed, another of which determined which bank was being written. You could execute code in the same "location" as you stored to, as long as both were in separate banks. Literally everything, other than the display chip's automatic interpretation of data in key memory locations, was done by flipping a switch, including sound. The switches simply turned things on or off, and not even reading the initial boot block of a disk could be done without them. For example, to play a tone you did something like:

      ldx #$nn     ; length of the tone
tone: ldy #$nn     ; delay count; sets the pitch
beep: lda $C030    ; read the soft switch that flips the speaker on/off
      dey
      bne beep     ; inner loop: return to the lda
      dex
      bne tone     ; outer loop: return to the ldy

I even tried to decipher the boot system for a game disk one time. The "basic" code was the same as normal for reading, but there was some sort of extra code to reinterpret some of the raw binary data, so that the intentional errors on the disk could be corrected and the checksum would still come out correct. Without it, the standard decoder would error out and generate invalid data, so you couldn't read anything on the game disk, and so couldn't copy it.

Modern computers do damn near *everything*, from reading the data from a specific platter, sector and block, to playing a tone, all without direct intervention of the program code. Go back far enough and you either get something you have to hand-flip switches on to program at all, or you get something where everything, including just bootstrapping the code, has to be done by hand. Heck, DOS had that same problem. When Gates and his buddy showed up to show off what they wanted to sell, they realized they had an OS, but no way to make the computer "load" it. They had to write a boot loader for it from scratch at the meeting, or at least that's what is said anyway. And, back then, even PCs loaded from tape.

Your argument, PZ, ignores the broad definitions, as others have pointed out, in favor of nitpicking about how things work "now". DNA doesn't look like modern software because we spent years moving the core of *needed* libraries, BIOS functions and simple tricks into hardware, where it's all common between OSes, then built OSes with lots of predefined behaviors, and libraries to extend them, etc., and only then wrote the programs that sit on top of the rest. But that just means all the subprograms are, from our perspective, hidden in the OS, the BIOS, the hardware, etc., instead of sitting in our lap, where we have to hand-tweak them every time just to make sure the timing is 100% correct to read one block of data from a floppy disk.

Windy, where exactly did I talk about 'real' scientists? When you have finished reading into my texts things that are not there, maybe we can discuss it. Until then....

By Peter Ashby (not verified) on 25 Feb 2008 #permalink

#203: OK, I admit it was hyperbole, but your text does imply that those who disagree only "think they understand the system" and that "scientists who actually study this stuff" think differently.

As for "reading into texts things that are not there", nothing in the quote from Paul W implies that he thinks that this analogy is a sufficient explanation for the genome. A little less condescension please.

#192 - Sorry, you're right. It is a variant of Forth. Turns out I don't recall correctly.

By Idlethought (not verified) on 25 Feb 2008 #permalink

Windy wrote:

#203: OK, I admit it was hyperbole, but your text does imply that those who disagree only "think they understand the system" and that "scientists who actually study this stuff" think differently.

Yes, that is what I said, what of it? It is a trivial truism. PZ, myself and most of the people I have worked with have no need of the analogy to work with and understand the genome and its functions; we have moved beyond that and understand the system on its own terms. Yes, DNA is an information store and as such it can be used like paper tape in a computation. To use that as evidence that it functions like that in the cell in any real sense is to confuse the medium for the function. Babbage made his difference engine from metal, the first electronic computers used valves and the aforementioned paper tape. I worked with a guy during my PhD who used to program computers with punch cards; now we have solid state electronics and flash memory in place of hard drives. So the analogy is that you are trying to fetishise metal, paper tape, valves, punch cards, transistors and flash memory. You are also ignoring the fact that metal gears can be used in many things other than difference engines, and ditto paper and card etc.

By Peter Ashby (not verified) on 26 Feb 2008 #permalink

Yes that is what i said, what of it? It is a trivial truism. PZ, myself and most of the people I have worked with have no need of the analogy to work with and understand the genome and its functions, we have moved beyond that and understand the system on its own terms.

Trivial truism my ass. I'm a scientist who works with the genome, and I disagree with the ARGUMENTS PZ uses against the analogy. Whether anyone has need of the analogy has little to do with whether the arguments against it are valid.

Peter,

My immediate concern isn't whether "it's a computer" is a good analogy for teaching, or whether useful to you or PZ for guiding your research.

I'll grant that saying "it's a computer" can be dangerously misleading for the majority of folks, who don't seem to understand what constitutes a computer. It may be a misleading "analogy" if your idea of a computer is your PC.

Clearly, most biology students and professional biologists wouldn't know a parallel, asynchronous forward-chaining production system implemented as a microscopic molecular transcription system if it bit them on the ass.

Which it apparently does, and sure enough, they don't notice it. It doesn't look like the von Neumann machines they're familiar with, so it must not be "a computer."

My concern is the title of PZ's post. I think it's literally false.

Maybe I'm wrong, but PZ and others in the "it's not a program" camp have failed to demonstrate that. Instead we've gotten a list of red herrings that demonstrate that they don't understand what makes something "not a program."

Talking about whether something is a program as "an analogy," as you do, often misses a fundamental point. Something that is "analogous to" a computer IS a computer. A von Neumann machine is only "analogous to" a Turing machine or a Post production system. They don't look at all alike, and yet they're equivalent in a certain deep way. And those don't exhaust the possibilities of "computers." Computers don't have to be digital or universal, and can be grotesquely ad hoc and kludgey. (Nobody's actually built a universal Turing machine or equivalent anyway---just limited approximations. And actual computer implementations are usually vastly more complex than those abstractions. So maybe those are just "dangerous analogies" for understanding, say, PCs...? No, that can't be right.)

Computers can be hybrid, asynchronous, massively parallel, and implemented at the molecular level by conditional transcription. They don't have to be purely digital, and they don't have to be universal. Software can be reactive, replicating, approximate, noise-tolerant, and self-modifying, and dependent on regularities in both its local computational environment and the external world. It can have structure that emerges bottom up, both historically and developmentally. No problems there.

Maybe the genome isn't a computer program. Or maybe it's just not a PC application program.

If you say that you and PZ don't find the "analogy" useful, that's one thing. That's pretty plausible. You and PZ apparently have a better understanding of how the genome processes information than of how computers work.

But if you flatly say "the genome is not a computer program," you're on my turf. I sincerely want to know what the genome is that is "not a computer program," and so far I haven't heard anything that's even plausible, much less persuasive.

If the claim is true, I'd be happily fascinated to hear how it's true. I just get tired of hearing the claim supported by bogosities, and condescended to because I "don't get it."

This thread took off without me, but I obviously have some unfinished business.

"You'll have to explain what is being mapped to what that you are saying is one-to-one. DNA to phenotype is not one-to-one, for example, because multiple codons code for the same amino acids."

But the paper suggests that representations under "hard" constraints (such as the requirement that the evaluation function be one-to-one and onto) have at best a negligible effect on search performance. What you seem to be arguing is that "loose" syntactical constraints, by this standard at least, confer the advantage of confining the search space. What I'm arguing is that our knowledge from evolutionary computation suggests that such isn't the case.

"If by "more explicit structure" you mean more constraints on what unique states are allowed in the genotype, then sure, that would also tend to reduce the range of variation in the phenotype."

That is what I'm arguing. DNA doesn't match a computational formalism like lambda calc, and its expressive capabilities don't seem to be constrained very well. It would seem to invite the opposite conclusion of what you're arguing.

"Why is redundancy or non-function a sign of a doing a poor job? Redundancy (gene duplication) is important to the formation of new gene variants because it allows new variants to be tried without breaking a critical existing gene."

Maybe I'm just misunderstanding, but if you're arguing that the computational limitations of DNA (in comparison to formal models that are universal) assist in restricting variation, then this would seem to be contradicted by a genome that is mostly non-functional, and in which the overwhelming majority of mutations have no phenotypic effect. Not to mention things like somatic mutations, which can't be passed on to descendants. (I'm writing from memory here, so bio-peeps can feel free to correct me).

Also, someone pointed out to me (windy, I think it was, though I'm not sure) that assuming the evolutionary advantage of junk DNA is increased variation conflicts with some known evolutionary phenomena. For instance, organisms with smaller genomes show more evolution.

"To me this feels like saying, "from my experience with long division and multiplication tables, use of mathematics for signal processing invites premature conclusions". Umm. Only if you restrict yourself to long division rather than bringing in Fourier."

Point taken. Maybe it makes sense to talk about the cell in general as being some sort of analog computing system or finite state automaton. While that may be true, it doesn't seem to be what Stimpson was arguing for or PZ is necessarily arguing against.

"Nobody's actually built a universal Turing machine or equivalent anyway---just limited approximations. And actual computer implementations are usually vastly more complex than those abstractions. So maybe those are just "dangerous analogies" for understanding, say, PC's...? No, that can't be right."

I think that the primary difference is that Turing machines and von Neumann machines and their various derivatives (RAM, PRAM, k-tape Turing machines, etc.) have a similarity that is of rigorous and theoretical significance (universality), versus simply being similar in colloquial and metaphorical ways. The genome can be said to be like the input on the input tape of a k-tape Turing machine, but it isn't a rigorous comparison.

Nonetheless, I am willing to concede that the cell itself can be modeled as some sort of analog computing system.

Indeed, the whole developmental process is much more like evolution than like any kind of software telling the hardware what to do.

Actually, the boot process of a modern computer resembles a developmental process, as the computer starts with tiny programs that use only the features available on the original PC, then gradually runs more and more complex programs using features from more and more recent processors, until it arrives in protected mode with virtual memory configured. There are also all the details of adapting to its environmental conditions, so the resulting running operating system kernel is determined not by a single static program but is built dynamically by detecting hardware and modifying itself by loading and removing code to fit its environment.
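
A crude sketch of that staged, environment-sensitive pattern in Python (the probe results and module names are invented stand-ins, not a real boot protocol):

# Toy staged boot: each stage probes its "environment" and decides
# which code to load next, so the final kernel is built dynamically.
def probe_environment():
    return {"cpu": "x86_64", "disk": True, "network": False}

def boot():
    kernel = ["tiny_loader"]          # stage 0: minimal fixed code
    env = probe_environment()         # stage 1: look around
    if env["disk"]:
        kernel.append("disk_driver")  # load only what the hardware calls for
    if env["network"]:
        kernel.append("net_driver")
    kernel.append("virtual_memory")   # stage 2: the full machinery
    return kernel

print(boot())   # ['tiny_loader', 'disk_driver', 'virtual_memory']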

Paul W:

Which it apparently does, and sure enough, they don't notice it. It doesn't look like the von Neumann machines they're familiar with, so it must not be "a computer."

I've not noticed anyone make that argument. Looks like a caricature to me.

My concern is the title of PZ's post. I think it's literally false.

Fine, got any actual evidence?

Maybe I'm wrong, but PZ and others in the "it's not a program" camp have failed to demonstrate that. Instead we've gotten a list of red herrings that demonstrate that they don't understand what makes something "not a program."

Hold on, it is up to those who posit something to prove it, not up to others to shoot down each and every random hypothesis, so don't try that one. You want to be taken seriously? Then cite some evidence, and the fact that DNA can be used to do computations does not cut it, for reasons I have mentioned: it confuses the medium for the function.

In addition, showing that something can be modelled does not mean that is how it works in reality. A case in point: Ian Stewart found a set of algorithms that let him model on a computer how flocks of birds like starlings can move in unison in the sky. He claimed by doing so that he had found and explained how they did it. Except he hadn't, because his algorithms reduced what was happening between the eyes of a bird and its muscles to one term, thereby conflating a multitude of pathways into a single number, so it was no explanation at all. He also made the mistake of assuming that his set of algorithms was the one the birds were using.

By Peter Ashby (not verified) on 26 Feb 2008 #permalink

JW:

Actually, the boot process of a modern computer resembles a developmental process, as the computer starts with tiny programs that use only the features available on the original PC, then gradually runs more and more complex programs using features from more and more recent processors, until it arrives in protected mode with virtual memory configured. There are also all the details of adapting to its environmental conditions, so the resulting running operating system kernel is determined not by a single static program but is built dynamically by detecting hardware and modifying itself by loading and removing code to fit its environment.

Except that during development the environment is just as dynamic as the dna 'program' and includes changes both in the cellular and physical environments. At the level of cells it is like you are running the same 'program' on an increasingly different suite of computers (different cell types), except that epigenetic effects like methylation change the suite of programs that can be run, unless cancer intervenes etc. So the 'analogy' rapidly breaks down in complexity and confusion. So it is much easier and indeed parsimonious to ditch the analogy since it is at this point simply obscuring what is actually happening and complicating the whole thing when you can just understand it without the analogy.

What is needed from the "DNA is a computer" supporters is firstly to demonstrate that this is what is actually happening in any real sense, and that would include showing that real, novel and useful insights can be made about the BIOLOGY by using that particular lens to view the biology through. I would love to see that demonstrated, so where is the paper?

By Peter Ashby (not verified) on 26 Feb 2008 #permalink

T_U_T

re-read this link http://www.dna.caltech.edu/Papers/dimacs.pdf
you can emulate a Turing machine with DNA and enzymes, so it IS Turing equivalent, and thus DNA is not like a program, it is a program

Sigh, that link proves my point admirably. Yes, you can use DNA in an artificial setup to do computations. But to say that means that DNA in a cell is a program is to confuse the medium for the function. Just as the fact that I can use brass cogs to make a difference engine does not mean that all brass cogs function as programs, individually or in any or all assemblages. A while ago New Scientist had an article on toys made in ancient Egypt etc. that used looped string to run simple programs. Does that mean that the string sitting looped in my drawer is a program? No.

Cite me a paper showing that DNA functions as a computer in vivo during normal functioning, and furthermore that viewing it as such gives us novel insights into the BIOLOGY that explain previously unexplained things and has real utility.

Peter

By Peter Ashby (not verified) on 26 Feb 2008 #permalink

Again: the machinery of a cell is Turing equivalent. Just plug the right DNA program in, and it will emulate any Turing machine you like.
Brass cogs are not a computer. But brass cogs arranged in such a manner that you can make them compute just by putting in the right punch cards are a computer.

even if they are usually fed with punch cards that tell them to do something different.

Oh, and... if there is a Turing-equivalent device that can take your looped string as an input program, then yes, your piece of string is a program for it. It most likely does nothing useful, but it is still a program.

Clearly, DNA can be used simply as a digital data storage molecule, and can be used in molecular computing.

Perhaps part of the problem here is that we are mixing terminology; DNA and genome are not equivalent. Without the rest of the cellular machinery, the DNA content of a cell is little more than a relatively inert complex molecule. The information contained in the genome is greater than simply the potential coding sequences of the DNA.

As PZ did point out, the whole information content of the cell acts much more like a computer programme as understood by those of us with limited computing skills, and of the kind apparently proposed by Randy in his original post (which was what all of this started about).

The point is that the analogy, as commonly understood, is that DNA is the be-all and end-all data input for the cell, and the rest of the cell is simply hardware for interpreting that input.

I'm perfectly willing to admit, though, that my ignorance of computer programming may mean that I'm unaware of software which emulates (by design, or otherwise) DNA functions in the cell. It doesn't change the fact that considering DNA to be like computer programmes (or actually to be computer programmes, depending where opinion falls on that one) is most often unhelpful, because it can lead to the kind of false analogies and assumptions that occur in the post which inspired PZ.

The meat of PZ's argument with respect to Randy's original post still stands.

So, the problem may well lie with the fact that most people using the analogy don't understand either genetics or computing. Therefore, it becomes much easier to teach people who want to properly understand genetics about the actual biology, and to state the proviso that the analogy shouldn't be taken literally.

Or, maybe, I'm wrong.

By Bernard Bumner (not verified) on 26 Feb 2008 #permalink

"Clearly, DNA can be used simpy as a digital data storage molecule, and can be used in molecular computing."

"Perhaps part of the problem here is that we are mixing terminology; DNA and genome are not equivalent."

I think that this is correct. There seems to have been a rather careless conflation of "DNA" and "genome", myself included. The possibility of constructing a universal computer with the molecular components of the cell doesn't mean that the genome itself is a computer program.

The possibility of constructing a universal computer with the molecular components of the cell doesn't mean that the genome itself is a computer program

this is exactly like saying:
The possibility of constructing a universal computer with the components of your PC doesn't mean that the OS itself is a computer program

The trouble with you biology folks is that you have no clue about how computers work deep inside.

Virtually any objection that has been raised here against DNA being a computer program can be applied, without any change, to most PC executable programs as well.

T_U_T: Careful, Tyler is computer folks :)

Looks like there is significant disagreement both among biology folks and among computer folks on whether the analogy is appropriate.

Without the rest of the cellular machinery, the DNA content of a cell is little more than a relatively inert complex molecule.

And a computer program (let's say one recorded on a CD) is spectacularly inert without computational machinery and energy input.

Those of you who think the genome is inert and passive compared to a typical computer program: can you get a computer to change into a different model by running a different OS in it for a while? ;)

WOW... NOW I am confused!
.
How can anyone who knows how computers work say such things? How can someone claim that a sequence of digital symbols, that potentially can make a device (the cell) emulate any Turing machine, is not a program?

I mean, that is the very definition of a program.

T_U_T,

maybe the problem is with "that potentially can make a device (the cell)".

By negentropyeater (not verified) on 26 Feb 2008 #permalink

I'm well over my head here, but I did want to cast my voice in with those who emphasize this point: That DNA could be used as digital data storage is not an equivalent assertion to "the genome acts as a computer program." (my amateur analogy: Monopoly money could be used as digital data storage, but that fact, in itself, doesn't show that a game of Monopoly "is a computer program.")

I'm enjoying the stimulating conversation. Lots to think about, from both sides.

And the problem is? I do not see any, so go on, elaborate on that a little..

Tyler,

Less-than-universally-powerful computers aren't just "colloquially" or "metaphorically" "like" computers, and they can be just as rigorously discussed as UTMs.

They are computers, and they can be just as rigorously discussed as UTM's.

For example, see Turing's original paper where he distinguishes between plain programmable computers and universal computers. Or check out the machines corresponding to the Chomsky hierarchy, or formalisms like Petri nets... there's LOTS of stuff in between non-computers and universal computers. If you want rigor, we could talk for a very long time about more refined taxonomies than the Chomsky hierarchy...

But whether a machine is programmable (and hence has programs that are rightly called programs) simply does not depend on whether the machine is universal.

That's exactly why it was a very interesting discovery that many programmable machines can in fact emulate each other. The main point is that it's a bonus you get with some programmable computers but not others. That was established decisively 70 years ago.

As long as the machine is powerful enough to do what you need done, Turing universality just doesn't matter at all. What matters is ease of programming (evolvability) and efficiency for the stuff you need to do.

The genome "processor" seems to be at least an asynchronous forward-chaining production system with constant matching and crude (analog) scalars. That's plenty to qualify it as a computer, and the genome as a program, whether or not you could run PC programs on it.

It's also an awkward kind of thing for humans to program (ask me how I know) but it appears that's what evolution came up with and had to work with.

Maybe there's other stuff going on in there that somehow wouldn't count as programmed computing, but even so, the genome seems to be at least a rule-based program that bootstraps/implements some higher-level programs.

And if it's not just a production system implementing higher-level stuff that emerges from the interactions of low-level rule firings, it's probably a more powerful computer than it looks like at first glance. (Not a less powerful one that "wouldn't count.")

Maybe I'm just misunderstanding, but if you're arguing that the computational limitations of DNA (in comparison to formal models that are universal) assist in restricting variation, then this would seem to be contradicted by a genome that is mostly non-functional

Bad argument. I have written, not too long ago, code that looked a bit like this:

...bunch of stuff I can't move to any place else...
/* 500 lines of stuff I did move to some place else */
...more stuff I couldn't move ...
/* 20 line of stuff I did move.*/
/* 50 lines of stuff that didn't work, but I hope to fix one day */
...more stuff moved to some place else...
/* a few lines I don't need any more */
...stuff I still need here, but also had to copy to 3 other places, because it was critical to the workings of the stuff I moved...

plugin 1 - First batch of moved stuff.
plugin 2 - Second batch of new stuff, plus the copied code.
plugin 3 - Last bit of moved stuff, and a copy of the duplicate code.
... etc.

In other words, the original file, which I had to use in the past, has huge chunks of stuff that do nothing, and are marked as "skip this". The plugins contain both code I copied from the original file, as well as copies of the parts that would break it if I didn't make duplicate copies between them, and *those* plugins also contain sections that were commented in/out for debugging, testing ideas, etc.

This is in VBScript/Lua, etc., where there is no "compiler" whose job is to throw out the bits that don't do anything. The "code" continues to contain all the useless stuff you leave in it, with only the working parts being executed, unlike an EXE, where all the "working" sections are taken, packed down into a simpler form, then fed into a file that doesn't contain all the "here are 50 lines of code I can't get to work, WTF!" type sections.

Again, the problem isn't that DNA isn't a program, it's that you're looking at an absurdly narrow definition of one. DNA is source code, which gets "parsed". Parsers run programs, but *do not* do so by first removing all the useless non-functional sections; they just jump around them. And, more to the point, in some implementations, they can insert, remove, suspend, restart, or otherwise do things based on on-demand requirements. The same program I converted a mess of stuff to plugins for *also* allowed you to create a triggered event, which basically means that when its "environment" demands it, new code is inserted, executed, then removed again. While this creates memory leaks in some script languages... there is no reason why 100% of the script being executed in it couldn't be rewritten so it just sits there, until an event happens, which requires it to be inserted into the parser, executed, then removed. There is also no practical reason, other than synchronization issues, why it couldn't be changed so that some unspecified number of such simultaneous insertions took place, each executing separately, then removing themselves from the parser. Nor, if one wanted to do something that nuts, would it be impossible to basically insert code in the parser that was designed to parse other code, which was inserted into that by a different section of code. We don't do it because it would be a damn nightmare to debug and it's not strictly necessary for the main host of this system to be doing 90 things simultaneously. But it could be done, if you were crazy enough.

Any argument someone can come up with for "why" DNA doesn't look like a program ends up coming down either to not knowing about existing systems where *some* of its behavior is already duplicated, *or* to the simple fact that no one in their right mind would try to code something that worked like DNA, because there just are not enough independent processes on a PC to replicate it, and no living person could likely debug the system to make sure it did what it's supposed to. Neither of which is a valid argument against the analogy.

Well frell.. How the heck did that get mangled so badly? Sigh...

Kagehi, just because you can sort of model in a crude way some of the things that happen in a cell does not therefore mean that what happens in the cell is like your model. That argument is circular. There is a difference between a model and an explanation. Saying the genome is a program gets us precisely where in terms of extra, novel or useful explanation of some function of it?

I see lots of stuff where computing gets something interesting, but I see no new biology. Unless and until that comes, I see no point, from a biology p.o.v., in paying any attention at all.

By Peter Ashby (not verified) on 26 Feb 2008 #permalink

Paul W. wrote:

It's also an awkward kind of thing for humans to program (ask me how I know)

Let me guess. Does it have anything to do with the fact that in spite of having gene synthesizers for a couple decades we still don't have more vats of genetically engineered microbes making biofuels, miracle drugs, superfoods and other stuff?

Kagehi, just because you can sort of model in a crude way some of the things that happen in a cell does not therefore mean that what happens in the cell is like your model

of course not, that's why we actually test models for their ability to make correct predictions.

or did you forget that part?

it's the same with any model of any system, for that matter, from meteorology to the behavior of molecules in solution.

I can't count how many times modelers have produced things that were horrible at real-world prediction. That doesn't mean we ask them to stop modeling; models generate novel predictions that aren't always obvious when we view whole systems.

not saying you're wrong in this particular instance, but you shouldn't discourage modeling of biological systems simply because the models aren't "new biology".

In producing a model, the idea is to oversimplify the system to begin with, and still provide accurate predictions and explanatory power. Then you can add new variables, or change the variables, and see what you get.

Is a game of Monopoly a computer program? It's the result of players executing the rules of the game within the environment of the board, money and various tokens. Part of the program's state at any moment is who owns the money.

I'd say it's certainly a form of computation.

If biology is a form of computation does it tell us anything about biology? I don't know, I'd hope so. Regardless, it might tell us interesting things about computing.

By Idlethought (not verified) on 26 Feb 2008 #permalink

Except that during development the environment is just as dynamic as the dna 'program' and includes changes both in the cellular and physical environments.

This isn't a difference: the hardware environment of the operating system is dynamic as well. Each piece of modern hardware has its own code that is adapting to the OS at the same time that the OS is adapting to it.

Kagehi, just because you can sort of model in a crude way some of the things that happen in a cell does not therefore mean that what happens in the cell is like your model

of course not, that's why we actually test models for their ability to make correct predictions.
or did you forget that part?

Nope, but I am asking what predictions the analogy that genomes are programs has given us. Nobody seems able to cite research that does that. Therefore the model has no utility; it has no predictive power and no explanatory power about the biology. That is the point.

By Peter Ashby (not verified) on 26 Feb 2008 #permalink

You could also ask what useful predictions we can get from the analogy that bicycles are vehicles... or the analogy that giraffes are mammals.

nobody seems able to cite research that does that.

meh, that's not saying much, given that I rather doubt anybody commenting in this thread does much work in modeling biological systems.

I've never looked at the efficacy of programmatic models of DNA function myself (I'm a behavioral ecologist), but I wouldn't automatically assume that there haven't been efforts to do just that. In fact, I would assume the exact opposite. The probability is that someone, somewhere, has published an article detailing just such a model.

It indeed would be a good thing for supporters of this kind of model to spend a couple of hours pulling relevant cites from the literature to support its usage (or at least document it).

I know there have been a couple of threads over on Panda's Thumb in the last year or two that have gone into detail on similar ideas. I do recall a couple of actual modelers participating on those threads.

One of them started with some creobot's contention with Dawkins's "methinks it is like a weasel", if that helps any.

Kagehi might even recall some of those threads.

aside from that, though, it shouldn't be too difficult for those who support this kind of model to search the lit for an example, right? I'm sure we can put the idiocy (Stimpson) that actually started this thread aside and look at whether there have been published efforts to utilize such a model.

Wouldn't that make everybody happy?

It at least would be more interesting.

T_U_T,

My argument wasn't meant to be the decisive blow in the argument over whether the genome could be a computer program, it was only noting the conflation going on regarding DNA computers. In particular, this is nothing like what I said:

"The possibility of constructing a universal computer with the components of your PC doesn't mean that the OS itself is a computer program."

It would be more like "the fact that you can fabricate semiconductors with silicon doesn't mean that something made out of silicon is a computer program."

Paul W.,

You're right that computing devices don't have to be universal to be rigorously described as computers. But what I was responding to was your implication that the differences between a UTM and an actual PC were analogous to the differences between the genome and a computer program. The qualities shared by the members of one of the sets being compared have theoretical significance, I'm not so sure about the other.

Since you mention the Chomsky hierarchy, one of the ways that this argument can be settled is through an analysis of the syntactic structure of DNA. Since I'm not a genomic biologist, I'm not aware of any analysis that shows that the genome has specific terminals, specific non-terminals, and specific production rules as per the Chomsky hierarchy. This may be what makes the difference between just being like a computer program and actually being equivalent to one in a formal way.

Granted, the genome certainly seems to function as an instruction set, but in that case recipes and blueprints can also be useful metaphors for the genome. And there is still the possibility, conceded earlier, of modeling the cell in toto as an analog computing system.

Man, I had a ton of such articles. But they are on my other computer, and I am too tired to google all that again.

One try. I remember the keywords: stochastic Boolean networks.

Let's see how good my predictions are :-)
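
In the meantime, a random Boolean network (a deterministic cousin of the stochastic kind) takes about a dozen lines of Python to play with; the node count, wiring and seed below are all arbitrary:

import random

# Kauffman-style random Boolean network: each of N genes is on or off,
# and its next state is a randomly chosen Boolean function of K others.
random.seed(1)
N, K = 8, 2
wiring = [random.sample(range(N), K) for _ in range(N)]
rules = [{(a, b): random.random() < 0.5 for a in (0, 1) for b in (0, 1)}
         for _ in range(N)]

state = tuple(random.randint(0, 1) for _ in range(N))
for step in range(10):                  # iterate and watch for cycles
    print(step, state)
    state = tuple(int(rules[g][(state[wiring[g][0]], state[wiring[g][1]])])
                  for g in range(N))

Run it with different seeds and you can watch the state fall into short repeating cycles, the attractors that Kauffman identified with cell types.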

Tyler, but if you can prove that a particular piece of silicon can, if certain inputs are provided, act as a Turing machine, you can say it is a computer, and also you can say that the input sequence that controls it is a program.
The article I linked to has proven that you can make a cell emulate a Turing machine just by feeding in the right DNA, so it is a computer.

"Tyler, but if you can prove that that particular pieces of silicon can, if certain inputs are provided, act as a turing machine, you can say it is a computer and also you can say that the input sequence that controls it is a program."

I gave the paper a quick read, so I may have missed it, but I'm wondering whether the restriction enzymes that are so crucial to their model are endogenous. If not, my argument still stands.

It indeed would be a good thing for supporters of this kind of model to spend a couple of hours pulling relevant cites from the literature to support its usage (or at least document it).

First of all, should we concede that Peter's demands on analogies are reasonable? How many publications are specifically due to the "genome as a cake recipe" analogy? And I seem to recall that a certain famous biologist a few years back proposed an analogy of the genome functioning as a Fourier transform. Should we ask him to denounce this worthless analogy that doesn't seem to have generated a flood of research? Or should we consider that analogies may have other functions?

That said:

----------------
The evolution of cellular computing: nature's solution to a computational problem

Laura F. Landweber and Lila Kari

How do cells and nature 'compute'? They read and 'rewrite' DNA all the time, by processes that modify sequences at the DNA or RNA level. In 1994, Adleman's elegant solution to a seven-city directed Hamiltonian path problem using DNA launched the new field of DNA computing, which in a few years has grown to international scope. However, unknown to this field, two ciliated protozoans of the genus Oxytricha had solved a potentially harder problem using DNA several million years earlier. The solution to this problem, which occurs during the process of gene unscrambling, represents one of nature's ingenious solutions to the problem of the creation of genes. RNA editing, which can also be viewed as a computational process, offers a second algorithm for the construction of functional genes from encrypted pieces of the genome.

Biosystems
Volume 52, Issues 1-3, October 1999, Pages 3-13
--------------------------
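
For a sense of scale, the seven-city instance Adleman solved in vitro is trivial for silicon; an exhaustive-search sketch in Python (the edge list is made up, not Adleman's actual graph):

from itertools import permutations

# Brute-force directed Hamiltonian path: visit every vertex exactly once.
# Adleman's 1994 DNA computation solved an instance of this for 7 cities.
edges = {(0, 1), (1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (2, 5)}

for path in permutations(range(7)):
    if all((a, b) in edges for a, b in zip(path, path[1:])):
        print("Hamiltonian path:", path)
        break
else:
    print("no Hamiltonian path")

The remarkable part of Adleman's experiment wasn't the answer but the method: trillions of DNA strands assembled candidate paths in parallel, with the chemistry doing the filtering.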

Should we ask him to denounce this worthless analogy that doesn't seem to have generated a flood of research?

I'd appreciate it if you didn't use one part of my comment to attack somebody else's comment.

If you wish to attack Peter, do so directly, thanks much.

My only purpose was to get to the literature on the subject itself, which is the only place where a debate on the issue would be worthwhile to focus on.

Peter did have a good point, in so much as models and analogies are often used without sufficient measure of their efficacy when looking at actual systems.

the only real way to address that point is by looking at the literature itself, and see whether such analogies and models have in fact been productive or not.

now, that said...

did you want to actually look at how computational models of DNA function are used, and whether they have had value?

that paper you cite seems to be a good place to start; there are lots of similar papers in that journal as well.

I'm not sure I'd be good at evaluating their efficacy; I'm a behavioral ecologist. You'd have to turn us on to some literature reviews on the subject as well.

Peter,

I'm at a bit of a loss as to how to respond to you, in a couple of ways.

I don't think the onus is on me to show that the genome is a computer program. PZ's the one flatly saying it's not, and then making arguments that don't hold water. I've already given some counterexamples to those particular arguments, and that should be enough.

It could be interesting to discuss what forms computer programs can take---I've already said a few things along those lines---and discuss whether genetic processing does count as programmed information processing, and what that does or doesn't imply...

...but it is definitely not my job to show that my paradigm is superior to yours in terms of research productivity. That's an interesting issue but irrelevant to the claim I'm defending---i.e., that PZ is likely wrong that the genome isn't basically *some*sort* of computer program.

In certain respects I think that saying that something is a computer program is far *less* interesting than PZ or you think.

Part of *my* point is that because many varied things do count as computer programs, saying that something *is* a computer program is saying much less than some people might think. Many things they assume about programs are neither necessary nor sufficient conditions, so pointing out that something is a computer program CANNOT give you the kinds of precise and detailed predictions you seem to be looking for. (See T_U_T's comment #235)

It can only give you some hints, and some clues about possibly related work.

Once it's established (if it is) that the genome is a computer program (or mostly something very like one), there still remains the "small problem" of figuring out what particular kind of program it actually is, and what to do about it. Presumably, that'd be where you'd eventually find the predictive information.

Conversely, saying that something is *not* a computer program is a much bolder claim than it would appear to most people. If computer programs are many and varied, saying that something is *not* one rules out a whole lot more than it would if they were a narrow, uniform category. (For example, that the genome is not only not like a FORTRAN or Java program for a serial computer, but also not basically a rule set for a parallel forward-chaining production system.)

If in fact the genome isn't a computer program, the onus is on PZ to defend that very broad claim. (Which he doesn't need to trounce the creationist guy.) So far, he hasn't done a great job of that. He can assert it all he wants, but I'm not going to believe it until he (or somebody) gives me better reasons.

You seem to be arguing two very different points, i.e. that

1) the genome isn't a computer program (or even interestingly like one?), and

2) even if it is, it can't (or won't, or currently doesn't) tell you anything useful that you don't already know about biology without putting it in those particular terms.

You could be wrong about (1) and right about some form of (2). I suspect that you're wrong about both, but don't feel up to trying to convince anybody of (2) right now.

Even if I persuade you of the falsity of (1) you could turn around and tell me that you don't care unless I persuade you of the falsity of (2) as well, and that (1) just turns out to be a boring, surprisingly weak claim. I don't think it is, in the long run, but I'm not up for arguing about that.

For now, how about we stick to whether the genome is a program, and ignore whether it's a big boon to your research program to know that.

One problem I have with arguing about that is that most of PZ's reasons are so off-base I don't know where to start. The self-containment argument is just wrong.

On the positive side, I've suggested that the genome is (at least) like a rule set for a parallel forward-chaining production system.

If you want to argue about that, please tell me whether

(1) you don't think it's like that at all
(2) you think other mechanisms are doing most of the work, and the apparent rule system is a red herring, or
(3) yes, that's doing a lot of the grunt work, but doesn't give you the crucial power, and if so
(4) what kind of power that is and what you think might be doing it.

Specifically,

(a) do you think that the parallel production system model of gene activity is too computationally weak, and unable to account for the observed functions of DNA, or
(b) do you think it's too strong, so that it's not useful for predicting specifics of what the genome actually does, or
(c) something else... ? Is there something else besides computation and control that DNA is mainly doing?

It *seems* utterly obvious to *me* that what the genome is doing is at least largely computational. (But maybe I'm a computer weenie and total doof.)

I see a bunch of things acting like more or less simple switches, many of them boolean and many others not a lot more complicated than that, most of which only serve to switch other switches.

That, to me, is a hallmark of computation. The fact that it's under control of relationships between variable patterns (different genomes) is a hallmark of *programmed* computation.

Do you disagree?

That's not a rhetorical question, BTW, and I'm sorry if it sounds naive. I honestly do not know whether you think that's not programmed computation, or whether you think that the programmed computation is real but just "not enough" in some sense to be basically what the gene/cell thing "is about," or what else you think might be going on that's more important.

Tyler,

> But the paper suggests that representations under "hard" constraints (such as, the evaluation function has to be one to one and onto) have at best a negligible effect on search performance.

I don't think I was proposing that DNA operates under hard constraints like this, so I don't see why this is relevant. Example: rare events can modify the codon "code" itself (mostly in ancient bacteria). There are many kinds of constraints.

>> constraints on what unique states are allowed in the genotype,... would also tend to reduce the range of variation in the phenotype."
> That is what I'm arguing.

I think we agree on this. Somehow I think you've got the impression that my position is the opposite of what it actually is. =)

> DNA doesn't match a computational formalism like lambda calc,

I don't think you'll find any computational formalism that exactly matches what a cell or DNA does, because, at root, the actual "computer" is the massively parallel atomic laws of physics. Any computational system we build that lacks that level of detail will be nothing more than an approximation of cellular behavior.

That said, we can certainly take a subset of cellular systems and model them, or abstract their principles and see what their behavior is. See if it behaves in ways similar to biological systems. Make predictions about how biology should behave. These subsets and abstractions can be studied and formalized, and if these are doing computations, then the cell/DNA certainly is doing computations, whether one wants to hang the label of a "program" on it or not. I don't see why computer science must have a complete computational formalism for DNA and the cell before recognizing that they are doing non-metaphorical forms of computation, some aspects of which have been shown to work quite well for satisficing certain kinds of high-dimensional optimization problems. That is what organisms adapting to their environments are doing.

The Standard Model in physics is also incomplete, the universe is not yet fully understood, and yet we don't avoid calling the work that CERN does "physics" on the off chance the label might be premature. =)

> if you're arguing that the computational limitations of DNA (in comparison to formal models that are universal) assist in restricting variation, then this would seem to be contradicted by a genome that is mostly non-functional,

The "computational limitations" of DNA, in the sense that it can generate a certain range of phenotypes, are embodied in its atomic structure, the base-pair letters themselves, the structure of the ribosomes that process the codons into amino acids, the feedback loops of proteins that turn genes off and on, etc. These constraints change over time, and having a phenotype space that is overly constrained or under-constrained would be one target for selection under some conditions. Unlike DNA's, the "computational limitations" of most Genetic Algorithms will typically be set up before the system is run. Non-functional DNA may not participate in the function of a particular phenotype in a particular individual, but why is that a contradiction? This happens automatically as a part of the process of point mutation, crossover, duplication, etc. If a gene is turned off, is that a contradiction as well?

> assuming that the evolutionary advantage of junk-DNA is increased variation conflicts with some known evolutionary phenomena. For instance, smaller genomed organisms show more evolution.

I don't think I would say it is "THE evolutionary advantage". It would be one factor among many, and various pressures affect it. Here is a computer science paper exploring the issue of genome size, in this case, how mutation rate affects it: Self-adaptation of Genome Size in Artificial Organisms http://www.springerlink.com/content/3cdd5p13ytve3pgd/

> it doesn't seem to be what Stimpson was arguing for or PZ is necessarily arguing against.

PZ declared that the genome is not a program. I think that it is definitely a kind of program--just not the kind that people are used to when they think of a typical computer "program". It is certainly not a typical imperative computer program.

I gave the paper a quick read, so I may have missed it, but I'm wondering whether the restriction enzymes that are so crucial to their model are endogenous.

DUH

of course they are NOT. They don't need to be. That is the whole point! They can be stored ON the input DNA, and the cell builds them while processing the input DNA. That is EXACTLY how a cell manages to be Turing equivalent. Sure, cells usually don't need to have those enzymes, but that is perfectly irrelevant, because a device does not need to actually emulate a Turing machine to be equivalent to it. All that is needed is that it is capable of doing so, if it receives the right inputs.

I'd appreciate it if you didn't use one part of my comment to attack somebody else's comment. If you wish to attack Peter, do so directly, thanks much.

Excuse me, I assumed we were having a discussion with several people involved. It looked like you endorsed Peter's comment so I was responding to both of you.

Peter did have a good point, in so much as models and analogies are often used without sufficient measure of their efficacy when looking at actual systems. the only real way to address that point is by looking at the literature itself, and seeing whether such analogies and models have in fact been productive or not.

As I said I don't think that's very relevant here, and I'm not sure that this is a good way to measure efficacy. Some analogies like 'harem' for mating systems are all over the literature but may be quite flawed as analogies. Some analogies like 'the genome is like a cake recipe' are not used in the original literature much but are great for popularization.

And then there's the question of whether we are talking about programs "just" as an analogy, but Paul W explained it better than I can in #244.

now, that said... did you want to actually look at how computational models of DNA function are used, and whether they have had value? that paper you cite seems to be a good place to start; there are lots of similar papers in that journal as well. I'm not sure I'd be good at evaluating their efficacy; I'm a behavioral ecologist. You'd have to turn us on to some literature reviews on the subject as well.

That would be very interesting but I'm not the best person to ask either :) It sounds like a topic for a whole discussion of its own.

Windy:

The evolution of cellular computing: nature's solution to a computational problem
Laura F. Landweber and Lila Kari
How do cells and nature 'compute'? They read and 'rewrite' DNA all the time, by processes that modify sequences at the DNA or RNA level. In 1994, Adleman's elegant solution to a seven-city directed Hamiltonian path problem using DNA launched the new field of DNA computing, which in a few years has grown to international scope. However, unknown to this field, two ciliated protozoans of the genus Oxytricha had solved a potentially harder problem using DNA several million years earlier. The solution to this problem, which occurs during the process of gene unscrambling, represents one of nature's ingenious solutions to the problem of the creation of genes. RNA editing, which can also be viewed as a computational process, offers a second algorithm for the construction of functional genes from encrypted pieces of the genome.
Biosystems
Volume 52, Issues 1-3, October 1999, Pages 3-13

Which is not what I asked for. It speaks to computing, yes. But exactly what insights into the biology does it give us? We have known about RNA editing for quite a while; its origins are a biological problem, and they were not discovered by analysing the computational elements of RNA editing.

So I am at a complete loss to see how this paper advances the biology in any way. It offers no insights into the biology, it offers no predictions about the biology and we understand the biology no better by saying 'it is a bit like a program'.

Try again.

By Peter Ashby (not verified) on 27 Feb 2008 #permalink

Paul W:

In certain respects I think that saying that something is a computer program is far *less* interesting than PZ or you think.

I don't think it is interesting, for the reasons I have set out.

Part of *my* point is that because many varied things do count as computer programs, saying that something *is* a computer program is saying much less than some people might think. Many things they assume about programs are neither necessary nor sufficient conditions, so pointing out that something is a computer program CANNOT give you the kinds of precise and detailed predictions you seem to be looking for. (See T_U_T's comment #235)
It can only give you some hints, and some clues about possibly related work.

Then we are not in disagreement ;-)

By Peter Ashby (not verified) on 27 Feb 2008 #permalink

So I am at a complete loss to see how this paper advances the biology in any way. It offers no insights into the biology, it offers no predictions about the biology and we understand the biology no better by saying 'it is a bit like a program'.

Sigh. Why don't you ask Laura Landweber why she finds the analogy useful? Can you give any examples, by the way, of analogies producing the kinds of predictions you are looking for?

Try again.

I don't have time to play "chase the goalpost" right now.

Thx for this interesting thread. As a lay person, I understand better the arguments that many have put forward to defend the DNA/computer program analogy.

Having said this, please remember that, when communicating with the interested lay public, an analogy such as this one may backfire, as most people tend to associate program with programmer. We are then back to a discussion similar to the one on evolutionary design, that is, explaining how the program has generated itself through natural evolutionary processes. "How do cells and nature 'compute'? They read and 'rewrite' DNA all the time, by processes that modify sequences at the DNA or RNA level."

By negentropyeater (not verified) on 27 Feb 2008 #permalink

Windy,

Sorry I didn't see your earliest reply to me, but I didn't check the thread that carefully.

My comment about complexity was more to do with the expectations of geneticists when the first genome sequences appeared, rather than a reflection of the current state of the art or a direct response to this debate. If you took many measures of what people understood to be complexity, say ploidy or overall genome size, then there was no good correlation between that property and measures of organismal complexity, say body plan or tissue diversity. Even gene number is often a poor reflection of those things. (The C-value enigma still remains, to some extent.)

Without a much greater synthesis of genomics with post-genomics, transcriptomics, and proteomics, it is very difficult to predict many organismal properties from the genome alone. Now part of that was to do with the fact that the code is oblique, but much of it is to do with the fact that epigenetic factors are also perhaps a greater source of information than was expected, and there are probably more emergent biological properties than we had imagined.

Anyway, as I say, I was simply giving an example of how genome sequences were somewhat overhyped as being the blueprints of life, and that there is much lacking in such an assumption.

Insofar as this relates to my comment about DNA being just an inert molecule: I recognise that software data can also be stored in inert media. My point was that there seems to me to be much more value in thinking of the entire information content of the cell as being like a program (in order to distinguish software from hardware, for instance), but that the commonly employed version of the analogy is very misleading - apparently, to many biologists as well as computer scientists, if this thread is any indicator.

Personally, I was always referring to Randy's original post. I'm more than willing to concede that specific variants of the analogy may hold value for people with the expertise to understand them, although I suspect that those people would hold such analogies much more as semantic tools for discussing their work and employing technical strategies; their technical understanding is probably sufficient that they can actually work with a model that is much closer to reality than much of our wordplay here.

I know that I employ many useful analogies in my own work which may have value in modelling molecular biological systems, but which can also be very misleading. I also believe that somewhere beyond analogy and models, I hold a mental image of the processes which comes closer to reality.

I'm simply cautious of introducing the analogy in such a simplified form, without giving warning that it can lead to gross misunderstandings.

By Bernard Bumner (not verified) on 27 Feb 2008 #permalink

"Having said this, please remember that, when communicating with the interested lay public, an analogy such as this one may backfire, as most people tend to associate program with programmer."

EXACTLY. That, and many seem to think analogies are evidence. And when we toss in the Salem Hypothesis and the Dunning-Kruger effect, we get people like Randy Stimpson, who insist that because they are programmers, they have some special insights into genetics and evolution.

I gave the paper a quick read, so I may have missed it, but I'm wondering whether the restriction enzymes that are so crucial to their model are endogenous.

DUH
of course they are NOT. They don't need to be. That is the whole point! They can be stored ON the input DNA, and the cell builds them while processing the input DNA. That is EXACTLY how a cell manages to be Turing equivalent. Sure, cells usually don't need to have those enzymes, but that is perfectly irrelevant, because a device does not need to actually emulate a Turing machine to be equivalent to it. All that is needed is that it is capable of doing so, if it receives the right inputs.

Um, restriction endonucleases are found only in certain bacteria. They are so called because they have evolved as a sort of immune system against viruses: they cleave the DNA of specific bacteriophages and stop them reproducing, thus *restricting* their growth. I completely fail to see how using them in a technology supports the contention that DNA is in any way normally a program, which is the contention under discussion.

I do not deny that in an artificial situation you can use DNA to do computations. The point is that this speaks not at all to the normal functioning of DNA, and we learn nothing new about it, only about the range of substrates that can be used to do computations.

By Peter Ashby (not verified) on 27 Feb 2008 #permalink

Sigh. Why don't you ask Laura Landweber why she finds the analogy useful? Can you give any examples, by the way, of analogies producing the kinds of predictions you are looking for?

Because I was not the one making the claim, and so I actually have no interest in wondering about the motivations of people who write things like this, beyond the obvious fact that new areas apparently always attract those more desperate to get on than to do really good work. Such is life.

Never mind though; I asked for some citations without any hope or expectation of the request being fulfilled. It seems I am not so far out of the loop as I might have been. Might have something to do with being a biologist who is married to someone with maths and compsci degrees, with friends working in bioinformatics and an offspring doing a bioinformatics degree, and yet I was not aware that what I asked for existed.

It is, however, a bit depressing seeing how many people think that compsci questions speak to the biology. By all means get excited about DNA computers; just don't try and tell me it is biology.

By Peter Ashby (not verified) on 27 Feb 2008 #permalink

Bother, and I have been previewing everything else. The one time I don't, I screw up the HTML...

By Peter Ashby (not verified) on 27 Feb 2008 #permalink

Since the post I made with a bunch of citations is caught up in moderation (I'm pretty sure it wasn't worthy of rejection), I'll just suggest some things that don't require URLs:

look at the link I posted in #196
look at the Santa Fe Institute
google for "avida"
maybe look at Karl Sims' work

the first is the most directly applicable to real biology, although much of the 2nd and 3rd are as well. The references I know are more from the CS side of things, so the first is probably most relevant to the "convince biologists that CS can be useful" goal; the others are more "CS people decide to be cross-disciplinary and make crude models of biology." But I also object to looking with too much scorn on that: Peter in #212 complained earlier about some of the flock/swarm/herd work (Ian Stewart? I don't know that work, but I like Ian Couzin and Craig Reynolds in this area) as being a bad, oversimplified model, but I tend to think that it's worthwhile looking at simulations, even when they're oversimplified, for what they can tell us about the underlying patterns. L-systems and reaction-diffusion systems and the like don't model every aspect of what goes on in real organisms, but the fact that they come close can frequently be useful in knowing what to look for in studying the real thing, and even in giving a reference point for asking "where does reality diverge from this model, and how can we use that as a roadmap to investigate the reality?" sorts of questions.

p.s. Since the paper I linked to in #196 describes gene regulatory network components that implement AND, OR, and NOT, those components are sufficient for defining a finite state machine; therefore one could implement a Turing machine model using some chunk of DNA, transcription factors, and cis-regulatory modules as the finite state machine component, and some other stretch of DNA (make it a separate chromosome if that makes you happy) as the tape -- and the FSM states could even control differentiation gene batteries. That would be a whole lot closer to a Turing Machine interpretation of a living cell, if you really groove out on the Turing equivalence stuff.
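
For the Turing-curious, here is a minimal sketch in Python of what "AND, OR, and NOT are sufficient" can mean: a two-gene finite state machine whose next-state logic uses only those operators. The gene names, the wiring, and the inducing signal are all invented for illustration; they are not taken from the #196 paper.

----------------
# A toy illustration, not anyone's published model: a two-state machine
# whose next-state logic is built only from AND, OR, and NOT, the way the
# cis-regulatory "gates" described above are. geneA, geneB, and the
# signal are hypothetical.

def next_state(geneA, geneB, signal):
    # geneA stays on if it is on and not repressed by geneB,
    # or gets switched on by the external signal
    new_A = (geneA and not geneB) or signal
    # geneB locks itself on once geneA has fired with the signal gone
    new_B = geneB or (geneA and not signal)
    return new_A, new_B

state = (False, False)                      # both genes off
for signal in [True, False, False, False]:  # one transient inducing pulse
    state = next_state(*state, signal)
    print(state)
----------------

Note that a transient pulse of the signal leaves the system parked in a stable new state -- the software equivalent of a feedback loop that locks in a regulatory state.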

Note that I think that trying to mash cellular mechanics into the Turing equivalence class is pointless, though, since I see the gene regulatory system as more analogous to logic that could implement all sorts of things, including Turing computation, but is doing something else. In that sense, I agree with PZ that the DNA is not a program for a Universal Turing Machine Computer; it's a program for something that could be in that equivalence class with a different program. Which, by the way, is often what computer scientists mean by saying it's in that class: just knowing that it has the pieces necessary to implement a Turing Machine puts it into a mathematical equivalence class on which you can then do all sorts of interesting analysis.

Is part of the terminology gap here that the "DNA is not a program" crowd thinks that the CS weenies are taking a naive view that there is one "tape head" that goes along and reads one instruction at a time, and jumps around when there's a "goto" and stuff like that? Because, from the descriptions floating around from the CS side, I think it's safe to say that none of the CS-side folks here have that naive a take on things -- I believe that what most of us (and I'll even give Mr. Stimpson the benefit of the doubt) are saying is nothing that naive.

In that regard, it's unfortunate that DNA/RNA/ribosomes have some similarity to Turing Machines, in that both involve some long tape-thing with some sort of discrete symbols on it that are read by some tape-head thingie, and sometimes modified. As it happens, that has relatively little to do with the equivalence-class of things that act like Turing Machines... in fact, most real computers are based on von Neumann machines or lambda-calculus style machines, and Turing machines are just used by mathematicians to prove stuff about the whole class. And a typical computer science "thesis avoidance behavior" is to do things like prove that you can write macros in the vi editor to implement a universal Turing machine so that you can counter your officemate's arguments about emacs being superior...

An example of a way in which I'd expect this to be applicable is the following: the regulatory state of a stem cell is defined by some set of expressed transcription factors. In a certain environmental context, the influence of various signals, such as a signal from neighboring cells, can change its regulatory state to one that leads to it becoming a liver cell, through cascades of regulatory module activation leading to other transcription factors that lead the cell to transform itself, ultimately via expressing different proteins that liver cells need. In order to explain this, biologists draw big pictures with arrows showing the interdependencies between boxes representing these chemical regulatory networks and their gene activation states and stuff like that.

What field, you might ask, has studied the most complete mathematical formalisms for describing and proving things about graphs of boxes connected by arrows, and how the system described by such a graph might act when it represents a system that changes state over time based on certain formal rules? I would argue that it is mathematical computer science and its allied fields, like computer engineering, information theory, and control engineering. Biology is moving into an area that has already been well-explored by other researchers. Admittedly, biology, like most fields, is full of weird, unexpected, quirky behavior, so the tools from computer science may need to be extended or modified to be appropriate, but the insistence that biology is completely different and that it's valueless and actively misleading to try to apply these tools strikes me as a position that is both closed-minded and extremely hard to defend.

As someone who's primarily a computer scientist but is interested in, and actively working on, learning modern biology, I can say that my experience is frequently that biology has terminology and diagrams that are conceptually the same but notationally different than things I understand in other disciplines. Often, the reason that they're different is that the biologists have come at them from a different direction, not because one way is "the right way to represent it," but understanding the equivalences (and the subtle differences, if any) allows people to open up both sets of tools to attack problems... for example, it would be possible to take a described gene regulatory network that takes pages of boxes and arrows and turn it into a short logic expression. The boxes would still be useful, particularly for looking at gene expression patterns that might not be important for the logic, but having the ability to distill the network down to a simpler description could easily be helpful in getting a handle on the complexities of the control systems, and perhaps for better quantifying where to look for further systems that are missing, or for identifying what parts are more or less likely to be evolutionarily conserved (e.g. the ones that are sufficiently important to the system that most mutations are lethal.)
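
To make that distilling step concrete, here is one sketchy way a boxes-and-arrows diagram could be written down as a few boolean update rules and then collapsed to a short logic expression. The wiring is invented, and the gene names only loosely echo the sea urchin literature; treat everything here as hypothetical.

----------------
# Hypothetical regulatory "boxes and arrows" as boolean update rules.
network = {
    "endoderm_battery": lambda s: s["gataE"] and not s["repressorX"],
    "gataE":            lambda s: s["otx"] or s["gataE"],  # self-locking loop
    "repressorX":       lambda s: not s["signal"],
}

def step(state):
    # one synchronous update of every regulatory "box"
    out = dict(state)
    for gene, rule in network.items():
        out[gene] = rule(state)
    return out

state = {"otx": True, "signal": True,
         "gataE": False, "repressorX": False, "endoderm_battery": False}
for _ in range(3):
    state = step(state)

# The pages of arrows reduce to one short expression:
#   endoderm_battery = (otx OR gataE) AND NOT (NOT signal)
print(state["endoderm_battery"])   # True
----------------

The three one-line rules are the "short logic expression"; the diagram is the same information drawn large.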

Mark, the problem I have with the Istrail et al paper is that it is too simple. Few transcription factors bind DNA as monomers; most bind as dimers, and many as heterodimers with variety in at least one of the partners. This is particularly true of the E genes and factors that bind to E-boxes (CANNTG). This means you can get graded, effectively analog regulation of genes. Add to that that some promoters have multiple binding sites, and that binding sites cluster into modules called enhancers, and the schema of AND, OR, and NOT gates looks hopelessly crude for anything other than the simplest bacterial genes, like the Lac operon, which was figured out first because it is in fact relatively simple.

The paper does do a good job of reformatting the biology in terms of the analogy. But biology it ain't, I'm afraid. In order to speak to the biology it needs to deal at least with how we understand gene regulation now, not 40 years ago.

By Peter Ashby (not verified) on 27 Feb 2008 #permalink

Why don't you ask Laura Landweber why she finds the analogy useful? Can you give any examples, by the way, of analogies producing the kinds of predictions you are looking for?

Because I was not the one making the claim, and so I actually have no interest in wondering about the motivations of people who write things like this, beyond the obvious fact that new areas apparently always attract those more desperate to get on than to do really good work. Such is life.

Are you disparaging Laura Landweber here (surely it's not little old me you mean)? What's your evidence that this award-winning scientist is just "desperate to get on"??

Never mind though; I asked for some citations without any hope or expectation of the request being fulfilled. It seems I am not so far out of the loop as I might have been. Might have something to do with being a biologist who is married to someone with maths and compsci degrees, with friends working in bioinformatics and an offspring doing a bioinformatics degree, and yet I was not aware that what I asked for existed.

What a strange attitude for a scientist.

Thanks for the detailed response (#260), Peter. I should state from the outset that the class I'm sitting in on right now is beyond my background, so I may well have some serious gaps in my knowledge of some areas of genetic systems. In particular, I'll have to look up "E genes" to get the details of your counterexample... the wikipedia E-box page pretty much just says "it's a transcription factor," which is perhaps good enough.

What I *have* learned in the class in some (dizzying levels of) depth is that the particular system of (mostly) cis-regulatory elements in evo-devo is very amenable to being studied as a (weird and complicated) networked logic system. It is true that the presence or absence of a transcription factor is frequently more of an analog/continuous sort of thing, but in the systems we're looking at, which the prof assures us are representative of the systems he studies in sea urchin development as well as the embryogenesis of all the bilateria and probably cnidarians as well, there are a lot of mechanisms that actually correct for the ambiguities in the analog system, like little feedback loops that "lock in" regulatory states, or mutual-exclusion through repression and so forth.

We did spend a whole class discussing the analog version of this, too, as a counterpoint -- understanding that is indeed part of the whole system -- and it was mentioned that rather than thinking of these systems as "boolean" it's sometimes helpful to think of them as "boolinear." But this doesn't invalidate the information-processing approach or support "biology it ain't." It "ain't" a complete description of all biology, but if you have one of those, there are some Nobel committee folks you should let know.

The criticism that there is "partial occupancy" or similar, though, while it's important to keep in mind for an understanding of the underlying mechanisms, is far from a showstopper... in computers' guts, for example, there are lots of voltages that are somewhere between "zero" at 0V and "one" at +5V (or whatever)-- if the value is 4V there are simple circuits that "round" it to 5V, and if it's .2V, it gets rounded to 0V... and at least the cis-regulatory circuits commonly do roughly the same thing.
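
Here is a toy version of that rounding in Python, with an arbitrary threshold and gain standing in for real binding kinetics; nothing about the numbers is biological.

----------------
import random

def restore(level, threshold=0.5, gain=8.0):
    # A steep Hill-like response: inputs below the threshold get pushed
    # toward 0, inputs above it toward 1 -- the chemical analogue of a
    # logic gate snapping 4V up to 5V and 0.2V down to 0V.
    x = level ** gain
    return x / (x + threshold ** gain)

noisy_one  = 0.8 + random.uniform(-0.1, 0.1)   # a degraded "1"
noisy_zero = 0.2 + random.uniform(-0.1, 0.1)   # a degraded "0"
for _ in range(3):                             # a few regulatory stages
    noisy_one, noisy_zero = restore(noisy_one), restore(noisy_zero)
print(round(noisy_one, 3), round(noisy_zero, 3))   # ~1.0 and ~0.0
----------------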

The multiple binding to cis-regulatory modules, also, is a feature, not a bug, so to speak: the combinatoric nature of several transcription factors contributing to one regulatory module allows both robustness and, more importantly, a combinatoric control of regulation that can use a limited set of regulatory factors to specify a much larger vocabulary of regulatory states. I seem to remember that it also, at least sometimes, moves things in the non-linear, discrete direction, because it's energetically easier for the later transcription factor to occupy the last site when its neighbor sites in the regulatory module are already occupied by their factors; so if the regulatory requirements are partially met, there's a nonlinear "boost" that makes it far more probable that the module will go "above threshold."

I gotta go, or I'll make myself late for the aforementioned class, though, so I apologize if this is lame due to lack of proofreading...

For what it's worth, I don't think that graded transcription rates of genes keeps them from being rules in a production system. (And hence clearly a computer program.)

There are rule-based systems whose values include both booleans and scalars, where the scalars may be interpreted in various ways depending on what you're doing with them.

For example, diagnostic expert systems may use scalars that represent confidence ratings for hypotheses, and process control software may use scalars that represent degrees of truth (like "medium heavy", for something that's only kinda heavy for the purpose at hand).

In those kinds of systems, the scalars often don't need much precision or accuracy---like 1-to-10 confidence ratings, or a 5-point scale of whether an object is "heavy" relative to the power of a robot's arm---and that can be plenty good enough for the kind of computation being done. Fairly noisy analog values would work just fine, as long as they can hold a few bits worth of information.

This sort of analogness is one of those things that many people seem to think is "not like a computer program," but is in fact *very*much* like some computer programs. Computer scientists even go so far as to limit the precision of scalars to reflect the crudeness of the judgements, and sometimes even to introduce noise in order to test the robustness of the code. (Because in many cases, if the code depends on much accuracy OR precision, the logic is wrong.)
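
As a sketch of what a coarse scalar feeding a boolean conclusion looks like -- the 5-point "heaviness" scale and the rule itself are invented, not from any real expert system -- consider:

----------------
def quantize(x, levels=5):
    # deliberately throw away precision: a noisy analog measurement
    # becomes one of five grades, which is all the downstream logic needs
    return round(x * (levels - 1)) / (levels - 1)

def can_lift(heaviness, arm_power=0.75):
    # a boolean conclusion drawn from a crude scalar premise
    return quantize(heaviness) <= arm_power

print(can_lift(0.55))   # "medium heavy" -> True
print(can_lift(0.9))    # "very heavy"   -> False
----------------

A couple of noisy bits of "heaviness" are plenty; adding decimal places would change nothing about the conclusion, which is the point.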

There were some mentions of DNA as the mass storage for the genome (noncarborundum, Monty, Bumner), but not as many as I expected.

Last time I checked, parts of the DNA in the nucleus get split open, copied into mRNA, and tugged to one of thousands of ribosomes, which translate the mRNA strings, using amino acids and their own ribosomal rRNA, in order to produce proteins. This part really resembles production systems more than a computer/processor/task (task meaning the program which is being executed, as opposed to just passively stored).

A wholly different quality of complexity: the signalling and feedback mechanisms within one cell and between cells, partly triggered by the assembled proteins and triggering more operations, all of which takes place while surrounded by many sorts of other organic molecules dissolved in water. It's not ordered, it's a mess, everything wiggles and mixes, as PZ describes it in "Buffeted by the winds of chance: why a cell is like a casino".

So I think analogies break quite early, and on several levels, here. Despite that, like CJO I enjoy reading the conversation, especially when it makes me install POV-Ray after years and do a few renders; when awk output gets piped to sh; when the Turing-completeness of vi is mentioned, with a jab at emacs (editing JPEGs in a primitive text processor, Norman Doering? Just "vi -b" them); when the strictly stack-based PostScript reminds me of my Forth hacks; when there are so many interesting links to Malbolge and other esoteric subjects.

Thanks y'all for a great read!

Paul, Mark, et al: Thanks for continuing to make interesting contributions from the CS point of view despite much skepticism directed at them; your comments have been very instructive.

Bernard, thanks for your response,

Insofar as this relates to my comment about DNA being just an inert molecule: I recognise that software data can also be stored in inert media. My point was that there seems to me to be much more value in thinking of the entire information content of the cell as being like a program (in order to distinguish software from hardware, for instance)

True, but see the link in #221: what is the origin of a lot of that cellular information? :)

Last time I checked, parts of the DNA in the nucleus get split open, copied into mRNA, and tugged to one of thousands of ribosomes, which translate the mRNA strings, using amino acids and their own ribosomal rRNA, in order to produce proteins. This part really resembles production systems more than a computer/processor/task (task meaning the program which is being executed, as opposed to just passively stored).

Umm, so it gets sent through BISON before being executed in Perl? lol

Again, not something that prevents DNA from being a program, since you *can*, and some people often *do*, use a program to parse data into being a new program, just like a production line, so that *that* program can do something in one of several steps needed to produce a final result. It's just not usually done unless you need to develop cross-platform applications. In the case I saw, a file containing both processor-specific code and constants which defined the size of integers, and the like, was fed through BISON to produce a new file, which contained only the code "specific" to the machine you wanted your program to run on, and which was then combined, in a further step, into yet another program. Mind you, computers and languages are not really designed to do that, since something like Lua, Python or Java is "designed" to be identical across all machines it runs on, but I could see someone writing a "baseline" language, which could be fed files containing instructions that would alter their behaviors and expression based on "which" system happened to be transcribing them at that moment, and thus produce code that did what was expected on those machines. *Our* intent would be to have a single thread run identical processes across all machines, but there is no reason you need even that to be true, instead allowing multiple threads, different behaviors depending on environment, etc. Would be kind of interesting to see such a system in operation.
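
Here's a trivial Python sketch of that pattern -- nothing to do with the actual BISON pipeline described above, just its shape: a "baseline" template gets specialized for whichever system happens to be running it, and the generated text is then executed as a new program.

----------------
import sys

# Hypothetical template; the {bits} constant gets baked in per platform.
template = """
def word_size():
    return {bits}   # platform-specific constant baked into generated code
"""

bits = 64 if sys.maxsize > 2**32 else 32
generated_source = template.format(bits=bits)

namespace = {}
exec(generated_source, namespace)   # the generated text becomes a program
print(namespace["word_size"]())     # 64 on most modern machines
----------------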

Marko,

Nothing in your post makes me think that "analogies break down early and often," from a CS point of view.

Similarly weird things happen in computers we use every day. For example, various data structures used by different parts of a program may be stored in pretty much randomly interleaved chunks in "memory," which is really an assemblage of several different media arrayed in strange ways in space. If you looked at the memory ordering, it'd just look like soup. And some of that soup is actually stored at several levels in the memory hierarchy, with some data being in more than one place, even within a single level, in different representations; but when you follow a "pointer" to fetch a datum, it all works out so that you don't notice any of that. The data may be moved between several media, transformed between several representations in one medium, etc. The "address" of the datum is operated on in parts in parallel, with some parts being used to switch parallel hardware while other parts are still being transmitted. Several pieces of data may be fetched from one level of the memory hierarchy to another in parallel because the address hasn't been uniquely decoded yet, with the right one selected from those once the full decoding is available.

And all that's in a "von Neumann machine" like a PC. If you get into massively parallel architectures, it can get even weirder. The von Neumann machine abstraction is simple, but the actual implementation usually isn't.

Using chemical concentrations as variables, all diffused through a liquid soup, doesn't seem any stranger to me, or harder to see as "computing." It seems pretty simple.

(Not that there isn't complicated and hard-to-understand stuff going on in a cell; I just don't think using a bunch of chemicals dissolved in a liquid as variables is any weirder than things we do all the time.)

Keep in mind that every digital machine is analog at the bottom. The itsy-bitsy transistors and capacitors we use as "uniform" switches and flip-flops are actually noisy analog devices with random variations in sensitivity and capacitance. They work only because there are enough electrical particles per transistor that they are statistically reliable, and because we ensure that whatever's downstream can ignore the variations between them and extract a clear signal.

Using chemicals dissolved in a liquid is the same kind of thing---if you have enough liquid with enough molecules dissolved in it, you can use it in much the same way, as a statistically reliable variable with acceptable precision.

If you really understand what makes silicon computers work, at several levels (and across a few dimensions), a chemical computer seems pretty near-fetched.

Paul, you almost got me convinced. When I was typing "rRNA" I thought "gee, another program". But now we relate programs to DNA/RNA and variables to chemical concentrations. Give or take the fact that those variables have blurry scopes (local to an organelle, cell or the cell's proximity; global to the organism in the case of hormones); that there seems to be no biochemical analogue of the MMU, which transparently maps the "soup" of memory pages to the linear memory model that a task sees; that Harvard/von-Neumann/Turing are ideal models which help only so far even in CS; give or take this and more -- yes, we can draw rough parallels. Good for starters, but maybe bad for experts.

Kagehi, of course you're right. Programs have generated and modified other programs (i.e. treated code as if it were data) since the sixties. I was never a heavy lisp user, but that was one early language in which you could manipulate your tree of tokens as data -- which led to mind-bogglingly powerful and hard-to-understand macro systems like CLOS.

There's an xkcd comic about [insert your favorite deity] being a lisp programmer; check out #224 and #312. Or is everything a Turing machine (#205)? If it wasn't for those parens, I'd be a lisper too: #297. Wish I had that kind of introspection in ruby to play with. But I digress.

Lex (GNU flex) and yacc (GNU bison) also generate programs, and it's not unusual for a higher-level-language compiler to compile/bootstrap/verify itself -- see Niklaus Wirth's T-diagrams; I couldn't find anything in Tremblay/Sorenson.

I'd be blind not to see rough analogies with what I've learned about biochemistry, but: is an analogy any good if it can lead us to false conclusions and dead ends -- or else (false dichotomy?) becomes more and more obsolete when you put decades of research work into it, like PZ has?

Marko,

In talking about the implementation of a von Neumann machine's memory, I didn't mean to imply that I think cells are von Neumann machines---they're very much not.

I was mostly trying to say that representing variables as chemical concentrations in a soup is a simple, reasonable way to build the working memory of a programmable (parallel, rule-based) computer. Compared to how we actually implement the memory of von Neumann machines, it's very simple and straightforward. (And just the kind of thing I'd expect evolution to hit on early on and build up from.)

There's no need for an MMU (memory management unit) in this case, if you're not trying to make a fast computer. (Most of the complexity in a von Neumann computer memory system comes from trying to make it fast.) As long as a rule being fired is transcribed often enough, and the molecules diffuse through the "working memory" fast enough, you're fine.

It is also not a problem if the rule-firing (transcription) machinery is also just some pieces of chemical machinery floating in the soup, and we rely on random molecular motion to statistically ensure that the outputs of one operation eventually get where they need to get. (By bouncing around in the soup until they run into a binding site.) That is a fine way to build a computer, if you don't mind it being a rule-based parallel computer with a slow cycle time and no synchronized clock.

The fact that it's a chemical soup with data and code and machinery scrambled together is irrelevant, as long as there's the right specificity of action. Instead of using wires to direct particular bits of info right to particular pieces of machinery that operate on them, we just diffuse them around and let binding sites do the selecting; the shapes of local areas of a molecule act as tags, and it gets sorted out by filtering at the receiving end. (Each binding site shape acts like a tuner, to "listen" to the right signals and ignore the others in the soup.)

That's slow and inefficient, but it's simple and it works. (Just the kind of thing you'd expect evolution to latch onto early on.)
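
For the flavor of it, here's a deliberately crude simulation of that kind of soup memory; the species names, counts, and rate are invented. A single rule, A -> B, fires only when random motion happens to bring an A to the machinery, yet given enough slow cycles it reliably converts nearly everything.

----------------
import random

soup = {"A": 200, "B": 0}   # molecule counts are the "variables"

def maybe_fire(soup, volume=1000.0):
    # rule A -> B, fired stochastically: the more A in the soup, the more
    # often the rule's "binding site" gets hit by a wandering molecule
    if random.random() < soup["A"] / volume:
        soup["A"] -= 1
        soup["B"] += 1

for _ in range(5000):   # many slow, unsynchronized "cycles"
    maybe_fire(soup)
print(soup)             # nearly all A converted to B, statistically reliably
----------------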

There's no sophisticated scope rule in this system, just a membrane to keep the computational soup together in a small enough area that a finite number of molecules is sufficient to eventually promote or inhibit the right rule firings. If there are sophisticated scope rules, they're implemented at higher levels than this rule-firing "machine language," or by added-on mechanisms that complicate the model.

(You wouldn't expect a sophisticated scope rule in a machine language, anyway. On von Neumann hardware, for example, the machine language distinguishes between registers, memory, and device I/O, and that's about it. The nice scope rules programmers see are implemented in software.)

To head off possible confusion, maybe I should say a bit more about locality of computing. (The following isn't mostly in response to you, Marko.)

One thing some people say about how the genome isn't a computer program is that they expect a computer to be one centralized, tightly and hierarchically structured thing, rather than a network of cooperating gadgets.

In general, a computer isn't. It's exactly a network of cooperating gadgets. Any hierarchical operation emerges from how the gadgets talk to each other---which gadgets send which signals to which other gadgets to get them to do which things.

At the level of the rule system "machine code" inside the cell or nucleus, the genes are just a bag of rules, and hierarchical structure arises from which rule firings send chemical signals to make other rules fire.

Some people seem to think that the obviously programmatic nature of things like Hox gene expression "breaks down" when you look closer. My impression is that it doesn't. It's just that the high-level sequencing of major operations emerges from the interactions of low-level rules---and that's *exactly* what you'd expect if the machine language is a parallel rule system.

If you ever used a forward-chaining production system, you'll know what I mean... to get rules to fire in sequence rather than in parallel, you need to introduce data dependencies, so that the output of one rule serves to trigger the firing of the sequentially next rule, etc. Often you need to sequence major phases of the program, but below that level of granularity, many things can proceed in parallel, with feedback loops and whatnot. (Very much like major developmental programs and underlying genetic regulatory networks.)
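
A minimal forward-chaining sketch in Python makes the point; the "facts" here are invented developmental stand-ins. Every matching rule fires on every cycle, and the sequential phases emerge purely from the data dependencies.

----------------
# Each rule is (condition over the current facts, fact it asserts).
rules = [
    (lambda f: "egg" in f,                "maternal_factors"),
    (lambda f: "maternal_factors" in f,   "gastrulation_genes"),
    (lambda f: "gastrulation_genes" in f, "limb_genes"),
]

facts = {"egg"}
while True:
    # fire all matching rules "in parallel" on this cycle
    new = {out for cond, out in rules if cond(facts)} - facts
    if not new:
        break
    print("cycle adds:", new)   # one phase per cycle, purely from data flow
    facts |= new
----------------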

That's one of the many ways that people think the computer "metaphor" is a bad "analogy" when in fact it appears to be a simple literal truth, not an analogy at all. People look at a parallel rule system doing exactly what parallel rule systems typically do, and see it as a "bad analogy" to a Pascal program.

Likewise, people may see the whole genome replicated and processed by a bunch of communicating cells, with different stuff going on in different kinds of cells and higher-level structure emerging... and some think that's "not like a computer".

That is like a computer, just not like your PC. It's what computer scientists call an SPMD machine. That's a "single program, multiple data" parallel computer. It consists of a bunch of smaller computers, each of which has its own copy of the whole program but can execute it differently depending on local data and differences in local hardware. (Some may have graphics coprocessors, others may have disk drive ports, etc.)

Computer scientists have been building and programming machines like that for decades, and programming them in several different styles, and they're very definitely computers. They're just not von Neumann uniprocessors.

The cells of a multicellular organism seem to combine in much the same way. The local (sub-)computers appear to be basically rule-based rather than von Neumann, but the overall setup is about the same.

Just as in distributed SPMD programming, the large-scale computational structure emerges from the interaction of a bunch of little computers running the same program but operating on different data and accessing different hardware.
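
In miniature, and purely for illustration (the "neighbor_signal" data is made up), SPMD looks like this: one program text, many local data sets, divergent behavior.

----------------
def program(local_data):
    # the one shared program text that every "cell"/processor runs
    if local_data["neighbor_signal"] > 0.5:
        return "liver-like"
    return "stem-like"

cells = [{"neighbor_signal": 0.9},
         {"neighbor_signal": 0.1},
         {"neighbor_signal": 0.7}]

print([program(c) for c in cells])   # same code, three divergent fates
----------------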

The problem, again, is not that the genome isn't a computer program. It's that most biologists don't understand basic computer taxonomy.

To say that such a setup isn't programmed computing, and that the "computer analogy" breaks down, is a bit odd.

It's like assuming that all "animals" are like dogs, and that the "animal analogy" breaks down when you look at cows.

I think the problem is really that the "computer analogy" isn't a particular analogy at all, at the level people want. It's a literal truth at a lower level than they expect.

A productive "analogy" at the level of detail some people want would require actually understanding computers. You can't make good analogies between computers and other computers if you don't understand computers in a fair bit of detail.

Maybe the right thing for some people to do is to proceed doing "real biology" and ignore computer science. Maybe it's not worth your time to understand what's already been figured out about some systems that are strikingly similar if you look close enough. Maybe it's okay to reinvent some wheels, and I can't honestly say that the branches of computer science most applicable to genetics are especially well-developed.

In general, though, it's hard for me to believe that's the best way to proceed.

If I'm right that the genome literally is a computer program, that means computer science isn't just a "handy tool" for "modeling" in biology. It means there's a branch of computer science, not yet well developed, that is the core subject matter of biology as well.

If I'm right that the genome literally is a computer program, that means computer science isn't just a "handy tool" for "modeling" in biology. It means there's a branch of computer science, not yet well developed, that is the core subject matter of biology as well.

In that case, though, you are left with the problem that there is little use in considering it from a computer science perspective, because direct biology (which may be doing the same thing, but with different terminology) is already one step ahead.

Just because you can construct a trivial argument that the genome is equivalent to a computer programme, doesn't mean that you can actually - as you concede - presently do anything useful with that information.

Note that it definitely isn't a computer programme in one very literal sense, because the term computer programme is very clearly defined as being a programme for a computer, and not a genome for an organism; it would be much better to create a completely new terminology which encompasses both genomes and computer programmes as different examples of the same class of things. I think that people will be much more receptive if you talk in terms of equivalency, rather than trying to co-opt systems biology into computer science.

Actually, come to think of it, simply calling it systems biology would probably be a good idea.

Anyway, none of this changes the fact that - even if you are right - the DNA/computer programme analogy is still, often, just an analogy; the vast majority of the time people are utilising preconceptions which are wholly incompatible with your view of the problem.

Maybe the right thing for some people to do is to proceed doing "real biology" and ignore computer science. Maybe it's not worth your time to understand what's already been figured out about some systems that are strikingly similar if you look close enough. Maybe it's okay to reinvent some wheels, and I can't honestly say that the branches of computer science most applicable to genetics are especially well-developed.

The problem with doing anything other than "real biology" is surely that cells, or even many complex biochemical networks, are too complex to model in abstraction.

Does the technology exist - other than as cells - to utilise your insight? Is it possible to construct such a computer and interrogate it in a way that lends additional insight to the problems of systems biology? What I mean to say is: are you simply proposing the grand reinvention of the wheel, or rather, the cell?

Is there any point in creating a new computer, if we already have the cell to study?

Or, are you simply thinking of something like systems biology?

By Bernard Bumner (not verified) on 28 Feb 2008 #permalink

Note that it definitely isn't a computer programme in one very literal sense, because the term computer programme is very clearly defined as being a programme for a computer, and not a genome for an organism;

No. Something can be both. That is exactly my point. What it's made out of and who or what "designed it" doesn't matter as to whether it counts as a computer. Program vs. genome seems to be a false dichotomy.

Some categories of things are defined by their functional organization, not their provenance.

It would be ridiculous to deny that a femur is a lever, just because it's part of an organism and made of cells. It's still a fact that among other things, a femur IS a lever. It would be pretty weird for biologists to deny that, and make up new terms for things like moments, stiffness, and leverage, rather than using the terminology and accumulated knowledge of physics and mechanical engineering.

Likewise, the heart is literally a pump with valves. Why make up new words for those things? Certainly the heart is a rather different kind of pump than any so far made by humans, and its valves are different, too. Does that make it a "useless analogy" to say that the heart "pumps" blood?

Of course, just saying that the heart is a pump will not tell you under what circumstances you'll get turbulence that makes the valves malfunction. You need a taxonomy of pumps to know where to look for relevant stuff. Making up new biology-specific words for "pump" and "valve" would not help with that.

it would be much better to create a completely new terminology which encompasses both genomes and computer programmes as different examples of the same class of things.

My claim is that the genome is at least mostly a computer program. There's already a name for that, so inventing a new name for "computer program" would be obfuscatory. You might want to come up with a new term for what KIND of computer program (most of) the genome implements, and that would be fine. We already make distinctions like that---digital vs. hybrid vs. analog, von Neumann vs. rule machines, etc., and there's plenty of room for more.

Messing up the taxonomy to avoid calling a biological computer a computer is not the way to go. Parts of an organism are levers, but that doesn't mean biologists should coin a new super-category name that includes long-bone-levers and other levers. Calling a long bone a lever does not imply that it's not some other things, too.

I think that people will be much more receptive if you talk in terms of equivalency, rather than trying to co-opt systems biology into computer science.

At the moment, I'm not trying to wage an effective political campaign to get biologists to admit that the genome is (mostly) a computer program. I'm just arguing that it's true.

I find your use of "co-opt" a bit weird.

Consider physics in the early 20th century. Physicists needed some new math to describe the phenomena they were working with, because important areas of mathematics were not well-developed, and they couldn't solve the equations they needed to solve. So some physicists extended those areas of math, coming up with new mathematics that got mathematicians interested. Then the mathematicians returned the favor, extending the math in ways that often turned out to be even more useful to physicists.

That's the way it should work. Both sides profited hugely.

When physicists resorted to doing new kinds of math themselves, because those stupid mathematicians couldn't solve their equations, they didn't make up all new terms for everything and say that it was "physics, not mathematics."

They didn't make the mistake of telling mathematicians not to be interested in the kind of math they needed, because it wasn't really math. They didn't make the mistake of willfully obscuring the mathematical nature of what they were doing by describing it in other terms---they might invent a term like "tensor" as necessary, but they'd still call an integral an "integral."

If I'm right that the genome is primarily a computer program---or even if it's anywhere close---that's the kind of collaboration and synergy you need between computer science and biology.

You guys are just getting started reverse-engineering a really big computer architecture and program. By the time you're done, you're going to need a fairly advanced computer science to describe it well.

Right now, the computer science you need is admittedly mostly not well-developed. If you want any help in developing it, follow the example of physics + math.

I think in the long run it will be a big win to clarify the dependence of biology on computer science, so that the undeveloped areas of computer science get developed.

If you'd rather say, "screw you, we don't need no stinking computer scientists butting in," okay. Best of luck with that, but if *I* were trying to debug an undocumented zillion-line program on a vaguely-understood architecture, I'd be looking for all the help I could get.

I do realize that it's impolitic to say that a mature biology will consist largely of computer science. Apparently, many biologists think there's something insulting or demeaning about that. Too bad. I think it's true.

Note that physicists don't feel the same way about mathematics. If a lot of their work is mathematical, they're not the least bit ashamed to say so. If they take an immature branch of mathematics and advance it beyond what the mathematicians have done, they're usually proud of it. That's one reason mathematicians often return the favor.

re #270,

it is exactly because biology is complex, interesting, and has a lot that is not yet understood that computer science may be able to help.

Instead of taking the attitude of "you stupid computer scientists cannot possibly have anything to contribute to our elite field," or making a blanket statement that "biology is already one step ahead," it would be possible to consider that some of the people who know more computer science than you appear to could be right in saying: "Gee, we're fascinated by what you biologists are working on, and we've noticed that we have some tools that could help you get a handle on some interesting problems in biology, because some of those problems are either similar or identical to ones we've studied in some detail. We'd also like to learn what techniques you've developed in biology that we might bring back to computer science, and incorporate lessons from them."

I fail to see how declaring that computer science has nothing to offer biology is any less close-minded than a creationist declaring that "Darwinism" has nothing to offer biology. Similarly, dismissing the "genome as computer program" model as "just an analogy, and one that is probably wrong and would add nothing useful even if it's right" sounds a whole lot like creationists dismissing the theory of evolution as "just a theory, and one that is probably wrong and would add nothing useful even if it's right." And gee, many people are applying the same logically flawed arguments creationists use: setting up an oversimplified strawman of "genome as program" and then showing why it's stupid, or saying that anything the advocates of the model can't immediately explain with airtight certainty is a "gap" that contains some unspecified mystical aspect which makes the model irrelevant. We all agree those sorts of arguments are abhorrent when creationists use them, right? Sheesh.

I notice that very few, if any, people who have some formal theoretical computer science experience seem to accept the argument that it is irrelevant to this aspect of biology. This might be partially wounded pride, but I posit that part of it is that if we accepted these arguments, we would have to stop applying computer science to computers, too, because you've effectively declared (believe it or not) that computer science isn't even applicable to studying COMPUTERS.

First, I must say sorry for speaking of Lisp in the past tense; it still seems very much alive, e.g. as an embedded language (Emacs, of course, but also AutoCAD etc.).

Paul (you don't by any chance have the alias "websnarf" somewhere else?):

"People look at a parallel rule system doing exactly what parallel rule systems typically do, and see it as a "bad analogy" to a Pascal program."

I agree, but isn't the typical case the point of an analogy? Compare what is commonplace, familiar and typical, not what is exotic, extravagant and extreme?
Likewise, if one can make isolated DNA behave like, e.g., a Chomsky grammar, how is that necessarily relevant to biochemical processes at the cellular level?

"You can't make good analogies between computers and other computers if you don't understand computers in a fair bit of detail."

Hmm, a good analogy should be graspable by laypeople, because experts don't need one. Is that too naive?
SPMD (or SIMD, with "I" for instruction) -- the early-90s Transputer, anyone? We had one of those at the place where I studied CS. There were demo apps simulating neural nets. Again, I very much see the rough resemblance to neural cells, but it doesn't convince me in the details; example: simulating the influence of drugs on neurons. (BTW I see you just posted again, but it's midnight here in Germany, and I'd better take a nap.)

Bernard: I also like "systems biology" as a phrase. About the "programme for a computer" point: I'd even go along with relabeling the proposed role of DNA as "algorithms" or "recipes" or "design patterns" in order to abstract away from programs. Does that give us new insights? Maybe not now, but perhaps at a later time. Like you, I'm not at all sure whether our strained analogies are helpful in any way.

In that case, though, you are left with the problem that there is little use in considering it from a computer science perspective, because direct biology (which may be doing the same thing, but with different terminology) is already one step ahead.

What I mean to say is; are you simply proposing the grand reinvention of the wheel, or rather, cell? Is there any point in creating a new computer, if we already have the cell to study?

What's the use of creating nanomachines and neural networks, when biology is one step ahead? To get to the underlying principles, I think.

marko:

Hmm, a good analogy should be graspable by laypeople, because experts don't need them. Is that too naive?

I think it's the other way around - we shouldn't popularize genetics using a naive analogy to personal computer programs, but such analogies can be useful to the experts. There seem to be quite a few people in this thread asserting (my impression; sorry if this summary seems unfair) that if we can't see the flaws of the program analogy, it's because we lack the sophisticated understanding of the biology that PZ has! But on what basis do you argue that the authors of the article in #242 have a less sophisticated understanding? (And all respect to PZ, but he doesn't actually work in genomics directly.)

This is still going? lol

Seriously though, sometimes being "in" a field is a really good way to be blind to the obvious, and having "some" knowledge of another field, but not enough breadth of scope, just means you have a far from clear perspective on the real intersections between them. This can lead to failing to see intersections that exist and, in the long run, going down even bigger dead ends than you otherwise would have. Case in point: most people running high-end physics and astronomy experiments have "some" experience in CS, since they have to write their own programs. However, CS experts went to some of them a while back and were horrified to discover that 90% of their projects failed due to unfixable bugs, not the invalidity of the experiments, and that the physics and astronomy people were using text editors and CLI compilers to "test" their code. None of them, at the places they visited, had ever used, seen or even **heard** of modern GUI editors, integrated debugging, or anything other than C (and I don't mean C++, so they also had no clue how to use reusable objects, object-oriented programming, or modern event-driven systems). They were literally *blind* to what was possible, with the result that billions of dollars were being flushed down the toilet every year as a result of failed designs, bad debugging and inefficient code (the latter when it actually worked at all). I see the potential for the same problem here. Biologists *may* in some cases know a fair amount about programming, but they may never have encountered any situation that is similar to what they work with, may not grasp the full concept of what computation *is*, and thus can't step back and look at the stuff they are doing with a new perspective.

People made **huge** progress on understanding the human body and a lot of its related systems while having "no idea" how 90% of its parts connected to each other and interacted. A lot of people came up with partly correct ideas about a lot of it, and refined them. Some came up with sheer nonsense, since there wasn't anything else to work from. Without the realization that *all of them* were missing something critical in understanding what they were looking at, though, we wouldn't have modern medicine. I can't help but think, with all respect to PZ and others here, that we are seeing the same hand-waving you always get from situations like this, where some people deny the utility of examining things from a certain perspective on the rather odd grounds that they are not aware of anyone actually "looking at" DNA as a computer program. Well, either people have tried it and gotten nowhere, or they haven't tried it at all. Complaining that there isn't any evidence one way or the other suggests **not** a lack of utility in the approach, but rather a gaping blind spot, and/or a basic refusal to try, and disinterested irritation at the idea that someone in CS might have an insight. I think we can agree that if your only arguments amount to "Go away, you're bothering me" and "There ain't no data on whether it's useful or not!", there is a damn serious problem. ;)

Paul (you don't by any chance have the alias "websnarf" somewhere else?):

No, that's not me.

People look at a parallel rule system doing exactly what parallel rule systems typically do, and see it as a "bad analogy" to a Pascal program.

I agree, but isn't the typical case the point of an analogy? Compare what is commonplace, familiar and typical, not what is exotic, extravagant and extreme?

I'm not making an analogy for laypeople. I'm not making an analogy at all.

I'm stating a hypothesis---that the genome is literally a computer program---for (relative) "experts."

There is no way I can state that hypothesis without using the word "computer" in a potentially misleading way. If I don't use the term "computer program" I can't state that the genome literally is one.

The fact that people have preconceptions about computers that are false is certainly an obstacle, but let me make an actual analogy that I think shows where the fault lies.

Suppose it wasn't widely known that whales were mammals, and I stated that "whales are mammals."

If I didn't have enough credibility with my audience, they might just dismiss me as a crank. They might say things like

"We know mammals---those four-legged hairy land-dwelling things---and whales aren't mammals. Go away."

or maybe

"I can see that whales are sorta interestingly like mammals in some ways, but saying that whales ARE mammals is way too strong and people won't be receptive to it. Why don't you invent a new word for a more general category that includes both mammals and whales?"

My answer to the latter would be that there's already a word for a category that includes mammals and whales.

It's "mammals."

Similarly, there's already a word for computers and biological computer-like things.

It's "computers."

If my hypothesis is right, that's the right word to use, and if people don't think it "sounds right," they should learn more about computers.

Given that I'm mostly trying to talk to scientists, it's important to me that I get the science right, and not be sloppy and waffly.

If nothing else, I want to know if I'm wrong. If somebody can show me I'm wrong with a valid argument that doesn't depend on misconceptions about computers, I want that to happen as soon as possible.

When it comes to actual laypeople, though, consider the fact that most people don't know how a von Neumann computer works anyway, have never programmed one, and have no idea at all how to build one. A rule system will make more intuitive sense to them. I can explain a simple rule-based chemical computer more easily than a von Neumann computer. It's a simpler abstraction and easier to implement, and for that level of explanation, it just doesn't matter that it's harder to program.
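To make that concrete, here's a minimal toy sketch in Python of what I mean by a rule-based chemical computer (purely illustrative; the "molecules" and rules are made up): a soup of molecules and rewrite rules that fire wherever their inputs happen to be present, with no program counter and no central control.

import random
from collections import Counter

# Each rule consumes a multiset of molecules and produces another.
RULES = [
    (Counter(["A", "B"]), Counter(["AB"])),    # A + B -> AB
    (Counter(["AB", "C"]), Counter(["ABC"])),  # AB + C -> ABC
]

def react(soup):
    """Fire randomly chosen applicable rules until none apply."""
    while True:
        applicable = [(lhs, rhs) for lhs, rhs in RULES if not (lhs - soup)]
        if not applicable:
            return soup
        lhs, rhs = random.choice(applicable)
        soup = soup - lhs + rhs  # consume inputs, emit outputs

print(react(Counter({"A": 10, "B": 10, "C": 10})))
# Always ends as Counter({'ABC': 10}). The firing order is arbitrary,
# but the outcome is the same: order without central control.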

For anybody who doesn't see why it bugs me that people explain biology by a negative contrast to a false stereotype of computers...

Imagine that the situation was reversed, and computer weenies explained computer things by a false contrast to biology, like

"A robot isn't an organism, or really very much like an organism. It has a whole bunch of exquisitely coordinated interacting machinery, and all the interesting work it does emerges from the interactions of those parts. There's no life force that magically animates it. You can only understand its high-level life-like properties as emergent properties of subtle interactions of many tiny mechanisms."

Now suppose that the computer scientist wrote off biologists' objections by saying, "It's okay. That's most laypeople's idea of biology, so the disanalogy works great for teaching, even if it's not exactly true. Vitalism isn't dead among laypeople, so it makes sense to explain things that way."

What would you think that revealed about computer scientists' attitudes?

(If that analogy doesn't work for you, imagine Buddhists explaining themselves to Christians by false contrast to atheists. :-) )

Similarly, there's already a word for computers and biological computer-like things.

It's "computers."

But, if you're right, then there are also competing and more generally understood terms in use by biologists (when it comes to biology). I was talking about co-opting via terminology, rather than creating some sort of schism.

That is why I'm suggesting that redefining the terms is useful; not because it changes the underlying problems, but because it creates a philosophically neutral language for thinking about things, and provides a bridge between the fields.

You might want to come up with a new term for what KIND of computer program (most of) the genome implements, and that would be fine.

I think it would be essential, not least of all, to avoid the kind of confusion prevalent in this thread.

That's the way it should work. Both sides profited hugely.

Absolutely.

Perhaps you are too used to seeing rhetorical questions employed to dismiss ideas and opinions? (Mine was not rhetorical.)

All I want to know is whether there are insightful applications resulting from this equivalence. What I was asking was whether it is possible to create artificial systems which mimic biological networks, and which have a comparable complexity to biological networks, and whether such things would have advantages over the direct study of those networks. It was a question, honestly posed.

Actually, rather than a defence of the general approach (which I think you've already argued strongly enough) I wanted you to tell me whether or not this would require modelling of the biological computer using a man-made machine, and if that is the case, whether it seems feasible to do so.

That idea may be terribly naive on my part because, of course, by your reckoning you already have the biological computer, and it might simply be a matter of finding ways to interact with it which employ your approach.

How might you anticipate this being useful? Is this kind of approach going to be better suited to studying complex, but isolated networks, or will it be powerful enough to tackle a synthesis of systems biology?

Knowing that something is possible doesn't make it practically feasible.

Instead of taking the attitude of "you stupid computer scientists cannot possibly have anything to contribute to our elite field," or making a blanket statement that "biology is already one step ahead,"...

Who said either of those things? Please don't respond with petty comparisons to creationists just because you feel that people are missing a point which is obvious to you (it might be that you're reading in something which isn't there) - it is insulting and unnecessary.

I'm well aware of the great advances which have been made in biology via cross-disciplinary research. I've worked in molecular biology and genetics, so I owe a debt to mathematicians, physicists, and certainly, bioinformaticians and computer scientists. I'm not scared of embracing new directions in research.

...declaring that computer science has nothing to offer biology...

Anybody who said such a thing is clearly ignorant, or else an idiot.

By Bernard Bumner (not verified) on 29 Feb 2008 #permalink

"A robot isn't an organism, or really very much like an organism. It has a whole bunch of exquisitely coordinated interacting machinery, and all the interesting work it does emerges from the interactions of those parts. There's no life force that magically animates it. You can only understand its high-level life-like properties as emergent properties of subtle interactions of many tiny mechanisms."

Now suppose that the computer scientist wrote off biologists' objections by saying, "It's okay. That's most laypeople's idea of biology, so the disanalogy works great for teaching, even if it's not exactly true. Vitalism isn't dead among laypeople, so it makes sense to explain things that way."

Yeah, I would be rather annoyed at such an argument too. Mostly because the first statement in the original idea invalidates the entire argument by *claiming* that somehow an organism can't be described in exactly the same manner. Such a statement, followed by the rest of that description, falls even flatter (or reads as even more delusional) than all but the most absurd claims already made about the differences/similarities between DNA and program code.

Who said either of those things? Please don't respond with petty comparisons to creationists just because you feel that people are missing a point which is obvious to you (it might be that you're reading in something which isn't there) - it is insulting and unnecessary.

Um, you said:

In that case, though, you are left with the problem that there is little use in considering it from a computer science perspective, because direct biology (which may be doing the same thing, but with different terminology) is already one step ahead.

so perhaps I'd have been a bit more accurate to say "...biology...is already one step ahead," but I don't think I misinterpreted the meaning. I'll admit the "you stupid computer scientists" part is more inference than an exact quote of anyone, and I didn't mean to say that you were explicitly saying that... perhaps this is one of those situations where it's hard to correctly infer someone's tone on the internet. I saw many of your choices of phrase, and certainly a lot of the tone in the huge log of comments and the original post, as very dismissive of anyone who would try to use a computer science model to study biology. It sounds like you meant "what benefit would I get from doing anything other than experimenting on an actual cell" as more of an honest question than a skeptical, dismissive one, so I apologize if I overreacted to that, but I picked up that tone a lot in the comments in general, so I was perhaps overly sensitive.

My comparison with creationists wasn't meant to be petty at all; it was more out of concern. I would really like to move my career path into integrating computer science and biology, and I have been disturbed by some of the comments in this thread seeming very oppositional to that idea. (Admittedly, I'm not yet convinced that I can make a living at this, but I am a firm believer that it is a good idea.)

Trying to open-mindedly answer the question, though: I think there are a number of ways that computer science can help biology that are in the domain of modeling the genome as a program. The two I can think of that have examples I can point at are modeling gene regulatory networks during development, and taking an information-theoretic approach to modeling evolution.

In the paper I mentioned in post #196, and in related work, Eric Davidson and friends are modeling the GRNs that control sea urchin embryonic development as what is, at least by my definition, a computer program running on a sort of biochemical computer. It's not a "Turing" type computer but a special-purpose one (it has the components needed to artificially make a universal Turing machine, but that's really just an interesting side note). It was studied in vivo by laborious mucking about with the sea urchin genome, knocking out various genes or tagging genes to show expression in the developing embryo, and so forth. This experimentation led to an interpretation in which a large component of embryonic development is controlled by a cis-regulatory system that, while implemented in transcription and chemistry and proteins and the like, really involves digital logic and feedback and feedforward networks that express discrete regulatory states.

These follow rules that are well within the mathematical objects used to study certain computer systems, for example finite state machines or discrete neural networks (a case where computer science is perfectly happy adopting biology's terms, although real neurobiologists might scoff at the very primitive approximation of neurons as "sum and threshold" nodes). This should not be news to biologists, since these are digital systems controlling discrete things in biology. Except when things are broken, there are a lot of cases where the "analog, casino" systems lead to very well-defined "one or the other" phenotypes: in Ed Lewis' famous 4-winged flies, blocking the Ubx Hox gene results in an extra pair of wings; not blocking Ubx makes a wild-type. You can't "half-block" Ubx and get a 3-winged fly or a stunted wing pair; if you made a mutant with a weakened Ubx, most likely (I could be wrong on the details) you'd get a genotype where the adult phenotype randomly got either 2 or 4 wings with some probability. Similarly, although some cells, like embryonic stem cells, are pluripotent, most cells are locked into a particular fate. At some developmental stage, a cell is mesoderm or ectoderm, and can't be some weird half-and-half one.

Many regulatory genes do this sort of thing, which is very well-suited to being studied with the graph-theory tools from mathematics that computer science has incorporated and extended to address an assortment of real and theoretical models of computation. Some of those even handle uncertainty and non-determinism, such as an extension of finite state machines whose name I've forgotten, but it's something like "probabilistic state machines." Since computers are often real-world devices too, computer scientists have ways to model external factors that would be fine for modeling external influences on developing cells, such as Delta-Notch signaling. So essentially, there is a mathematical model, with years of analysis tools, mathematical theorems, and so forth, which is pre-made for the way the developmental gene regulatory system works. It's possible that there will be some slight mismatches in some weird way the biological systems work that will require some modifications to apply it, but I'm not aware of any; and even if there are, this is an existing model that is either almost perfectly or exactly applicable to this stuff.
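As a deliberately toy illustration of that kind of discreteness -- this is not Davidson's model, and the gene interactions and thresholds here are invented -- consider a two-gene mutual-repression latch in Python. Weakening the input doesn't give an intermediate phenotype; it makes the choice between two discrete states depend on noise:

import random

def develop(ubx_strength, steps=20):
    # noisy initial expression levels of two mutually repressing "genes"
    a, b = random.random(), random.random()
    for _ in range(steps):
        # each gene is driven by the input and repressed by the other
        a = 1.0 if ubx_strength * (1.0 - b) > 0.25 else 0.0
        b = 1.0 if (1.0 - ubx_strength) * (1.0 - a) > 0.25 else 0.0
    return "2 wings" if a else "4 wings"

print({develop(1.0) for _ in range(10)})  # {'2 wings'}: wild-type, always
print({develop(0.0) for _ in range(10)})  # {'4 wings'}: knocked out, always
print({develop(0.5) for _ in range(10)})  # usually both outcomes -- never "3 wings"

The latch settles into one of two stable states no matter what; a "weakened" input just randomizes which one, which is the qualitative behavior I described above.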

Another application of this, on a completely different level, is the simulations people have done, such as Avida (Adami, Ofria, et al.) and its predecessor Tierra (Ray), which use a much-simplified model of a genome as an actual computer program whose job is to make a copy of itself. It's a very primitive, simplistic view of a genome, not intended to be complete or representative of a general system. What it provides is a testbed for evolution. Instead of culturing cells for hours, or fruit flies for days, or frogs/fish/mice for weeks or months, it is possible to simulate thousands or millions of generations of these "creatures" in an hour. One can say "it looks like insertion/deletion mutations cause this different effect from point mutations, based on comparing these two species that diverged about X million years ago," and actually run a simplified simulation of that with a day's work and see if you get a similar result. Of course, there is always work involved in testing whether the model is an accurate representation of reality, but that's true of any scientific model or theory, and we have lots of tools for that.

Also, the Avida folks have used a lot of statistical tools to look at the information content in their simplified "genome" and at how various types of mutations impact the information content of both the simulated genome and the real genomes of actual organisms; they can compare the two and do analyses to make predictions that can then be tested in vitro, "in silico," or both. They can help validate the molecular techniques being used in palaeontology investigations, validate models of mutation rates, or evaluate the consequences of certain types of mutations that can then be compared to the fossil record. And that can lead to more interesting investigations of modern species, to see if the combination of simulation and understanding of the modern genome can be matched to the fossil record's phenotypes and imply a genotype history.

I'm sure there are more examples, those just happen to be two that I'm moderately familiar with.

Anyone for a mathematical proof that DNA necessarily results in evolution? (It does occur to me that we should already have this, but I couldn't find any reference.)

One important aspect of the flocking simulation wasn't that it provided an explanation of how flocks work, but that it proved that flocking could be far simpler than some people were insisting it had to be. Perhaps that's trivial or uninteresting for the serious biologist, but it has lots of value in helping to educate the layman that birds flocking doesn't require some magical avian psychic link.

If it is true that the genome is a special type of computer program, it's not the predictions that can be made now that will be really fun - it will be the stuff no one expects until someone says, "Hmm, now that's a bit odd."

By Idlethought (not verified) on 29 Feb 2008 #permalink

Anyone for a mathematical proof that DNA necessarily results in evolution? (It does occur to me that we should already have this, but I couldn't find any reference.)

A mathematical proof is not necessary. As long as we have a carrier of heritable variation in a population (and we know empirically that DNA is such), and differential survival/reproduction among the variants, evolution follows.

#282

Necessary, no. But perhaps interesting if it can be transposed to other contexts. I'm thinking about systems (not necessarily biological) where the 'DNA' is less conveniently identifiable.

By Idlethought (not verified) on 01 Mar 2008 #permalink

If I took an embryo and replaced the mouse genome with an elephant genome, would I get something significantly different from a mouse?

I think PZ's idea of programming is a bit simplistic.

He argues that if anything does the computation, it's the cytoplasm. But nobody is talking about the computation; that isn't what a program does. He says "A computer program contains the operators that work on data from the environment." Nope: that operation is carried out by the environment (the processor) after the program is compiled, and it is done in memory.

If a program says "int i = 9 * q;", the program is compiled into assembly code, which is loaded into memory and executed by the processor on a virtual copy that is absolutely nothing like the original.

He then suggests that computer programs aren't largely dependent on the environment. If your compiler treats a ^ as a bitwise xor, that's significantly different from treating it as a power symbol. Also, when loaded into memory, the program itself is treated as just another thing manipulated by the execution. In fact, buffer overflows can often be used to rewrite the program, thereby making it execute code it shouldn't -- a trick common in hacking and used by viruses... and, in the biological sense, by viruses!
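A toy Python stand-in for that point (a real buffer overflow corrupts machine code in memory, but the principle -- that the running program is just more data -- is the same):

# The "program" is stored as data...
program = [
    'total = 2 + 2',
    'print("total =", total)',
]

# ...so an "infection" can splice in instructions the author never wrote...
payload = 'print("injected code running!")'
program.insert(1, payload)

# ...and the host executes them indistinguishably from its own code.
exec("\n".join(program))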

"Another thing to consider: DNA came last, protein/RNA/metabolic intermediates came first. The predecessors to the progenote did all of the things we characterize as the product of a program, without a genome."

And people wrote in assembly before we had compilers. You're not overly impressive here.

Two comments:

#1
Although the fellow's naive interpretation of programming and the genome is obviously wrong, there are very strong and important connections between biology/genomics and computation/programming; in fact, there is a fast-growing field called bioinformatics that studies exactly this. For example, a central control mechanism is not necessary for programming. The objections PZ makes can be modeled on a nondeterministic Turing machine or by using multiple threads (in a non-theoretical system). I always think of the genome as parameterizing and instantiating a number of objects -- basically commands in a high-level, highly parallel language that is provided by the cellular environment in which the DNA resides.

Moreover, despite the simplicity of this guy's information analysis, that sort of Shannon-esque analysis is a very important and useful tool in bioinformatics; it can tell us quite a bit and is in no way contrary to the principles of biology or evolution.

#2 I read a post above about the "spaghetti sort" algorithm, which I think is misleading. First, the spaghetti sort as described above is not O(n): it has a hidden cost in being able to do a parallel search using vision in a single operation. This is not actually constant time in the limit; with enough items you'll need to do a linear scan with your eyes. It also relies on the fact that you'll never have a piece of spaghetti that is longer than your wall, which is not a fair assumption.

The author's claim that this sort of algorithm cannot be implemented in computers is incorrect. Spaghetti sort is most analogous to a non-comparison sort like the radix sort (O(n*k/s)) or the pigeonhole sort (O(n + 2^k)). These are algorithms that rely on the fact that no matter how many items you have, the numeric keys of the items will always fall within some fixed range (between 0 and 2^k - 1). They are like fixed-wall-length algorithms (in the parlance of the spaghetti sort). Due to a little bit of cleverness they can eliminate the spaghetti sort's O(n) visual search.

Comparison sorts, which are the most generally applicable type of sorting algorithm, provably cannot do better than O(n*log(n)). All faster algorithms must rely on some special constraint.
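For the curious, here's a sketch of the radix sort idea in Python (LSD variant; the fixed key width plays the role of the spaghetti sort's fixed wall length):

def radix_sort(keys, width=4):
    """Sort non-negative ints that fit in `width` bytes; O(n * width)."""
    for shift in range(0, width * 8, 8):            # one pass per base-256 digit
        buckets = [[] for _ in range(256)]
        for k in keys:
            buckets[(k >> shift) & 0xFF].append(k)  # stable bucketing by digit
        keys = [k for bucket in buckets for k in bucket]
    return keys

print(radix_sort([3141592, 65536, 7, 42, 7]))  # [7, 7, 42, 65536, 3141592]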

The major point of my post, however, is that this analog-vs-digital discussion is inane; digital systems and techniques are perfectly applicable to biological 'analog' systems. Thankfully, they generally do not require precision on the order of the Planck length to have utility.

By Patrick Cruce (not verified) on 02 Mar 2008 #permalink

Unfortunately I haven't kept up with this interesting thread. I see that most comments are concentrating on the individual organism's genome.

That is probably a mistake, as the whole population is what evolves and implements the algorithm of the main process. The individual then follows a development program that is subsidiary to evolution's learning of the environment and the current organism's functional potential.

This is how the genome is more like a resulting database of recipes than a coherent program. That the cell machinery and its digital chemistry involve algorithmic constructions is IMHO incidental to the fact that evolution's hereditary mechanisms must be discrete. There is no deep mystery here and so no deep knowledge.

By Torbjörn Larsson, OM (not verified) on 04 Mar 2008 #permalink

Bernard (#278),

I'm still not clear on what you mean by co-opting terminology. Basic CS terminology generally strives to abstract away from the underlying hardware details as much as possible. The relevant terminology and concepts are generally as close to substrate-neutral as possible already, so there's no question of "co-opting" as I understand it.

For example, you can represent any set of simple binary relationships with a "directed graph" (intuitively, a diagram with boxes and arrows) that says which entities have outputs that affect which other entities. A DG can be used to represent an electrical circuit, a hydraulic control system for an industrial machine, actual chemical processes and flows in a chemical plant, or expression nesting in linguistics or math. Using a directed graph to represent genes and their interactions isn't co-opting anything; the formalism was designed to represent anything that has certain formal properties, which is why it is used by control theorists, linguists, and especially mathematicians known as "graph theorists."

Likewise, the possible evolution of states of such a system can often be represented as another "graph" (diagram) typically using circles and arrows, where each circle represents an entire state of the system (or subsystem) under study, and an arrow represents a transition from one whole state to another. If the transitions follow deterministically from just the prior state and the current inputs, it's a "deterministic finite automaton," and if not, it's a "nondeterministic finite automaton."

This is exactly the kind of thing you seem to want---"a philosophically neutral language for thinking about things, and provides a bridge between the fields."
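For concreteness, here's what those two representations might look like in code, with completely made-up genes and cell states (a sketch, not a model of any real network):

# A directed graph of regulatory influences (hypothetical genes)...
influences = {
    "geneA": ["geneB", "geneC"],   # geneA's product affects geneB and geneC
    "geneB": ["geneC"],
    "geneC": [],
}

# ...and a nondeterministic finite automaton over coarse cell states.
transitions = {
    ("progenitor", "signal_hi"): {"mesoderm"},
    ("progenitor", "signal_lo"): {"mesoderm", "ectoderm"},  # nondeterminism
    ("mesoderm", "signal_lo"): {"mesoderm"},
    ("ectoderm", "signal_lo"): {"ectoderm"},
}

def reachable(states, inputs):
    """All states the NFA could be in after the given input sequence."""
    for symbol in inputs:
        states = set().union(*(transitions.get((s, symbol), {s}) for s in states))
    return states

print(reachable({"progenitor"}, ["signal_lo", "signal_lo"]))
# {'mesoderm', 'ectoderm'}: the model doesn't predict one outcome,
# but it rules everything else out.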

This kind of analytical framework can be useful far beyond what it would seem at first glance. For example a system that is in fact deterministic can often be usefully modeled as a simpler but nondeterministic finite automaton. The NDFA does not predict exactly what will happen, but does rule a lot of things out. That can be crucial for theory refinement.

The formalism can also be elaborated in various ways. For example, if the system in question can actually do fairly general recursive computation, that can't be captured with a finite automaton. A finite automaton may capture the basic action cycle/modes of a system, and in fact FA's are often used to represent the operation of recursive parsers, with the addition of a few other bits of machinery, such as a stack.

An NFA that incompletely describes a deterministic process may also be elaborated with additional information flows and state-transition constraints, so that you get an "augmented transition network" that more precisely describes the states and transitions.

(If you combine augmentation and recursion, you can get something called an "augmented recursive transition network" or ARTN. ARTN's are vastly more powerful than finite automata, but you can often leverage the graph theory and automata theory applicable to FA's to model much more sophisticated systems.)

I find it very difficult to imagine that this sort of modeling won't eventually be crucial to understanding the genome.

For example, just putting genetics in these terms raises some interesting questions. One is whether a cell is essentially a nondeterministic finite automaton, consisting of an assemblage of smaller nondeterministic finite automata. Another is where the nondeterminism is, and where it "goes away." (Many processes that are nondeterministic at small scales due to unsynchronized parallelism are deterministic, or more deterministic, at larger scales. Many state-transition pathways may produce the same result, or results that are equivalent at higher levels. I suspect there's a lot of that going on in the genome.)

Another question is where the recursion is. Is a cell mostly a powerful recursive computer, with something like stacks to save state, or is it a machine with a finite (but likely very large) state-space?

It may well be that the cell is a finite-state machine, but there is pretty clearly recursion going on somewhere. (For example, in the fractal-like development of blood vessels, air pathways in lungs, and some neural nets.) It may well be that the only recursion is implemented by distributed algorithms, which have no local stack to save state while doing sub-computations. Perhaps the only "stacks" are implemented by distributed SPMD algorithms, which use the states of many cells to implement recursion, by physically distributing intermediate states, rather than storing them locally. If so, that's very interesting. (The Turing-completeness of DNA mechanisms is largely a red herring for the software engineering actually done by evolution.) If not, that's very interesting, too. (Maybe conditional gene expression & splicing are being used to implement a local stack.) Either way, an answer would be a big step toward understanding what kind of computer program the genome is.

As for how to study these things, I think that both in vivo studies and "computer" simulations will have to play major roles. They both have their strengths and weaknesses. In reverse engineering a finite state machine of any size and sophistication, it's generally not feasible to simulate every possible state and set of inputs, to see what state it transitions to. You need computer simulations, where you can idealize the system a bit and test a high-level theory---by either brute force simulation of zillions of states and transitions, or by reasoning about the system to show that many of those things don't need to be simulated at all. (For example, it's often possible to prove that certain things are independent, so that you don't need to simulate all possible combinations of states of subsystems. That can give you an exponential reduction in the stuff that you have to simulate by brute force.)

In general terms, I think studying the genome has to proceed like any other difficult piece of science, with an interplay of empirical and theoretical results at several levels. Sometimes a formal/computer model will be oversimplified and give you spurious results, but it will often guide further theorizing and experimentation. (For example, if your computer model is nondeterministic in respects that the actual phenomenon is not, you should look for new constraints on the transitions between states.)

It seems to me that many geneticists are already doing more or less this sort of thing in a general way, and that it's working. Noticing that some genetic regulatory networks act as boolean switching networks is a good start. The applicability of computer science doesn't end there, though; it's just a good start. Once the networks are clearer, more and more computer science is likely to apply.

Can I prove that to your satisfaction, or to Peter's? Likely not Peter's. :-) I just think all of the indicators are good.

Near as I can tell, nobody who actually understands the basic issues I've been discussing has anything resembling a better theory. Maybe there's something going on with genes that isn't best described as computational, but I have no idea what that could be, and I'm very skeptical that anybody else actually does. (Excepting vitalists. :-) ) In particular, there doesn't seem to be anything going on that wouldn't be computable by some computer built in the general way I've outlined in previous comments, and such a thing seems as evolvable as anything else.

Torbjörn,

I'm not clear on how you think focusing on the individual organism's genome is a mistake, when comparing it to a computer program.

Consider the fact that typical large programs such as operating systems and browsers show considerable individual variation, such as which plugins and drivers are installed, various crucial data that affects how they behave, etc.

In general, sophisticated computer programs are intensely dependent on the structure of their environments---the underlying hardware and the computational environment within which they function. (File formats and various structures within the data they operate on, protocols and assumptions underlying interacting programs across the internet, and regularities in user input.)

There doesn't seem to be a fundamental qualitative difference there. In both cases it's important sometimes to focus on what goes on within the individual, sometimes on how the individual interacts with its environment, and sometimes on how populations interact with each other.

In both cases you have sophisticated and crucial interactions with the environment, which may include self-modification of both plain state and the local "genome" in response to environmental input.

(And contrary to what PZ seems to imply, real programs are generally littered with numerous repeats of stereotypical riffs of code, some of which can trigger self-modification in one way or another. Some unusual OS's and language implementations even use that as their basic mode of operation, not just loading drivers and plugins as needed, and bootstrapping processes on demand, but optimizing general code into special-case machine code for the cases actually encountered, and re-doing it in the face of novel cases. I don't understand LINEs and SINEs well enough to make a detailed analogy, but those general kinds of phenomena are not at all unfamiliar or un-computery.)

LINEs and SINEs aren't just littered around, they copy themselves and insert themselves at random locations in the genome. I don't think any computer program has such parasitic elements...

By David Marjanović, OM (not verified) on 04 Mar 2008 #permalink

It may well be that the cell is a finite-state machine, but there is pretty clearly recursion going on somewhere. (For example, in the fractal-like development of blood vessels, air pathways in lungs, and some neural nets.)

At the level of the whole organism, too. The genome (as opposed to cytoplasm) may not provide most of the specs for the organism if you only consider a single "run" of development, but over multiple generations, it sort of does, like the genome transplantation experiment shows.

Paul,

I mean in the sense of understanding its evolution, which is the main process that the genome participates in.

Cell machinery can certainly be modeled in algorithmic fashion, following the "database of recipes" alluded to in my previous comment. And considering its chemical nature it is mostly discrete which makes it tempting to name it "computer". But regulatory networks can also be called "regulatory networks".

I maintain that the genome as a learning machine follows an evolutionary algorithm over populations and generations, and that as a functional machine it follows a mishmash of other algorithms in the individual. As a research strategy I'm sure that a "computer paradigm" can unravel a lot of interesting relations among the algorithms, especially as it is a rather general approach as opposed to, say, regulatory networks.

By Torbjörn Larsson, OM (not verified) on 05 Mar 2008 #permalink

Not in any way trying to agree with this fellow... but there are DNA computer programs.
http://en.wikipedia.org/wiki/Dna_computer
To be exact, it's a DNA computer... (I wonder if they would get better results with RNA. Still, 330 teraops in 2002 was pretty good.)

It's interesting to note that PZ assumes that a genome is like a computer program when he argues that repetitive DNA sequences imply junk. In a follow-up blog entry I explain why software executables contain a massive amount of compiler-generated repetitive sequences, which are comparable to LINEs and SINEs.

PZ said:

I'd also like to see his [Stimpson's] software development analogy to these [LINEs and SINEs].

HERE IT IS.

thank you for this

>> It's probably (analogies are always dangerous) better to think of gene products as like small autonomous agents that carry out bits of chemistry in the economy of the cell. There is no central authority, no guiding plan. Order emerges in the interactions of these agents, not by an encoded program within the strands of DNA. <<

I would suggest that is very much the process behind all of the biological complexity we see. It is a humbling concept, to be sure, and very clearly stated. We ourselves and our own self-awareness can be seen as having arisen from the interaction of just these "autonomous agents," and not from a "supreme Being" with a plan for our "immortal soul."
I would suggest it is this growing understanding of biology that is fueling the backlash of fundamentalists around the world. That, and the ease of communication (especially digital communication, the internet), which makes it easy for anyone with computer access to say anything to anyone anywhere.

By uncle frogy (not verified) on 24 Feb 2008 #permalink

I may be biased by my own occupation and predilections (I am a professional software developer and read evolutionary biology for fun), and by my own sources of information, e.g. (or perhaps even i.e.) Dawkins, who claims that the correspondence between DNA and computer code is literal. Obviously, Dawkins is fallible like the rest of us, but I do think he's someone whose words we should consider.

I also don't think that the originally cited creationist is necessarily stupid (from this alone, anyway), so much as seduced by what Dawkins calls "bad poetic science" (see Unweaving the Rainbow), or perhaps taking the analogy too literally. I personally think that (computers being general computation devices) it is entirely reasonable to suggest that DNA may be explained in terms of what computers can do. That does not make it reasonable, or even sensical, to suggest that it works as our computers do.

Someone with knowledge of software development and biology might suggest (or not: I don't know biology) that it is a good explanation, but that counting bytes is hopelessly naïve. Not only do we have the "How much overlap?" question, with the creationist suggesting 10% unique gene expression as a wild but "generous" guess, and a commenter citing 0.5% with claims of actual supporting data; this also overlooks the possibility (which I gather is not merely hypothetical) that the same stretches of genome may be used to code more than one thing depending on where in a particular sequence the "read head" is placed. This offers the possibility of more compact coding by an order of magnitude (and has subsequently been done, to verify the possibility, with software). We might also well consider the genome a compacted or compressed length of code (it does not, after all, have to "execute" very quickly; the human DNA program has nine months to bootstrap itself). We also haven't spoken of the intrinsic instruction set, as the four bit-analogue values in DNA have intrinsic properties, unlike the computer's data, which are merely read.
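To illustrate the "read head" point with a toy example (the codon assignments below are real, but this is a trivially small sketch): the same stretch of sequence decodes to different products in each of the three reading frames.

# Four real codon assignments, enough for the demo.
CODON = {"ATG": "Met", "CAT": "His", "GCA": "Ala", "TGC": "Cys"}

def translate(dna, frame):
    """Read 3-letter codons starting at offset `frame`."""
    return [CODON.get(dna[i:i + 3], "?") for i in range(frame, len(dna) - 2, 3)]

seq = "ATGCATGCATGC"
for frame in range(3):
    print(frame, translate(seq, frame))
# 0 ['Met', 'His', 'Ala', 'Cys']
# 1 ['Cys', 'Met', 'His']
# 2 ['Ala', 'Cys', 'Met']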

It seems to me that there are some more calculations to be done before we can decide either whether the computation analogue truly needs to be thrown out the door, or how much (decompressed, linearised) information our genome can (from this point of view) be considered to encode.

The human genome is elegant in its reduced size.

But so is any other.

I would speculate that the human genome has lost an incredible amount of information because of our heterotrophic life style. Our genome no longer encodes information for essential amino acids that we obtain from our diet.

All vertebrates are incapable of making lysine, for example...

By David Marjanović, OM (not verified) on 24 Feb 2008 #permalink

the genome is more like a crudely organized archive of components

Yeah, and as I believe was already noted here, a lot of the unfolding information is gained by interactions with the environment. You could say that this is what happens with a program in a computational environment, but here it is "the computer" that interacts (and grows, btw).

Stimpson also makes this mistake in the reverse order: he forgets that his "software" also describes the equivalent of the hardware in his computer analogy, i.e. where the gates are and how they connect, in order to execute the very code that produced them.

He seems to have pulled it out of his butt.

Actually, he makes an estimate based on a bacterium, and on the idea that bacteria don't contain much non-functional code. But it is a moot point anyway, as described in the post. Larry Moran on Sandwalk is currently running a series of posts describing the already-known non-functional DNA in the genome.

If you're going to mock the guy for equating DNA with software then you're going to have to call Dawkins on it as well.

I haven't read The Blind Watchmaker, but your two quotes are contradictory. In the later text Dawkins explains how the genome carries information "in a very computer-like way," in compliance with the view of PZ's post and regardless of any algorithmic instances. And we can't very well criticize a person for changing his or her mind.

By Torbjörn Larsson, OM (not verified) on 24 Feb 2008 #permalink

How difficult would it be to model the evolutionary process using a computer program?

To model aspects of it, not at all; see for example Dawkins' WEASEL programs. To make realistic models encompassing the already-known theory, very; see for example the ev program to get a feel for the inherent complexity.
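For anyone who hasn't seen one, a minimal sketch in the WEASEL spirit (Python; the parameters are arbitrary, and it models cumulative selection only, not realistic biology):

import random, string

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = string.ascii_uppercase + " "

def mutate(parent, rate=0.04):
    """Copy the parent; each character mutates with probability `rate`."""
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in parent)

def fitness(s):
    return sum(a == b for a, b in zip(s, TARGET))

parent = "".join(random.choice(ALPHABET) for _ in TARGET)
generation = 0
while parent != TARGET:
    generation += 1
    # selection: keep the best of 100 mutant offspring
    parent = max((mutate(parent) for _ in range(100)), key=fitness)
print(generation)  # typically a few hundred generations, vs ~27**28 blind guesses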

If it is so difficult for us to write a program that models random mutation and natural selection then how could the earliest forms of life do this just by chance?

Why do you think evolution (as in your simplified version of "random mutation and natural selection") applies to abiogenesis (at least, that is how I interpret "earliest forms of life")? And if you accept the determinism in natural selection, why do you think abiogenesis happened "by chance"?

While abiogenesis obviously differs from evolution in nearly all its aspects, it is feasible and perhaps reasonable to think that selection, in the form of chemical selection at the start, is one driving force. The route from selection on non-biologically produced systems to biologically reproduced ones is another question where we don't seem to know much.

But then, is our ignorance an argument for or against natural processes?

By Torbjörn Larsson, OM (not verified) on 24 Feb 2008 #permalink

He has his own view on why the entropy argument is valid. Which is depressing.

Ouch! Yes, depressing.

Let's see:
1. Stimpson doesn't know that the Sun produces entropy. In fact, any energy-producing process will, as heat cannot be continuously and completely converted into mechanical energy (Carnot).

2. Stimpson doesn't know that the Earth increases in entropy. It eventually has to, as the observable universe it dumps its entropy into increases in entropy over time.

3. So Stimpson misunderstands the meaning of the Sun argument. It points out that the Earth is an open system, and that entropy can be held low in one part of such a system as long as the system can export it (rough numbers are sketched after this list). Compare a fridge that can dump heat, and so entropy, into a ventilated space.

4. Stimpson confuses thermodynamic entropy with information entropy. He links a source that explains the difference.
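Rough numbers for point 3, order-of-magnitude only (I ignore the 4/3 photon-gas factor, and the power figure is approximate):

P = 1.2e17                      # W, solar power absorbed by Earth (approx.)
T_sun, T_earth = 5800.0, 255.0  # effective radiation temperatures, K

s_in = P / T_sun                # entropy flux in:  ~2e13 W/K
s_out = P / T_earth             # entropy flux out: ~5e14 W/K
print(f"net entropy export: {s_out - s_in:.1e} W/K")  # ~4.5e14 W/K

The Earth radiates away far more entropy than it receives, which is what lets entropy stay low locally, e.g. in biology.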

By Torbjörn Larsson, OM (not verified) on 24 Feb 2008 #permalink

It's true that I believe that DNA contains information that specifies limb formation and higher-level activities like tissue repair and vision. I am a little surprised that you don't.

Of course he doesn't, because the basics of limb formation have been figured out. You can read it all in textbooks today.

What happens in vertebrate limb formation is that a gene involved in all outgrowths of the body wall gets active, and then the same genes that make the head-to-tail axis make the shoulder/hip-to-fingers/toes axis. There is no separate gene for limbs!

I forget how the places on the body where the limbs grow are specified, but I suppose it's simply just behind the gill slits and just in front of the anus, respectively.

Go read up on it. Here on this blog you can find several posts that are very good introductions.

By David Marjanović, OM (not verified) on 25 Feb 2008 #permalink
