The genome is not a computer program
Category: Creationism • Development • Evolution • Genetics • Science
Posted on: February 24, 2008 11:22 AM, by PZ Myers
The author of All-Too-Common Dissent has found a bizarre creationist on the web; this fellow, Randy Stimpson, isn't at all unusual, but he does represent well some common characteristics of creationists in general: arrogance, ignorance, and projection. He writes software, so he thinks we have to interpret the genome as a big program; he knows nothing about biology; and he thinks his expertise in an unrelated field means he knows better than biologists. And he freely admits it!
I am not a geneticist or a molecular biologist. In fact, I only know slightly more about DNA than the average college educated person. However, as a software developer I have a vague idea of how many bytes of code is needed to make complex software programs. And to think that something as complicated as a human being is encoded in only 3 billion base pairs of DNA is astounding.
Wow. I know nothing about engine repair, but if I strolled down to the local garage and tried to tell the mechanics that a car was just like a zebrafish, and you need to throw a few brine shrimp in the gas tank now and then, I don't think I would be well-received. Creationists, however, feel no compunction about expressing comparable inanities.
I actually have some background as a software developer — I wrote some lab automation and image processing software that was marketed by Axon Instruments for several years — and I can tell you as someone with feet in both worlds that the genome is nothing like a program. The hard work of cellular activity is done via the chemistry of molecular interactions in the cytoplasm, and the genome is more like a crudely organized archive of components. It's probably (analogies are always dangerous) better to think of gene products as like small autonomous agents that carry out bits of chemistry in the economy of the cell. There is no central authority, no guiding plan. Order emerges in the interactions of these agents, not by an encoded program within the strands of DNA.
I'd also add that the situation is very similar in multicellular organisms. Cells are also semi-independent automata that interact through a process called development in the absence of any kind of overriding blueprint. There is nothing in your genome that says anything comparable to "make 5 fingers": cells tumble through coarsely predictable patterns of interactions during which that pattern emerges. "5-fingeredness" is not a program, it is not explicitly laid out anywhere in the genome, and it cannot be separated from the contingent chain of events involved in limb formation.
That's a difficult and abstract concept that's hard to get across to students who are seriously studying the subject, let alone ignorant creationists who have no awareness of the biology. This guy, though, knows one thing and one thing only — how to write software — and digs his hole deeper and deeper.
To be more specific, since DNA alphabet consists of 4 nucleobases, we can represent a nucleobase with 2 bits data. This means that 4 base pairs can be represented by a byte of data and approximately 4 million base pairs can be represented by a megabyte of data. This means that the entire human genome can be represented by only 750MB of code. From my experience as a software developer, this would have to be highly efficient code. To suggest that 97% of DNA is junk implies the implausible -- that less than 23MB of DNA is not junk. By comparison, Microsoft Word has a size of 12MB.
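The arithmetic in that passage is easy to check. Here is a minimal Python sketch, assuming 3 billion base pairs and 1 MB = 10^6 bytes; the names `genome_megabytes` and `pack` are made up for illustration. It verifies only the storage tally, which is a statement about file size and nothing more.

```python
# Storage tally for a DNA sequence at 2 bits per base.
# Assumptions (for illustration only): 3 billion base pairs, 1 MB = 10**6 bytes.
BASE_PAIRS = 3_000_000_000
BITS_PER_BASE = 2  # four possible bases fit in two bits
ENCODING = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def genome_megabytes(base_pairs: int) -> float:
    """Megabytes needed to store `base_pairs` bases at 2 bits each."""
    return base_pairs * BITS_PER_BASE / 8 / 1_000_000


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, four bases per byte (padded with 'A')."""
    padded = seq + "A" * (-len(seq) % 4)
    out = bytearray()
    for i in range(0, len(padded), 4):
        byte = 0
        for base in padded[i:i + 4]:
            byte = (byte << 2) | ENCODING[base]
        out.append(byte)
    return bytes(out)


total_mb = genome_megabytes(BASE_PAIRS)   # 750.0
three_percent_mb = 0.03 * total_mb        # roughly 22.5
```

So the 750MB and "less than 23MB" figures follow from the stated assumptions. What the number measures is the space needed to archive the sequence.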
The genome is not code, efficient or otherwise. Sure, you can tally up the bits needed to store the sequence in a database, but that is not the same as saying you've got the complete information for an organism, or that you have captured the "code" that can be executed to build it. Rather than realizing that maybe his analogy is faulty because it leads to conclusions he finds unlikely, this creationist is so convinced of the accuracy of his analogy that when he finds it leads to incomprehensible results, he decides that biology and the reality of the genome must be wrong.
I think it's more probable that the human DNA which we have discovered so far doesn't contain all the information required to produce humans. I wouldn't be surprised if more DNA, or some other kind of information, is discovered some time in the future.
Many of you may have seen this infamous creationist quote, which is a perfect example of an oblivious ignoramus overlooking the obvious.
One of the most basic laws in the universe is the Second Law of Thermodynamics. This states that as time goes by, entropy in an environment will increase. Evolution argues differently against a law that is accepted EVERYWHERE BY EVERYONE. Evolution says that we started out simple, and over time became more complex. That just isn't possible: UNLESS there is a giant outside source of energy supplying the Earth with huge amounts of energy. If there were such a source, scientists would certainly know about it.
Stimpson hasn't said anything quite that stupid, but it's only because biology and developmental biology are so much more subtle and harder to observe and understand than the existence of a giant thermonuclear furnace burning furiously 93 million miles away. There is no significant source of extra DNA, but there is additional information generated by the activities of cells during ontogeny. This concept, that the starting material is not the complete final product, but that it requires ongoing input from the environment and from continuing negotiation and activity within the starting material to generate novel features, has only been around for about 2300 years, since at least Aristotle, so I guess I shouldn't be surprised that a creationist would be a few millennia behind. The concept is called epigenesis. It's essential to understanding how a genome generates an organism, and you shouldn't try to force your analogy onto biology if you don't understand it.
But wait! Ignorance is no obstacle to a devout creationist, and Mr Stimpson continues his headlong descent into unchecked failure in another post, in which he tries to claim that there is negligible junk DNA.
Now there are 210 known cell types in the human body. I'll assume that each cell type requires at least 1MB of information. These cell types share a lot of common features so I'll assume there is a lot of common information. Just how much of the information is shared between these cell types is a guess. I am going to assume that 90% of the information in each cell type is shared and 10% is unique. This means that 210 cell types require 1MB + 209 * .1MB of information. Rounding this implies that there is at least 22MB of information in the human genome.
None of this makes any sense whatsoever.
Where does Mr Stimpson get this magic number of 1MB of information for a cell type? He seems to have pulled it out of his butt.
What does he mean by "information"? He blithely equates the information in a cell with a measure of the number of nucleotides in its DNA. This is not valid. Cells have developmental histories that are essential elements in describing their state.
This "210 cell types" number is a widely used value that was taken from a 1960s paper that itself was making only a broad guess from descriptions in histology texts. I've griped about this oft-used and ultimately bogus number before, but I can't blame Stimpson for using it … but it really needs to be purged from the literature.
I don't know what the hell he's babbling about when he tries to partition subsets of the genome into unique stuff for different cell types. It doesn't work that way! The entire genome is present in every cell (with some narrow exceptions), and genes get reused in multiple functions in multiple cell types. His whole conclusion is a beautiful example of garbage in, garbage out.
Let's see how much deeper into the muck this guy can sink…
But this is just the information needed to construct the different cell types. More information is needed for spatial orientation and to coordinate activity among cells to perform complex functions like vision, motor control, digestion and tissue repair. Since the most efficient algorithms to just sort n objects have an order of nlog(n) I am tempted to guesstimate by multiplying 22MB by log(210) to get a lower bound. But that would be bad applied math and just plain lazy. But then again I am not exactly getting paid to do this (wink).
This is a transparent revelation of his biases. He thinks there needs to be some nuclear authority that specifies higher level activities like tissue repair and vision — there isn't! There is no map. There is no boss. There is no blueprint. Vision is an emergent property of cells and proteins interacting in development, using tools shaped by four billion years of history. Your preconceptions are not data.
It isn't a Stimpson post without some statement that is jaw-droppingly obvious.
I can think of two other approaches that could be taken. For one of them I need some data points. In particular I need size data about genomes of the simplest multicellular life forms that are well studied and believed not to have junk.
You mean, you ought to have some data underlying your speculations? Whoa. Who could have imagined that.
This is exactly what biologists have been doing for the past century: gathering data and building explanations from the evidence. Wouldn't anyone with any sense recognize that this is how science proceeds?
Oh, and this whole series of posts has been written because Stimpson doesn't like the idea of junk DNA, another common creationist preconception that they anguish over. Again, it's the evidence that supports the idea of junk DNA, and creationist ignorance does not counter it.
Let's start with something relatively simple for Mr Stimpson. Look up LINEs and SINEs. Long Interspersed Nuclear Elements (LINEs) are pieces of DNA that code for an enzyme that copies RNA (including the RNA for itself) back into the genome. SINEs (Short Interspersed Nuclear Elements) are shorter sequences that don't code for a functional protein, but their RNA is recognized by the LINEs and gets copied back. These sequences do not play a specific role in the economy of the cell, although they do certainly represent a generic drain on cellular metabolism. Mr Stimpson should try to explain the function of these sequences.
Then he can try to explain why his DNA contains 870,000 copies of LINE — taking up 20% of his genome — and 1.5 million copies of SINE. I'd also like to see his software development analogy to these. Back when I was writing software, I don't recall writing small segments of self-modifying, recursive code and sprinkling them throughout my program…or perhaps more accurately, writing software that was one-third auto-loading noise maker and sprinkling a few words of functional code among them.





Comments
A blowhard is still just a blowhard. As an engineer, I have worked with this sort of folk that thinks they know it all - about everything. I always wonder where they end up working, because we usually send them on their way. They are really no good as engineers either.
This guy seems to be a classic example. Fortunately, many engineers are scientists first, and if they have an interest in a subject they actually learn something about it.
Posted by: George | February 24, 2008 11:33 AM
The sad thing is, computer programming can be used as an analogy to DNA and evolution, but in a much different way than he describes. Someone please rip this apart if it's wrong, but I remember hearing that one of the big problems with Microsoft's operating systems for so long was that they were piggybacked onto old DOS code, and with each new version Microsoft would just try to put patches and overlays onto it rather than start from scratch, leading to weird quirks like the Y2K problem. Now that's a good analogy - evolution and bodies have to work within the constraints that are already there, so you end up with a lot of suboptimal parts and systems, and lots of extraneous parts that aren't useful anymore but are difficult to get rid of.
Posted by: Carlie | February 24, 2008 11:36 AM
I'm a software engineer too, and wouldn't dream of correcting biologists, but at least I've kept up with some of the developments in mathematics and computing. Such developments, in the fields of chaos theory, fractals, genetic programming, etc, have demonstrated that emergent phenomena and vast complexity can be produced from the interaction of even simple components.
This guy is taking an algorithmic approach in his software design, and mistakenly assuming that biology works the same way.
Posted by: Tom | February 24, 2008 11:39 AM
Just skimmed this thread. I was getting dumber for reading it.
Stimpson just proved that bumblebees can't fly.
The best demonstration of the role of the genome in specifying organisms is cloning. We can transfer a nucleus from a somatic cell into an egg and get a complete new clone. Dolly the sheep and all that.
Stimpson is just a time cube class crackpot. Nothing to see but some grins and giggles and move on.
Posted by: raven | February 24, 2008 11:44 AM
He says that MS Word is 12MB. WTF?? Does he mean the average size of a Word document, or the size of the installed program itself or the installation files?? Assuming either of the latter two, he's waaaaaaaay off. You can now (purchase and) download Office 2007 from the MS website. The Standard package comes with Word, Excel, Powerpoint, and Outlook, and has a download size of 1.5GB - a "portion" will be removed after installation, if the download package is removed. Let's be very generous and say that 0.5GB will be removed, leaving 1GB for the four programs, for an average of 250MB per program. Even assuming that, say, Powerpoint takes up more space than Word, it's still at least an order of magnitude bigger than what he claims.
There, I've just proven him totally and completely wrong on that one point, and I didn't even pull any numbers out of my ass!
Posted by: Seamyst | February 24, 2008 11:55 AM
How do you REAL scientists find time to learn how to write software? I have been wondering this since I saw Richard Dawkins' "Growing Up In The Universe." I used to live with a programmer and it seemed like a bitch to learn how to program software.
By the way, anyone interested in Richard Dawkins' "Blind Watchmaker" program can Google it and can play with it yourselves. It's quite fun.
Posted by: Steve Ulven | February 24, 2008 11:55 AM
Perhaps if he insists on computing analogies he could take a look at Conway's Game of Life. Four simple rules + initial conditions = masses of complex (and often beautiful) structure. Although it's not a perfect analogy for epigenesis, it shows how the development of a 'unit' can depend on the structure of the units around it, and not just its own information (which in this case is a single bit).
Posted by: Olaf Davis | February 24, 2008 11:56 AM
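For anyone who wants to try the Game of Life point, here is a minimal Python sketch. It folds the usual four rules into one condition: a cell is alive in the next generation if it has exactly three live neighbours, or if it is alive now and has exactly two. The set-of-coordinates representation and the names `step` and `glider` are choices made for this example.

```python
from collections import Counter


def step(live):
    """Advance one generation; `live` is a set of (x, y) coordinates of live cells."""
    # Count, for every cell adjacent to a live cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


# A glider: five cells that reappear one step diagonally every four generations.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

state = glider
for _ in range(4):
    state = step(state)
```

Nothing in `step` mentions gliders or movement. Each cell responds only to its eight neighbours, and the travelling pattern comes out of those local interactions.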
Stimpson may have been getting some of his information from Richard Dawkins. Do you also think Dawkins is a nut for speaking about the genome in terms of bits and bytes?
http://www.skeptics.com.au/articles/dawkins.htm
Dawkins goes on to discuss junk DNA and epigenetics as well, but he does hold to the idea that DNA is analogous to computer code. Does this make him a crank?
Posted by: Derek James | February 24, 2008 12:00 PM
These selfish ignoramuses need to suck it up and realize how much harm they do whenever they open their big, flappy mouths and let words come out. :/ My father and I have switched places as I've grown up -- now I'm having to argue with him whether the things he's picked up are true. THANKS, STIMPSON. But you probably got your slap on the back.
I'm gonna go out and write a paper on how I think "Jesus" is an anagram for "cheese crackers".
Posted by: Holydust | February 24, 2008 12:01 PM
PZ: I applaud you for breaking all this down, and yet...don't all of us teachers at the secondary school level have a terrible problem? A truth can be stated simply, and yet be impossible for students to understand. That the genome is not the source of all the information needed to build an organism is obviously true to those of us who've studied real living things for any length of time, but is utterly unfamiliar to others, even very well-educated people. Here the problem is not religious bias, but a failure to properly conceptualize the problem due to its subtlety and to the different ways that people use words like 'information' and 'code' in everyday life.
Now, these words are useful to us high school teachers, to put it mildly, in order to introduce the concepts, but almost inevitably their usage reinforces preexisting notions that do not match up well with the biological reality. So I have to admit to being a little stumped. Any suggestions from the peanut gallery?
Posted by: Scott Hatfield, OM | February 24, 2008 12:07 PM
I spent an agitated evening on Stimpson's blog some months ago... you know, the way one can sometimes get caught up in these online quagmires of stupidity.
I just thought I ought to point out - before the Salem Conjecture starts being thrown about willy-nilly - that Mr. Stimpson's grasp of academic computer science is about as thorough as his grasp of evolutionary biology.
Posted by: wjv | February 24, 2008 12:22 PM
Posted by: Les Lane | February 24, 2008 12:25 PM
Are you kidding me? Find the time? It's part of the job for many. There are scores of fields in the numerous disciplines of the biological sciences where knowledge about writing computer programs is an advantage and a necessity. This is especially true in ecology, population genetics, conservation biology, and the theoretical models of behavioral ecology that are at the forefront of evolutionary theory.
M$ doesn't make the software to do the specialized tasks we need, we do it ourselves. From writing programs to perform specialized statistical analyses (e.g., bootstraps and randomization analysis) to home range calculations of free-ranging animals to genetic algorithms and dynamic programming to sophisticated image analysis (e.g., GIS applications).
There are many, many professional biologists that know their way around C, Pascal and Fortran.
Posted by: JD | February 24, 2008 12:32 PM
PZ,
I agree with you that the genome and a computer program are two very different things. How difficult would it be to model the evolutionary process using a computer program?
If it is so difficult for us to write a program that models random mutation and natural selection then how could the earliest forms of life do this just by chance?
Posted by: Matt LaCrosse | February 24, 2008 12:32 PM
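For what it's worth, a toy program that combines random mutation with selection is short. The Python sketch below is in the spirit of the "weasel" demonstration in The Blind Watchmaker, with one big caveat: it selects toward a fixed target string, which real evolution does not have, so it illustrates cumulative selection only and is not a model of biology. The function names, population size, and mutation rate are arbitrary choices for this example.

```python
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "


def fitness(candidate, target):
    """Number of positions where the candidate matches the target."""
    return sum(a == b for a, b in zip(candidate, target))


def mutate(parent, rate, rng):
    """Copy the parent, replacing each character with a random one at the given rate."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch for ch in parent
    )


def evolve(target, pop_size=100, rate=0.05, max_generations=5000, seed=0):
    """Return (best string found, generations used)."""
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in target)
    for generation in range(max_generations):
        if parent == target:
            return parent, generation
        offspring = [mutate(parent, rate, rng) for _ in range(pop_size)]
        # Keep the parent in the pool so the best fitness never decreases.
        parent = max(offspring + [parent], key=lambda s: fitness(s, target))
    return parent, max_generations


result, generations = evolve("METHINKS")
```

Calling `evolve("METHINKS IT IS LIKE A WEASEL")` runs the same loop on the full phrase. The mutation step is blind; only the selection step looks at the result.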
You can compare the genome as a data set to the data stored by a computer, and that's legit -- we know how much storage space you need to pack away the sequence. It is not in any way comparable to software.
Posted by: PZ Myers | February 24, 2008 12:34 PM
It is certainly meaningful to say that the human genome contains X megabytes of data, but that misses the point entirely. One dataset may be larger than another, but that doesn't mean it is more complex.
"Complexity" is a very slippery thing, which is enormously difficult to define in a rigid manner. For example, many fractals are fantastically intricate, and yet can be generated by a few lines of code. Therefore, are they very complex or very simple?
Another issue is the "cleverness" (for want of a better word) of how the system utilises available resources. For example, these beautiful animations were all created using only 512 bytes of code, whilst far cruder animations may be a million times larger.
The human genome may be simple in terms of information theory, but in terms of "cleverness" it is second to none (with the possible exception of some viral genomes, which are so compact as to almost defy belief). The laws of physics allow for amazing subtlety, and evolution has taken full advantage of this fact.
Posted by: hyperdeath | February 24, 2008 12:38 PM
And yet you say that programming concepts don't apply in biology!
Posted by: Pierce R. Butler | February 24, 2008 12:45 PM
My jaw hit the floor when he jumped from `coordinate activity among N objects' to `algorithms to sort N objects'. There's no need to sort the objects you're coordinating even if you *do* have a central coordinating agency, and what on earth he considers might `sort' cells (using a non-parallelized sort!) I have not the least idea.
Posted by: Nix | February 24, 2008 12:45 PM
I take it Stimpson has never even played with some of the classic toys of computing: Game of Life and Mandelbrot generators. I thought everyone wrote themselves one or both of those, just for fun, at some point (at least, I did). There's some fantastically complex behaviour in there, but it can be produced by a few kilobytes of code (ignoring all the overhead the linker has to add to make it play properly with the o/s).
Posted by: Eamon Knight | February 24, 2008 12:46 PM
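A Mandelbrot generator really is that small. Here is a minimal Python sketch that renders the set as text; the grid size, the iteration cap of 50, and the names `escape_time` and `render` are arbitrary choices for this example.

```python
def escape_time(c: complex, max_iter: int = 50) -> int:
    """Iterate z -> z*z + c from z = 0; return the step at which |z| exceeds 2."""
    z = 0j
    for i in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return i
    return max_iter  # never escaped: treat c as inside the set


def render(width: int = 60, height: int = 24, max_iter: int = 50):
    """Return the picture as a list of text rows ('#' inside the set, '.' outside)."""
    rows = []
    for row in range(height):
        y = -1.2 + 2.4 * row / (height - 1)
        line = ""
        for col in range(width):
            x = -2.0 + 3.0 * col / (width - 1)
            inside = escape_time(complex(x, y), max_iter) == max_iter
            line += "#" if inside else "."
        rows.append(line)
    return rows


if __name__ == "__main__":
    print("\n".join(render()))
```

The whole rule is the single line `z = z * z + c`; everything else is bookkeeping for drawing the result.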
Well, as a graduate student, I say you're totally wrong. To get that level of ignorance, you'd need at least 37 misconceptions, and we all know that 37 is much more than the technical definition of "few." This is like saying that someone could be unable to save a document file that's under 1 MB, and we all know that's absurd.
I will never understand how these kooks make writing gibberish look so easy. I always have trouble. :)
Posted by: Tom Foss | February 24, 2008 12:47 PM
You guys don't take prisoners, do you? Somebody says something you don't like and it's time to fling vitriol and dung. I can hear baa! baa! as the flock follows its shepherd.
I remember doing similar sums in about 1980 working out that the entire ACGT encoding of a set of human chromosomes would fit on 1 or 2 (I don't remember!) standard 1600bpi magnetic tapes (it was that long ago, before CDs) and I found that an interesting number.
A single egg cell can produce a full-size organism, so this amount of DNA plus existing cellular machinery plus environment is all an organism needs. And given that all primary protein sequences are specified by DNA, and (almost) all of a cell's machinery is based on proteinaceous enzymes, it tells you about the amount of information encoding the bulk of the primary biochemical machinery.
Personally, I find myself amazed that so little information can specify organisms with terrific regularity and complexity. Where's your sense of wonder? Isn't it even amazing that our developmental systems are so finely tuned that we end up with two arms (with the requisite number of fingers) of similar length even though they are separated by huge distances relative to our cellular architecture?
Isn't it terrific too that we have about 6 feet of DNA in each cell, but this can reliably duplicate itself and separate during mitosis?
It's a mistake to lose one's sense of wonder. And being gobsmacked by nature's complexity doesn't make one either an idiot or a closet theist.
Posted by: P Delta Effect | February 24, 2008 12:48 PM
Look at the bootstrapping process of a computer. Small programs have "just enough" code to launch the next program, which has slightly more capability. A DNA strand doesn't magically turn into a person- I can't throw DNA drops onto the pavement, water them, and grow a human. There are some necessary bootstrap conditions. In essence, DNA is not so much a "program" as it is a piece of information. This is a false analogy. Unfortunately, this person has used an invalid comparison to reach a false conclusion.
Posted by: Greg | February 24, 2008 12:49 PM
Well, I'm not a geneticist or a molecular biologist or a softwear engineer, but I AM a Mom, and I have baked a cake. And I've never put all the ingredients together and had it turn into a cat or a dog or anything but a cake. Bears turning into whales? Fish sprouting wings and flying up into the trees? Oranges giving birth to people?
Betty Crocker says "no."
Posted by: Sastra | February 24, 2008 12:58 PM
Is this a rhetorical flourish, or has it actually been worked out by explicit study that human genetics exceeds that of all other metazoans in some specific measurement of efficiency?
Even compared to E. coli? to kitties? to squid?
Posted by: Pierce R. Butler | February 24, 2008 1:03 PM
Not much of a programmer, either. The fastest sort algorithm is order N. The spaghetti sort algorithm. Can't be implemented in common hardware, but it is order N.
Take the numbers to sort, and cut lengths of spaghetti corresponding to the numbers. Grab the spaghetti in your fist and smack it hard against a flat surface. Stand up the stack and pull out the stick that is longest (it sticks out the highest). Measure it, and repeat. Order N sort. It was described in the mathematical recreations column of SciAm many years ago, as the "spaghetti analogue gadget". It was noted that, because the analogue steps are rather slow, you need a very large set of numbers before this can out-perform an N.logN algorithm on modern (at the time) computers. I believe they remarked that the "flat surface" would be about the size of the moon, but work with enough numbers, and N beats N.logN.
This isn't just semantic sniping and gleeful nitpicking. Biology is an excellent example of analogue behaviour, so don't believe that the world of digital algorithms and von Neumann machines accurately parallels the behaviour of a biological system.
Posted by: Christopher | February 24, 2008 1:05 PM
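The spaghetti procedure described above can be imitated in Python, though only as a simulation. The sketch below (with the made-up name `spaghetti_sort`) follows the same steps, but the "pull out the tallest stick" move, which is a single physical action in the analogue version, becomes a scan over the whole bundle in software. The simulation is therefore quadratic overall, and the order-N claim applies only to the physical gadget.

```python
def spaghetti_sort(lengths):
    """Simulate the analogue procedure; returns the lengths longest first."""
    bundle = list(lengths)   # the fistful of spaghetti, stood on a flat surface
    pulled = []
    while bundle:
        # Physically: lower a hand until it touches the tallest stick (one step).
        # In software: a linear scan, which is what makes this simulation O(N^2).
        tallest = max(bundle)
        bundle.remove(tallest)
        pulled.append(tallest)
    return pulled


ordered = spaghetti_sort([3, 1, 4, 1, 5])   # [5, 4, 3, 1, 1]
```

The gap between the two versions is the point of the comment: the physics does in parallel what a sequential program has to do one comparison at a time.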
genome = genome maker
Posted by: danley | February 24, 2008 1:05 PM
PZ says:
From Chapter 5 of The Blind Watchmaker:
If you're going to mock the guy for equating DNA with software then you're going to have to call Dawkins on it as well.
Posted by: Derek James | February 24, 2008 1:11 PM
Maybe the term "computer code" is a source of confusion for Derek James (not so much for Stimpson). The term is usually used to mean the executable instructions that the programmer provides to the CPU. This is what Stimpson is talking about.
The word "code" can also mean a method of storing data in binary format, as in, for example, the "American Standard Code for Information Interchange" (i.e., ASCII). This -- not ASCII specifically, but the encoding and storage of data in the form of binary digits -- is what Dawkins was talking about.
As the genome encodes data (mostly protein sequences), it's perfectly valid to compare DNA to the binary storage of data on computer media. To compare DNA to the number of bytes required to encode algorithms, on the other hand, is bogus.
Posted by: noncarborundum | February 24, 2008 1:14 PM
Okay, I posted before I saw the follow-up from D.J. with the "It's raining ... algorithms" quote. Bummer.
Posted by: noncarborundum | February 24, 2008 1:18 PM
Can someone give him the onion test? amoeba test? fugu test?
Posted by: natural cynic | February 24, 2008 1:18 PM
Oops, he did take the onion test - and flunked by handwaving:
I could speculate that perhaps the DNA of onions also serves a purpose for the animals that eat it -- after all, it is a food source
same with nutrition. Still needs to take the fugu test.
Posted by: natural cynic | February 24, 2008 1:28 PM
Somebody point the bozo to L-systems as an example of how to get complex structures from a simple set of instructions.
http://en.wikipedia.org/wiki/L-system
Posted by: Zirrad | February 24, 2008 1:41 PM
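To make the L-system suggestion concrete, here is a minimal Python sketch of a string-rewriting L-system, using Lindenmayer's well-known algae example (A becomes AB, B becomes A). The function name `lsystem` and the dictionary representation of the rules are choices made for this example.

```python
def lsystem(axiom, rules, generations):
    """Rewrite every symbol in parallel, `generations` times."""
    s = axiom
    for _ in range(generations):
        # Symbols without a rule are copied unchanged.
        s = "".join(rules.get(ch, ch) for ch in s)
    return s


ALGAE = {"A": "AB", "B": "A"}

fourth = lsystem("A", ALGAE, 4)                            # "ABAABABA"
lengths = [len(lsystem("A", ALGAE, n)) for n in range(6)]  # [1, 2, 3, 5, 8, 13]
```

Two rules and a one-letter starting string are enough to produce strings whose lengths follow the Fibonacci sequence. Feeding such strings to a drawing routine is how the branching plant figures on the linked page are made.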
Shorter version: Sure, God created the entire universe in only six days. But He didn't have an installed base.
Posted by: noncarborundum | February 24, 2008 1:43 PM
Stipulated: Randy Stimpson is a stupid, clueless, pathetic, Creationist idiot.
However, the question that he raises is far more subtle and nuanced than PZ Myers herein allows.
I speak as someone with lesser Biology credentials than PZ (I have only about 25 publications and conference presentations in the field), but enough to be negotiating for a Research Scientist position in Biological Networks at Caltech.
On the other hand, I have 42 years of computer/software experience, significantly more than does PZ.
I strongly feel and have since before I began my Ph.D. dissertation research in 1975 (in what's now considered Nanotechnology, Artificial Life, Systems Biology, and Metabolomics) that there is a profound relationship between Genome/Proteome/Physiome and Source code/interpreted or compiled object code / effected change in embedded system or robot behavior or client-server interaction.
In my dissertation, I sometimes referred to "genocode" versus "phenocode." Several chapters of that dissertation have now been published in refereed venues.
The question: "what is the channel capacity of Evolution by Natural Selection" and the related question: "What is the Shannon information in an organism's genome" is a very hard question, which we have discussed in this blog and elsewhere. I have a draft paper of some 100-page length sitting on a NECSI wiki, triggered by what I took to be a good question from an annoying Intelligent Design troll; said wiki paper draft online thanks to the dedicated work of the admirable Blake Stacy, for about a year, which I have not had a chance to complete, due to little distractions such as a life-threatening medical condition, 9 days in hospital, and 6 weeks away from the classroom teaching that I love.
I think that there is common ground between the naive "DNA = computer" myth and PZ's very thoughtful description above, which I quite enjoy:
"... the genome is nothing like a program. The hard work of cellular activity is done via the chemistry of molecular interactions in the cytoplasm, and the genome is more like a crudely organized archive of components. It's probably (analogies are always dangerous) better to think of gene products as like small autonomous agents that carry out bits of chemistry in the economy of the cell. There is no central authority, no guiding plan. Order emerges in the interactions of these agents, not by an encoded program within the strands of DNA."
"I'd also add that the situation is very similar in multicellular organisms. Cells are also semi-independent automata that interact through a process called development in the absence of any kind of overriding blueprint. There is nothing in your genome that says anything comparable to 'make 5 fingers': cells tumble through coarsely predictable patterns of interactions during which that pattern emerges. '5-fingeredness' is not a program, it is not explicitly laid out anywhere in the genome, and it cannot be separated from the contingent chain of events involved in limb formation."
I should like to point out that Artificial Intelligence (my M.S. in 1975 was for work on the borderline between AI and Cybernetics), Agent-based software, and Quantum Computing have brought "program" into a new paradigm, as much as genomic and post-genomic research and data have brought DNA/RNA/Protein into such a new paradigm that the very word "gene" is difficult to properly define at any level of education.
Posted by: Jonathan Vos Post | February 24, 2008 1:50 PM
As more evidence that this guy has his head up his ass, I give you something he wrote in March of last year on Entropy & Evolution. It shows that he doesn't know much about either, it contradicts his later confused notions, and it displays his ignorance of genetic algorithms and the biology of aging:
Now since I am a software developer, mutation (development) and selection (testing) of complex systems is an everyday activity for me. So there are similarities between what I do for a living and the concept of evolution. The difference is that DNA is considerably more complex than software. Yet no one develops software by random mutation and testing alone. Instead of random mutation the software development process employs intelligent design. I don't believe that random mutation has any place in the software development process -- so why should I believe in evolution?
Off to the dustbin, non-Intelligent Designer.
Posted by: natural cynic | February 24, 2008 1:58 PM
It's not unreasonable to compare DNA code with computer code for purposes of simple analogy (Or at least it seems so to me having had some experience in the latter and none in the former). However certain differences are glaring. The most obvious one is the fact that nearly[1] all code is human designed and thus tells a sort of 'story'. You can look at various bits of it and realize what they are for. Find the modules involved and so forth. It's obviously designed. It certainly doesn't have heaps of nonsensical scribbles which don't accomplish anything at all. It doesn't have a sort routine repurposed as a screen refresh algorithm by way of being a tic-tac-toe playing AI and a pseudorandom number generator.
Further, a major difference is the platform itself. Computers are, by and large, synchronous, discrete and deterministic. They execute algorithms and operate with precisely quantized data. DNA 'executes' (and the term is a poor one) in an environment that couldn't be more different. The most glaring difference is that there is no CPU analogue during the development of an organism. No central authority. Instead all control is emergent.
About the size of Word (12 MB ?!), well, I'm assuming he measured the size of the executable itself. This is rather silly seeing as he didn't take into consideration the untold megabytes of shared libraries linked to it or an incredibly complex operating system which is required to turn the abstract request for services into actual operations on the hardware which, in turn, has its own abstractions and internal software and so on. A lot more information goes into getting that Word screen with the flashing cursor than a piddly 12 MB.
[1] But there are exceptions. For instance there is a pathological programming language named Malbolge that's deliberately so perverse it's impossible to write even a Hello World program. One has been generated, however, using a genetic algorithm. No-one knows how it works. So it looks designed (because it does something) but isn't. Much like some other things I could name. You can find more on this on Good Math/Bad Math. I'd link to it but then the spam filter might eat me.
Posted by: Veljko | February 24, 2008 1:58 PM
I'm surprised that nobody's pointed out this horrendous leap in logic already:
When we say that comparison sort algorithms are "order of n log(n)", we're talking about the time necessary to execute them, not the total size of the sorting instructions themselves. When we go from sorting an array with 1,000 elements to one with 1,000,000 elements, that doesn't mean we need a larger program to handle it!
Besides that, there are (non-comparison) sort algorithms that are faster than n log(n), such as the spaghetti sort described by #24, and many of them can be implemented in software. Apparently Stimpson has never heard of bucket sort, counting sort, pigeonhole sort, etc.
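To make both points concrete, here is a minimal sketch of counting sort in Python. The function name and the `max_value` parameter are my own choices for illustration, and it assumes the keys are non-negative integers with a known upper bound. It never compares two elements, it runs in roughly O(n + k) time for n items and k possible key values, and the same few lines handle 1,000 elements or 1,000,000.

```python
def counting_sort(values, max_value):
    """Sort non-negative integers no larger than max_value without comparisons."""
    counts = [0] * (max_value + 1)
    for v in values:
        # Tally how many times each key occurs.
        counts[v] += 1
    result = []
    for key, n in enumerate(counts):
        # Emit each key as many times as it was seen, in increasing key order.
        result.extend([key] * n)
    return result


print(counting_sort([3, 1, 2, 3, 0], 3))  # prints [0, 1, 2, 3, 3]
```

The program text stays the same size however long the input is; only the running time and the memory for the input and the `counts` table grow.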
Posted by: Ian | February 24, 2008 1:58 PM
I work in a lab that's trying to pick out programs* of gene expression which enable the development of different types of projection neurons in the forebrain. From our data, only about 0.5% of the genes expressed in one type of projection neuron are expressed uniquely to that type. The vast majority of genes expressed in neurons of any kind are shared, because all types of neurons need to do many of the same things.
And of course, even the 0.5% of genes expressed "uniquely" in different forebrain projection neuron types are not expressed only in neurons -- many of the same genetic programs used to pattern the brain are also used in the development of other organs and systems. A gene that I'm particularly interested in is involved in generating two different forebrain cell types, as well as T cells in the immune system.
(*Of course, I'm not using "programs" here the way Stimpson is. When we talk about programs in our lab, we just mean combinations of genes that result in the development of a certain cell type. So far we have described three genes that are critical in combination for the formation of corticospinal motor neurons, which sit in layer 5 of the cerebral cortex and send an axon to the spinal cord.)
Posted by: Mollie | February 24, 2008 1:59 PM
God, what an imbecile. Not only does he not know anything about biology, his approach to programming is not particularly bright.
Complex programs require a lot of code, but that's because there is a lot of functionality built into them; each part of the code is usually pretty simple - just the right sequence of variable declarations, for-loops, if-statements etc.
And if you know how to code correctly (something I certainly don't take for granted that the moron can), you are able to reuse the same code in several places, by referring to it (either as functions or as objects, depending on your programming language).
Anyway, it doesn't matter, as PZ says, there is nothing alike in program code and the genome.
But what do I know? It's not like I make a living in software development.... oh, wait...
Posted by: Kristjan Wager | February 24, 2008 2:08 PM
Add genetic programming to the list of computer science topics he's clueless about.
Posted by: MartinM | February 24, 2008 2:09 PM
#24: "Biology is an excellent example of analogue behaviour." Hear, hear (and written like a true Brit, with silent "u"s twice over). Biological dynamics are indeed analog, only metaphorically comparable to digital systems. The same goes for psychological dynamics, which computers model only as poetry. For example, here's a dynamic with a difference: digital memory is forever and is nearly perfect; animal memories fade with time and vary in accuracy. In humans it's well studied that most of one's memories aren't of the original event as such but are memories of memories - that is, one encodes the event and then remembers the encoding, more or less accurately (more less than more) over some of one's life. Digital memory is only a metaphor for analog memory.
#22: Betty Crocker says no. But she should agree that DNA works as a recipe rather than as a blueprint. That is, DNA specifies a procedure which usually works given ingredients and a mostly reliable environment - the bun will come out of the oven 8^) This is very different than an architect's or engineer's blueprint, which specifies every detail of the final product. This difference may suffice to explain the fact that much more DNA is commonly found in plants than in animals. Plant DNA has to specify in advance the many chemical/protein components which are the plant's innate and only way of responding to its world (saps, toxins, et al), while animal DNA provides a recipe for a brain which responds behaviorally to the world in less-specified ways based on moving about.
Posted by: thwaite | February 24, 2008 2:10 PM
there is something like an irony involved in the fact that most creationist morons are right-wingers, staunch republicans, and yet their incomprehension of biological development shows that at heart they really believe in the communist central planning model.
"all order must come from on high!
larger systems need larger, more centralized governments, with more laws and regulations!
how can you say that the u.s. economy moves 13 trillion dollars every year without being able to point to a commissar of plumbing fixtures? impossible!
in fact, i calculate that there are at least 210 separate commissars for centralized production needed to coordinate the separate spheres of production--plus a supreme soviet to organize all of them!"
the idea of autonomous entrepreneurial agents pursuing their own agendas, each with a fragmentary and incomplete knowledge at their disposal, and yet creating a vast system of incredible complexity? baby, that's the american way.
and long before that, it was the evolutionary way.
celebrate it: evolution is american. creationism is communist.
Posted by: kid bitzer | February 24, 2008 2:16 PM
Carlie wrote:
Exactly, it is an analogy, nothing more.
Historically, there seems to be a strong temptation to model biological and physical processes in terms of contemporary technology. For Newton - and Paley - it was clockwork. Later it was telephone exchanges and now it is computers.
But an analogy only holds to the extent that the two cases being compared are similar. Yes, the willow tree seed plainly contains something like 'information' or a 'set of instructions' or a 'recipe' or 'software' for making another willow tree; but being "something like" does not mean 'the same as'.
To put an analogy into perspective, it is necessary to be just as aware of the differences as the similarities. To argue that it might as well be raining floppy disks as seeds is to miss the essential point that it is seeds the tree produces, not floppy disks.
DNA, viewed as a storage medium, is very different from magnetic or optical disks. The way 'information' gets into the DNA, the way that 'information' is used to make another willow tree, indeed the very nature of what we are calling 'information' in a genome is far from being the same as what is whizzing around inside a computer.
Computer geeks are so dazzled and enthralled by the toys they play with that they almost completely ignore what is different about biological systems. Yes, living things can be modelled in terms of computers to some extent but the key lies in the differences. Computing can provide some insights but, ultimately, living things must be understood as something different, something that stands in its own right and something that must be understood in its own terms.
Posted by: Ian H Spedding FCD | February 24, 2008 2:20 PM
Christopher (#24), or Ian (#36):
Surely looking at the spaghetti and taking the longest piece isn't an O(1) operation? I imagine picking the longest from a pasta-covered wall the size of the moon would take significantly longer than doing so with a mere handful.
Posted by: Olaf Davis | February 24, 2008 2:21 PM
"That just isn't possible: UNLESS there is a giant outside source of energy supplying the Earth with huge amounts of energy."
How the fuck can someone with enough logical sense to code write something like this AND totally miss it? Oh yeah, willful ignorance.
Posted by: Brando | February 24, 2008 2:25 PM
Ok, scratch that. I misunderstood the algorithm; of course it's O(n).
Posted by: Olaf Davis | February 24, 2008 2:26 PM
For computer programs, it's rather important that the code find itself in the correct operating system environment - or it won't run at all. It gains from external sub-routines at its interfaces (monitor, keyboard etc). Even compiler options on a local machine, as well as cross-compilers, can significantly affect the executable program "life-form" one gets from the same basic piece of code. The presence of a virus on the compiling or executing machine can cause even more havoc.
Posted by: SEF | February 24, 2008 2:26 PM
Maybe if he thought of DNA as code tied directly to the timing of an inherently multi-threaded, chaotic CPU called reality, the difficulties in simulating it would be more apparent.
As for code that copies and rewrites itself as it goes along, a good model could be entries for Core War. If not educational for arrogant software engineers, it's fun...
Posted by: Martha | February 24, 2008 2:28 PM
He says no intelligent designer would employ random mutation and testing alone. Yet random mutation and testing - in the sense of being tested by the environment for being fitted for survival - is what is observed. Thus, there is no intelligent designer and he should "believe in evolution".
Posted by: Ian H Spedding FCD | February 24, 2008 2:34 PM
kid bitzer #41 wrote:
Not so much communism, as Monarchy. Divine Monarchy. Remember, this is pretty much the same crowd which insists that the concept of constitutional democracy would be impossible if there wasn't a King of the Universe and Lord of Lords to whom we owe our absolute obedience -- and He orders us to create a constitutional democracy, to demonstrate our recognition that He created us all equally beneath his unquestionable authority.
Apparently Cranes only succeed in forming structures created from the bottom-up if they were first levitated into place and commanded to do so by the Sky-Hooks.
Posted by: Sastra | February 24, 2008 2:36 PM
Holydust, I actually appreciate that these people are opening their mouths. I invariably get more information about biology and how to think about it properly whenever they do. This is not the first time that I've heard the 'DNA is the software of life' analogy.
It is unfortunate that they don't seem to get as much out of this as I do. :)
Posted by: Chris | February 24, 2008 2:36 PM
Brando, actually the ball of energy was someone else.
Posted by: Chris | February 24, 2008 2:40 PM
Brando, actually the ball of energy was someone else. In fact, Stimpson is aware of the sun. He has his own view on why the entropy argument is valid. Which is depressing. With luck however, Mark will pick it up.
Posted by: Chris | February 24, 2008 2:45 PM
No, some biological dynamics like inheritance are digital. This is the way out of the blending inheritance dilemma.
Posted by: windy | February 24, 2008 2:46 PM
Regarding those LINEs and SINEs: I'm not all that up on "junk DNA," though I gather that term itself is a problem...? Anyway, it's worth noting that thanks to the "cleverness" of evolution, as cited by hyperdeath in #15, I wouldn't be shocked to learn it's found at least some minimal use for their presence.
Posted by: rrt | February 24, 2008 2:46 PM
Hrmm. On #24, that "von Neumann" should have been "Turing". Silly brain misfire.
Posted by: Christopher | February 24, 2008 2:47 PM
thank you for this
>> It's probably (analogies are always dangerous) better to think of gene products as like small autonomous agents that carry out bits of chemistry in the economy of the cell. There is no central authority, no guiding plan. Order emerges in the interactions of these agents, not by an encoded program within the strands of DNA.
I would suggest that is very much the process of all of the biological complexity we see. It is a humbling concept to be sure and very clearly stated. We ourselves and our own self awareness can be seen as having arisen from the interaction of just these "autonomous agents" and not a "supreme Being" with a plan for our "immortal soul"
I would suggest it is this growing understanding of biology that is fueling the backlash of all fundamentalists around the world. That and the ease of communication especially digital communication, the internet, making it easy for anyone with computer access to say anything to anyone anywhere.
Posted by: uncle frogy | February 24, 2008 2:48 PM
Even if DNA can be said to be analogous to computer code (and I don't think it can), the comparison of size is invalid.
First, an executable (like Microsoft Word) is not just composed of program code, but also data. Thus, Stimpson is not actually comparing program sizes, at all.
Second, computer programs are generally written and compiled to optimize performance and maintainability, not space. Loop unrolling and function inlining are two examples of optimizations that increase program size to achieve better performance. Programmers may introduce layers of abstraction to achieve better maintainability of the code at the cost of both performance and program size.
Third, and probably most important, the target architecture plays as much a part in the program size as the program itself does. Some architectures have lots of instructions that make it possible to express complex operations in very little code, while others will require more code to perform the same operations. Whether or not DNA is comparable to computer code, it is absolutely certain that the mechanisms of gene expression are not at all like a von Neumann architecture, and certainly nothing at all like a modern Intel CPU.
Posted by: dak | February 24, 2008 2:53 PM
Wait, so you're telling me the sea monkeys I'm constantly putting into my gas tank don't do anything?!
Posted by: Saint Gasoline | February 24, 2008 2:55 PM
I may be biased by my own occupation and predilections (I am a professional software developer and read evolutionary biology for fun), and by my own sources of information, e.g. (or perhaps even i.e.) Dawkins, who claims that the correspondence between DNA and computer code is literal. Obviously, Dawkins is fallible like the rest of us, but I do think he's someone whose words we should consider.
I also don't think that the originally cited creationist is necessarily stupid (from this alone, anyway), so much as seduced by what Dawkins calls "bad poetic science" (see Unweaving the Rainbow), or perhaps taking the analogy too literally. I personally think that (computers being general computation devices) it is entirely reasonable to suggest that DNA may be explained in terms of what computers can do. That does not make it reasonable, or even sensical, to suggest that it works as our computers do.
Someone with knowledge of software development and biology might suggest (or not: I don't know biology) that it is a good explanation, but that counting bytes is hopelessly naïve. Not only do we have the "How much overlap?" question, with the creationist suggesting 10% unique gene expression as a wild but "generous" guess, and a commenter citing 0.5% with claims of actual supporting data; this also overlooks the possibility (which I gather is not merely hypothetical) that the same stretches of genome may be used to code more than one thing depending on where in a particular sequence the "read head" is placed. This offers the possibility of more compact coding by an order of magnitude (and has subsequently been done, to verify the possibility, with software). We might also well consider the genome a compacted or compressed length of code (it does not, after all, have to "execute" very quickly; the human DNA program has nine months to bootstrap itself). We also haven't spoken of the intrinsic instruction set, as the four bit-analogue values in DNA have intrinsic properties, unlike the computer's data that are merely read.
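The "read head" point can be sketched in a few lines of Python. This is only a toy illustration: the nine-letter sequence is made up, the function name `codons` is hypothetical, and real overlapping genes involve far more than shifting an offset. It does show how one stretch of letters yields different triplets depending on where reading starts.

```python
def codons(sequence, frame):
    """Split a sequence into complete triplets, starting at offset frame (0, 1 or 2)."""
    return [sequence[i:i + 3] for i in range(frame, len(sequence) - 2, 3)]


# A made-up sequence, not taken from any real genome.
sequence = "ATGGCATGA"
for frame in range(3):
    print(frame, codons(sequence, frame))
# prints:
# 0 ['ATG', 'GCA', 'TGA']
# 1 ['TGG', 'CAT']
# 2 ['GGC', 'ATG']
```

The same nine letters give three different lists of triplets, so a naive count of letters says little about how many distinct "messages" a stretch can carry.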
It seems to me that there are some more calculations to be done before we can decide either whether the computation analogue truly needs to be thrown out the door, or how much (decompressed, linearised) information our genome can (from this point of view) be considered to encode.
Posted by: Petter Häggholm | February 24, 2008 2:56 PM
I don't think the computer program analogy is that far off.
But the problem lies in the type of computer that executes it. Nature (Chemistry/Physics) is what executes the DNA code and our attempts to just simulate a small part of that computer (like: http://folding.stanford.edu/) show just how complex the machinery is that interprets the code.
Posted by: Michael | February 24, 2008 2:58 PM
The human genome is elegant in its reduced size. One need look no further than the immune system to see how one can encode complexity from small amounts of coding information. The point is that the coding potential is not linear with the number of base pairs due to alternative splicing of the information. Who knows how complex the transcriptome and proteome are from the 3 billion base pairs.
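A rough way to see the non-linearity is to count the transcripts a single gene could in principle produce if some exons can be skipped. The Python sketch below is a toy model under loud assumptions: the exon names are invented, every optional exon is treated as independently skippable, and real splicing is regulated so that not every combination occurs.

```python
from itertools import combinations


def isoforms(first_exon, optional_exons, last_exon):
    """Enumerate transcripts that keep exon order but may skip any optional exon."""
    transcripts = []
    for k in range(len(optional_exons) + 1):
        # combinations() preserves the original exon order within each choice.
        for kept in combinations(optional_exons, k):
            transcripts.append((first_exon, *kept, last_exon))
    return transcripts


# Hypothetical gene with three skippable exons between two fixed ones.
variants = isoforms("E1", ["E2", "E3", "E4"], "E5")
print(len(variants))  # prints 8
```

In this model n optional exons give 2^n possible transcripts, so adding one exon's worth of sequence can double the number of potential products.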
I would speculate that the human genome has lost an incredible amount of information because of our heterotrophic life style. Our genome no longer encodes information for essential amino acids that we obtain from our diet.
Posted by: DobyGS | February 24, 2008 3:01 PM