In the Beginning Was Linux?

It was eight years ago that some computer programmers got together and issued a manifesto for something they called open source software. Conventional software development--kept hidden behind walls of intellectual property, copyright, and secrecy--was clumsy and slow. It would be far better, the open source advocates declared, to make software open to all. It would foster the growth of a vast decentralized community of developers and consumers who could work together to create better software. Individuals would grab software created by others, tinker with it, and then make it available in turn to the community for more testing and tinkering.

The open source movement may not have taken over the world yet, but it certainly has thrived. Take a look at the web site for the eighth annual Open Source Convention. Along with a vast range of talks about everything from Perl to seventeenth-century censorship, you may notice the big corporations such as IBM that are sponsoring the event. Corporate fear of the open source movement seems to be shifting to acceptance, if not enthusiasm. Another sign of the open source movement's health is its influence beyond the world of software development, from mash-ups to Wikipedia to open source as a force for democracy to open-source biology.

Its success has drawn curious minds back to the origins of the open source movement. In a sense, people were thinking about it long before it had a name. Eric Raymond, one of the founders of the official open source movement, puts its origins four decades ago, in the hacker culture of the 1960s. Back then it was expected that each hacker would share his secrets with the rest of the hacker tribe.

I'd suggest that Raymond is not thinking big enough. The open source movement is a wee bit older. Instead of four decades, try four billion years.

Biologists have long recognized some striking parallels between genes and software. Genes stored information in a language of DNA, with the four nucleotides serving as its alphabet. A genetic code allowed cells to translate the information in genes into the separate language of proteins, which used an alphabet of twenty amino acids. From one generation to the next, mutations introduced slight tweaks to the software. Sex combined different versions of subroutines. If the software performed better--in the sense that an organism had more reproductive success--the changes might become incorporated into the genome across an entire species. This was only a metaphor, but it was a powerful one. One example of its power is the rise of genetic algorithms. Rather than trying to find a perfect solution to a problem--the ideal shape for a plane, for example--genetic algorithms create simulations and tweak them through a process that mimics evolution. The algorithm can seek out good solutions very effectively.
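To make the idea of a genetic algorithm concrete, here is a minimal sketch. It does not design a plane; it assumes a toy problem in which a "genome" is a list of bits and fitness is simply the number of 1s. The function names, population size, and mutation rate are all illustrative choices, not anything taken from a particular library.

```python
import random

def fitness(genome):
    # Toy stand-in for "reproductive success": count the 1 bits.
    return sum(genome)

def mutate(genome, rate, rng):
    # Flip each bit independently with probability `rate` (a slight tweak).
    return [bit ^ 1 if rng.random() < rate else bit for bit in genome]

def crossover(mom, dad, rng):
    # "Sex": splice the front of one parent onto the back of the other.
    cut = rng.randrange(1, len(mom))
    return mom[:cut] + dad[cut:]

def evolve(genome_len=32, pop_size=40, generations=60, rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # the fitter half reproduces
        children = [population[0]]             # keep the current best unchanged
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            children.append(mutate(crossover(mom, dad, rng), rate, rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

if __name__ == "__main__":
    best, history = evolve()
    print("best fitness by generation:", history)
```

Because the best genome is always carried over unchanged, the best fitness in this sketch can never go down from one generation to the next; real genetic algorithms vary a great deal in how they select, recombine, and mutate.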

This sort of evolution resembles old-fashioned, closed-source software. All of the innovations happen in-house--that is, within a single species. None of the solutions from one species can be incorporated into the operating system of another. While this process has indeed been an important one in the history of life, a number of scientists have argued for an open-source side to evolution.

The simplest example is antibiotic resistance. A person who gets sick with dysentery takes a single kind of antibiotic, and before you know it, he or she has a gut full of multiple-drug resistant pathogens. This occurs because the harmless bacteria in the gut may carry genes that provide massive resistance to many antibiotics. They can pass these genes en masse to a few of the newly arrived pathogens. The antibiotics kill off the vulnerable bugs, leaving behind the resistant ones to thrive.

Well beyond antibiotic resistance, genes are showing an impressive capacity to move from species to species. The causes of this transfer are many, but viruses have proven to be the most important. They pick up genes from one host and insert them into another. Viruses themselves can mix their genomes together, blending genes from their hosts in the process. This horizontal gene transfer is not so important in the evolution of animals like us (although we do carry thousands of dead viruses in our genome). But it is very important among single-celled organisms. Single-celled organisms (and viruses) represent most of the genetic diversity on the planet, and for most of its history life was microbial. So this trade in genes has been a significant force. In many cases these transferred genes turned out to be useless to their new hosts. But often enough, the new genes proved to be a powerful addition to a genome. And in their new home, those genes were modified through ordinary natural selection before being spread back out to other species. The parallels to Linux and other examples of open source software did not go unnoticed by biologists. One recent review described this process as open-source evolution.

Some scientists have argued that the sort of gene trading we see today is nothing like what was going on during the early years of life on Earth. Back then, primitive organisms did not have major barriers in place to keep foreign genes from joining their genomes. No one has championed this view more than Carl Woese of the University of Illinois. Woese is famous for having carried out the first significant studies on the tree of life by analyzing molecules from different species. He discovered that life formed three major branches, the eukaryotes (that includes us), bacteria (E. coli and such), and archaea (microbes that some scientists argue are closer to eukaryotes than to bacteria). In recent years Woese has argued that the base of this tree was actually more of a web of relationships, as genes moved from host to host.

I've been reading Woese's latest venture into this weird time, a paper that was just published in the Proceedings of the National Academy of Sciences called "Collective evolution and the genetic code." The genetic code is the dictionary by which genes can be translated into proteins. Each amino acid corresponds to a series of three nucleotides in a gene. In some cases, different triplets produce the same amino acid. Thus, twenty amino acids are generated from sixty-four possible DNA triplets. This new paper is something of a nostalgia trip for Woese, inasmuch as in the 1960s, before he started studying the tree of life, he helped make sense of the genetic code.
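The arithmetic behind the triplet count is easy to check: four nucleotides taken three at a time give 4 x 4 x 4 = 64 possible triplets. The short sketch below enumerates them and lists a few well-known entries from the standard code to show the redundancy. The `SYNONYMS` and `STOP` tables are only a small hand-picked subset of the standard code, written with DNA letters, and the variable names are my own.

```python
from itertools import product

NUCLEOTIDES = "ACGT"

# Every possible three-letter "word" in the four-letter DNA alphabet.
codons = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]
print(len(codons))  # 4 ** 3 = 64

# A small subset of the standard genetic code, grouped by amino acid.
# Some amino acids have several triplets; others have only one.
SYNONYMS = {
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Leu": ["CTT", "CTC", "CTA", "CTG", "TTA", "TTG"],
    "Met": ["ATG"],
    "Trp": ["TGG"],
}

# Three triplets encode no amino acid; they signal "stop".
STOP = ["TAA", "TAG", "TGA"]

for amino_acid, triplets in SYNONYMS.items():
    print(amino_acid, "is written", len(triplets), "way(s):", ", ".join(triplets))
```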

One of the biggest discoveries of that time was that the genetic code is pretty much the same in every living organism on Earth. Scientists have long debated how the same genetic code wound up in all living things. Why twenty amino acids? Why three nucleotides? One possibility was that it was just a "frozen accident." Another has been that it evolved in an ancient lineage and provided an evolutionary edge against others with different codes. Indeed, when scientists run computer simulations to compare it to alternative codes, it does work extremely well.

Woese, along with Kalin Vetsigian and Nigel Goldenfeld, also of the University of Illinois, offers an open-source perspective on the origin of the genetic code. They argue that in the gene-swapping early world, genes encoded proteins in a very sloppy manner. Cells had not yet evolved the enzymes that ensure that one codon always produces one amino acid. The translation was thus much rougher than today. An ambiguous dictionary may seem like a serious handicap for any species. But back then, there were no modern, finely tuned organisms around to compete with the early organisms. They weren't great, but they were the best at the time.

Evolution gradually produced more precise genetic codes, Woese and his colleagues argue, but different communities of microbes evolved different codes. In each community, a shared code made it easier for microbes to share genes. If you plug a gene into an organism with a radically different code, it will produce a radically different protein--most likely one that is useless as well. It's like grabbing a piece of software and trying to run it on the wrong operating system.
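The "wrong operating system" problem can be sketched in a few lines. Below, the same stretch of DNA is read with two dictionaries. `STANDARD_SUBSET` holds four real entries from today's standard code; `ALTERNATIVE` is a purely hypothetical code I made up by reassigning the same four triplets, standing in for a rival community's dictionary. The gene and the `translate` helper are likewise illustrative.

```python
# Four real entries from the standard genetic code (DNA letters).
STANDARD_SUBSET = {"ATG": "Met", "GGT": "Gly", "TGG": "Trp", "CTG": "Leu"}

# Hypothetical rival code: same triplets, amino acids reassigned.
ALTERNATIVE = {"ATG": "Leu", "GGT": "Trp", "TGG": "Met", "CTG": "Gly"}

def translate(gene, code):
    # Read the gene three letters at a time, ignoring any trailing leftovers.
    usable = len(gene) - len(gene) % 3
    return [code[gene[i:i + 3]] for i in range(0, usable, 3)]

gene = "ATGGGTCTGTGG"  # ATG GGT CTG TGG

print(translate(gene, STANDARD_SUBSET))  # ['Met', 'Gly', 'Leu', 'Trp']
print(translate(gene, ALTERNATIVE))      # ['Leu', 'Trp', 'Gly', 'Met']
```

The gene has not changed by a single letter, yet the second reader builds a different chain of amino acids from it, which is why a borrowed gene is only useful to a microbe that shares the lender's code.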

The more microbes used the same genetic code, the bigger the pool of genes they could all take advantage of. Those shared innovations benefited the entire community as it competed with communities with other genetic codes. Imagine microbes colonizing some bizarre new ecological niche--a seep of petroleum, for example, or undersea volcanic chambers. The microbes that can take advantage of more innovations will outcompete the ones that belong to the smaller community. This advantage would also drive the evolution of different genetic codes to be more like one another, because communities of microbes would get access to even more innovations.

Over time, the benefits of a big innovation pool wiped out the original diversity of rare codes, replacing it with one universal language. Only later did life begin to lose its communal nature and begin to evolve into separate lineages that we see now as the tree of life. While those lineages produced things as different as humans and bacteria, they all share the same genetic code that evolved during that communal age.

Woese's provocative ideas are at least consistent with the evidence at hand. He and his colleagues offer some mathematical simulations in the paper that show that gene swapping can drive life to a universal genetic code, but that our more familiar generation-to-generation heredity cannot. But we're still in the early days of understanding the nature of life four billion years ago. (I wrote about some competing models in Science in May.) At this point, what intrigues me most is how the human-based open source movement may be able to shed light on the early evolution of life on Earth (and vice versa). How have open source languages merged together to make meaningful exchanges of innovation possible? Perhaps open source evolution could inspire new ideas about building software the open-source way--much as genetic algorithms emerged a couple of decades ago.

This blog being semi-open source (no hacking my text, please), I invite your comments...


Ingenious. As a software developer, this article was so easy to understand, it makes me want to take a class in biology. I love this blog.

Cool!

I'm a biologist who has been using OpenSource and Linux for quite some time now, and the parallels didn't strike me at all.

Just been reading your book 'Parasite Rex' and came across this article yesterday. If I had read it a few years back, I might have become a parasitologist - for now, I remain a humble neurobiologist. Maybe I should think about Toxoplasma!!?

http://biology.plosjournals.org/perlserv/?request=get-document&doi=10.1…

I installed Linux 0.99pF in the fall of '93. Even at that time, I had a full X workstation, which was "very cool" (in a geeky sense) because I could use it to run VMS/VAX software in my dorm room.

Even at that time, the X11 system, EMACS, and a few other pieces of open source software were developed, functioning pieces. We had GCC (GNU C Compiler) installed on our NeXT machines because it was portable, etc.

Open source is definitely older than 8 years old.

I always think of the open source movement as being akin to the Protestant Reformation. Before Martin Luther translated the Bible into German, the only people who could read it were people who could read Latin. So the ordinary people had to rely on the priest to interpret the Bible for them, and this gave the priest much more power. The depth of these issues can be shown by what happened to William Tyndale when he copied Luther and created the first proper English Bible: Tyndale was burnt at the stake for this. The reason being, by translating the Bible into English he was seen as empowering the common folk. Fast forward five hundred years and we have William Gates sitting on the most obscene mountain of money ever seen. This is because money is the real God of today and Mr Gates is the Pope. He has achieved this by keeping the source code for Windows a secret. The sad thing is, if Micro$oft had also released the source code when it released Win95, Windows would have become the best operating system in the world, instead of the worst malware- and virus-laden pile of dung ever seen.

Hi Carl,

I know some of the commenters above already made this remark, but the GNU/Free Software movement is much older than that and, in fact, is/was the real engine behind all this.

Personally, i've been using GNU/Linux since 1994, which gives it at least ~12 years. But, for more on the history of this, i recommend the following links: Free Software Foundation, GNU, Free software movement, Free Culture movement, Why ``Free Software'' is better than ``Open Source'', Open source and OSS/FS.

I don't mean to advocate either way (although i do have a position on the matter), but just to put things under a more historical context and perspective and, thus, show that things are a bit different [and more intricate] than what's written above.

Having said that, regardless of your (or anyone's) position on this matter, i recommend watching a talk by Richard Stallman: it's highly entertaining!

Anyway, i hope this information helps.

[]'s.

Yeah, I've always felt this was the strength in open-source software. Normally, the companies with the most money have their software out there. Open Source provides a set of rules for the "Survival of the Most Useful".

Let's say if I write a program that a lot of people find useful, then I'll probably find a few other programmers working on it. And if they add useful features and fix bugs, then more and more people will use it, attracting more programmers, thus ending up with a great software package such as Apache or The Gimp.

If people don't find my software useful, then I'll be the only programmer and it won't thrive. And the great thing is that the end users, whether just an average Joe like me, or a huge corporation like IBM, determine if a software package continues down the path of software evolution with some features sticking around and some not.

I love your biology writing but this analogy is... well near as I can see it is just flawed.

The evolution that matters in software isn't in the source code, it is in the ideas that the software expresses, or perhaps in the languages and techniques for creating software. I've always liked the term 'meme', and you can see that it is the idea that unifies "spreadsheets" or "webservers" or "photo editing tools", not the details such as the programming language used for implementation. Across multiple projects/products/companies, source code is _not_ the equivalent of DNA. (The situation is different if you're talking about multiple versions of the same program, as the code base 'evolves' over time.)

So that's the software part about it, what about the open sourceness? Open source is about collaboration and pride and mutual benefit -- which of these concepts apply to transgenic transfer of DNA?

So the analogy is, at best, flawed.

(Fwiw I'm a developer on the open source software CruiseControl and a co-founder of the now defunct open source hosting company OpenAvenue.)

There's a very cool science fiction book by Rudy Rucker called "The Hacker and the Ants" that explores this idea of genetic algorithms. A couple of hackers create a virtual ant colony that can evolve by itself. They set it loose in the virtual world and, later, it sets itself free in the real world, creating all kinds of problems, both funny and terrorising. A very clever and amusing story.

This analogy seems thin in some ways, but right on in others.

As a matter of fact, open-source programmers do tend to converge on "common codes" as you might call them. The Linux operating system is one (although there are others), the X Window System is another, and common libraries like Gtk (for user interfaces), Opengl (for 3d graphics), and SDL (for multimedia in games) are others.

However, the same is true of all software, not just open-source software. Even more common than any open-source code are the codes used in Microsoft Windows. Windows as a base, the Windows GDI (for drawing), and DirectX for everything a game needs are extremely dominant and common.

I do appreciate the parallel between programmers sharing building blocks and organisms sharing DNA, but it's not an open-source thing. Any non-trivial program will make use of software libraries, which serve as their shared building blocks. DNA blocks as software libraries is a better analogy.

Still, I don't think there's a strong parallel between the development of software engineering practices and the evolution of early life. Software never went through a time of freely traded though ill-defined code snippets, and organisms never really worked together like open-source programmers. The comparison especially breaks down when you consider the fact that software is designed, while organisms are feeling through the dark, and reproducing when they don't fall in an unseen pit first.

By Paul Donnelly on 24 Jul 2006