Gene Expression

Every ratio 3:1!!!

Science isn’t perfect; it often misses obvious truths. Consider the 2005 Nobel Prize in medicine, awarded to Barry Marshall and J. Robin Warren for establishing the connection between Helicobacter pylori and ulcers. After the fact you hear many stories of doctors who had stumbled onto the solution, antibiotics, long before the scientific consensus. Many others now understood why they always saw these pathogens in samples taken from patients with ulcers. Now it all makes sense, but these sorts of screw-ups make you wonder how far we’ve gone past Galen! Falsification is a decent formalization of the scientific process if you distill it down to its bare essentials, but it ignores the reality that science is executed by people, not computers. Thomas Kuhn’s work in The Structure of Scientific Revolutions speaks to that sociological reality: instead of a gleaming geometrical crystal city, natural science is filled with booming unplanned towns, citadels being swarmed by unexpected squatters, and castles in the hinterlands striving in vain to maintain their relevance. Even mathematics, that most rational of disciplines, is driven by an engine of intuitive insight and gestalt understanding, no matter the clean final product carved from axioms. Alas, science has a low signal-to-noise ratio, but to paraphrase Winston Churchill, it’s the best system we’ve got.

Of course, because of the socially contextual nature of much of science there is a niche for historians and sociologists to study it as a subculture. It is on the great mound of noise in which the signal swims that Will Provine has established his career as the historian of evolutionary genetics. His biography of the American population geneticist Sewall Wright displayed not only an encyclopedic knowledge of the personalities who touched Wright’s life, but the technical details of the theoretical biology which served as his legacy. It was with an understanding of this background that I came to Provine’s The Origins of Theoretical Population Genetics.

Basically a slim elaboration on his Ph.D. thesis at the University of Chicago, this text explores the social and scientific dynamics between the initial high tide of the Darwinian phase in evolutionary theory and the reemergence of its primacy during the 1920s, as population genetics fused the Mendelian framework with the wealth of statistical tools developed by the biometrical school. In the interregnum Darwin’s original ideas, which emphasized natural selection on continuous variation as the primary motive force for evolutionary change, were relegated to the margins. A thorough survey of this period can be found in Peter J. Bowler’s The Eclipse of Darwinism, but Provine’s work is more narrowly focused, and tends to put the spotlight upon individuals rather than grand social movements. The importance of personality in inflating semantic confusions and mediating sociological dynamics shows exactly where much of the noise in the scientific system comes from.

In short, Provine’s thesis centers on the conflict between the Mendelians, led by William Bateson, and the biometricians, headed by Karl Pearson (of Pearson’s correlation coefficient), and the subsequent fusion which culminated in R.A. Fisher’s 1918 paper, The Correlation between Relatives on the Supposition of Mendelian Inheritance. The conflict between these two groups was in part on genuine scientific grounds, but Provine makes it clear that personal animosity, turf wars and an inability to master the methodologies of the other side perpetuated a discord which was really much ado about nothing (and resulted in far less getting done).

The dispute had its seeds in the somewhat confused ideas of Francis Galton in the field of evolution. Unlike his cousin Charles Darwin and Alfred Russel Wallace, Galton did not believe that natural selection upon continuous variation within populations was sufficient to explain evolutionary change. Like many scientists, including Thomas Huxley, Galton contended that evolution was due to the emergence of unique mutant forms, “sports,” which were at sharp discontinuity with the normal variation within a population. Galton did not accept that selection upon continuous variation would induce evolutionary change because he had some peculiar ideas about regression toward the population mean. He seemed to posit some sort of innate stabilizing factor within a population which kept it around a species-typical mean, bounded by its range and characterized by a particular variance. So individuals at the extremes would give rise to offspring who would regress back toward the mean of the population. Mutant varieties, on the other hand, might offer the opportunity to break out of this tendency by generating de novo a new central tendency. Pearson, Galton’s protege, pointed out that Galton had neglected to consider that repeated generations of assortative, or selective, mating among exceptional individuals would avoid regression back toward the ancestral mean: “mediocrity” (that is, random mating of exceptional individuals with less than exceptional ones) would not dilute the offspring, and a new population mean would be established in each successive generation.1

But Galton’s specific issue with natural selection was only a small part of the puzzle. The bigger problem was that the dominant mode of thinking about inheritance of traits from parent to offspring in Darwin’s day was blending, that is, the characters of parents are synthesized so that the offspring reflect a mix of both parents. But there is a problem here: this process exhausts the variation that Darwin envisaged natural selection using to drive evolution (natural selection resulting in differential fitness of individuals correlated with variation on heritable traits). Certainly there are ways to get around this problem, but the hand-waving explanations proffered by Darwin seemed to lack the power to dodge the homogenizing tendency of blending inheritance. Blending inheritance is intuitive; I recall reading in science fiction many times about a distant future where all people are golden-brown, the natural range of human coloration supposedly erased by admixture. Unfortunately intuition is not always a good guide to how the world works, and in the case of inheritance it is wrong. Mendel was right: discrete transmission of traits allows variation to avoid obliteration in the process of sexual reproduction.

Of course it is famously known that Mendel’s paper of 1866, the solution to the nagging problem of the day in evolutionary biology, was roundly ignored. By 1900 his work seemed more propitious and suitable for the times, as multiple researchers rediscovered it simultaneously. Unfortunately the scientific landscape had changed in a negative fashion as well, and the Mendelian hypothesis became associated with macromutationists who generally rejected Darwinian natural selection as having any relevance for evolution. Additionally, in the interim between Mendel’s original work and 1900 a very peculiar thing happened to Francis Galton: his followers rejected his discontinuous model of evolution in favor of Darwinian gradualism, and attributed to him the foundation of a school of biology, biometrics, which would bear the selectionist banner for a generation. Much of Provine’s book details the bizarre tendrils of association between Galton, his protege Pearson and his intellectual fellow traveler on evolution, William Bateson. Bateson agreed with Galton about the process that drove evolution, but Pearson was Galton’s intellectual heir, and Pearson claimed Galton as the precursor of the school of biology which opposed Bateson and the Mendelians and mutationists. Add to this the complication that Galton posited a Law of Ancestral Heredity which was radically reworked by Pearson. The law kept its name and association with Galton and served as the linchpin of anti-Mendelian thought (Provine devotes an appendix to this Law, but he also admits in the text that it was quickly thrown into the dust-bin of history once Mendelianism arrived).

I suspect that I’ve lost many people in the minutiae at this point, but I wanted to highlight how Byzantine and nonsensical the personal relationships were, and how they had a secondary effect on the march of ideas and the progress of science. By the first decade of the 20th century Bateson had assembled enough data to argue for the existence of Mendelian transmission of traits, but the biometricians dismissed Mendelianism as a trivial phenomenon. Mutationists around Hugo de Vries had connected their own ideas to Mendelianism, so the rise of both models tracked each other, and both took advantage of the eclipse of Darwinism because of the a priori problems relating to blending inheritance. Add to that Wilhelm Johannsen’s research with pure lines, genetically uniform lineages, which showed that variation within such a line was not heritable, and the muddle was complete. If there is no heritable variation for selection to act upon, then selection has no power to effect change, Q.E.D.

In the midst of this there were lone voices in the wilderness. For example, George Udny Yule presaged many of the obvious implications of both continuous traits and Mendelianism, and showed that they could be reconciled with ease. When the mathematically fluent Karl Pearson took up the attack against Mendelianism, Bateson had to rely on Yule to respond because he himself was notoriously mathematically inept.2 Nevertheless the rational arguments of Yule were far less impactful than those of the empiricists who showed that selection did have power. William E. Castle’s work with hooded rats, Thomas Hunt Morgan’s groundbreaking research with Drosophila and H. Nilsson-Ehle’s experiments on seed color all showed that long-term selection could change the character of a population.

By 1918, when R.A. Fisher published his groundbreaking paper, the world was ready, and science finally caught up with its lying eyes. I’ve read Fisher’s paper three times, and it is pretty hard to follow if you don’t have a background in statistics (the math isn’t technically hard, but the prose is difficult to follow and the mathematical logic makes leaps far too often for comfort). It is one of those “and it naturally follows” works where the author assumes that you can track logic across the empty chasms of unexplicated algebraic manipulations and derivations. The gist of Fisher’s paper is that the empirical data that the biometricians processed with their finely tuned statistical techniques was totally explicable if one assumed a large number of Mendelian loci a priori. For example, the variance between siblings on traits like height, which are extremely heritable (assuming a reasonable nutritional background), is natural if you consider that the parents are highly heterozygous. Fisher proceeds to use his model to analyze data from the time period, and he begins down the road of using variance partitioning to ascertain the various factors which result in a statistical distribution.3

But really, seeing the convergence of Mendelianism and continuous traits does not necessitate immersing yourself in familial correlations and moments of a probability distribution. I mean, after all, for a large number of trials the binomial distribution approximates the normal. No seriously, look at these images:
[Images: screen captures of the binomial distribution for N = 3, 6, 12, 24 and 48]

I just took screen captures after slotting different N values into this applet (N = 3, 6, 12, 24, 48). As N gets large the normal approximation gets better.
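The same comparison can be run numerically. What follows is only a minimal sketch in Python, not the applet behind the screen captures: the function names are my own, p = 0.5 is assumed, and the normal curve is given the binomial's mean Np and variance Np(1 - p).

```python
import math

def binomial_pmf(n, k, p=0.5):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mean, variance):
    """Density of the normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def max_gap(n, p=0.5):
    """Largest absolute difference between the binomial pmf and the
    normal density that shares its mean and variance."""
    mean, variance = n * p, n * p * (1 - p)
    return max(
        abs(binomial_pmf(n, k, p) - normal_pdf(k, mean, variance))
        for k in range(n + 1)
    )

# Same N values as the screen captures.
for n in (3, 6, 12, 24, 48):
    print(n, round(max_gap(n), 5))
```

The printed gap is a crude measure of fit (the worst mismatch at any single count), but it is enough to see the discrete bars closing in on the bell curve as N grows.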

To translate this into a more genetical sense, imagine you have a locus, a position in the genome, that controls density of hair follicles on your skin. If the gene is on, it is O, and if it is off, it is o. Assume that the organism is diploid, so you have two copies of a given gene. So,

oo = hairless
Oo = hairy
OO = hairy

So O is dominant, right? But what if the density of hair is proportional to the number of O’s you have? Then you have:

oo = hairless
Oo = hairy
OO = hairiest

This is an additive situation. You now have three discrete phenotypes. Now, imagine you had a second locus. Let’s use a different letter, but the casing convention remains the same. Also, let’s assume that how hairy you are is proportional to the number of upper-case letters, the “on” genetic variants (alleles). So, imagine:

oo, pp = hairless
OO, PP = hairiest

But there are many combinations in between. For example, the median hairiness can be expressed by any of these combinations:

OOpp
ooPP
OoPp
OopP
oOpP
oOPp

In an individual you have 4 slots which can be filled. If a slot is filled with an upper-case letter, that’s a “success”; if not, it isn’t. So, N = 4. For the binomial distribution the expectation of successes, the mean, is Np, where p is the probability of success. Like a coin flip, let’s set it to 0.5. Then you get an expectation of 2. But there is also a variance, Npq, where q = 1 – p. So the expected variance is 4 × 0.5 × 0.5 = 1 unit. Obviously the discrete nature of the genetic system means that the variance will exhibit itself in a stepwise fashion. As we move up the number of trials the discrete steps become finer and finer in relation to the range of the number of successes.
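The two-locus case is small enough to enumerate by brute force. Here is a rough Python sketch under the same assumptions as the text (four slots, each holding an "on" allele with probability 0.5, so all 16 slot assignments are equally likely); the names are purely illustrative.

```python
from collections import Counter
from itertools import product

# Four allele slots: two copies at the O locus and two at the P locus.
# Each slot holds the upper-case ("on") or lower-case ("off") allele.
SLOTS = ["Oo", "Oo", "Pp", "Pp"]

def upper_case_counts():
    """Tally how many of the 16 equally likely slot assignments
    carry 0, 1, 2, 3 or 4 upper-case alleles."""
    counts = Counter()
    for genotype in product(*SLOTS):
        counts[sum(allele.isupper() for allele in genotype)] += 1
    return counts

counts = upper_case_counts()
total = sum(counts.values())
mean = sum(k * c for k, c in counts.items()) / total
variance = sum((k - mean) ** 2 * c for k, c in counts.items()) / total

for k in sorted(counts):
    print(k, counts[k])
print("mean:", mean, "variance:", variance)
```

Six of the sixteen assignments land on exactly two upper-case alleles, matching the six median combinations listed above, and the mean and variance of the tally agree with Np and Npq.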

I’m sure most of you can connect the dots at this point. It is transparently clear how discrete Mendelian genetics can give rise to continuous traits. Just as discrete probability distributions approximate continuous ones given enough trials, so as the number of loci increases a normal distribution, the bell curve, is attained. To put it another way, consider the central limit theorem, which states that the sum of a large number of independent random variables approaches the normal distribution. Replace “random variables” with “loci” and it is intelligible in a biological sense. Part of R.A. Fisher’s work was elucidating exactly how variance manifested itself within biometrical data sets so that dominance, epistatic and environmental effects could be set aside and the additive genetic variability, which is the raw material for natural selection, isolated.

Now that you see how simple it all really is,4 why didn’t anyone get this? As I noted above, George Yule did point some of these issues out to both Pearson and Bateson. Bateson really couldn’t follow the mathematics, but Provine makes it clear that he was somewhat leery of the macromutationists, and the more credulous of the Mendelians. For example, biologist C.C. Hurst was notorious for seeing the fabled Mendelian 3:1 dominant:recessive ratio in every data set he examined, and Bateson was reluctant to publish several of his papers because of this particular weakness.5 As for Pearson, he was clearly a brilliant thinker and on some level I cannot believe he did not conceive of what Yule was getting at, but he seems to have been blinded by the Mendelians’ overemphasis on discontinuous evolution. Prior to the selection experiments I alluded to, there was still room for most biologists to reject the biometricians’ mathematically opaque arguments, in part because they were difficult to follow and there was no compelling reason to accept them. By the time Fisher published his paper, with Sewall Wright and J.B.S. Haldane following in his wake, the theoretical arguments were necessary to augment the empirical findings. Science does work, but sometimes slowly. From my reading it seems clear that most biologists had a sketchy grasp of the mathematical logic of the theoretical geneticists, but they found their conclusions plausible. Wright’s emphasis on population substructure and genetic interactions appealed to both Theodosius Dobzhansky and Ernst Mayr, the experimentalist and the naturalist, who were both aware of the role that populational fragmentation could play in reshaping and skewing population dynamics. Fisher helped inspire the school of ecological genetics at Oxford run by E.B. Ford.

Unfortunately personalities got in the way, so that the natural confluence of biometrics and Mendelianism was arrested for nearly two decades by a combination of personal squabbles, misunderstandings and inopportune sociological factors (e.g. the attraction of the macromutationists to Mendelianism seems to have repelled the biometricians from any rapprochement). Sometimes arguments, even if they are personally colored, are based on issues that are naturally intractable. Fisher and Wright’s debates on the role of genetic interactions and population substructure in evolutionary genetics simply were not resolvable in their day. This does not seem to be the case with the Mendelian-biometrical controversy: all the scientific pieces were there to be fitted together by 1900, but the social matrix wasn’t ready. Lone voices like George Yule stated what they saw plainly, and attempted to mediate between the two warring camps, but they might as well have been pissing into the wind. When Fisher attempted to have his paper published in Biometrika, Pearson simply stated that he did not read any papers by Mendelians, and that was that.

One of the definitions of sin is to miss the mark. Some might be disturbed by the religious analogy, but I think that the passion, dedication and monomaniacal impulses that drive many scientists do resemble the behavioral patterns of recluses, mystics and monastics. Much of science is a priesthood unified by special knowledge of an elite mathematical language. This elect has open admissions, but you must forgo many things and sacrifice many years before you are allowed into the polis of science as a full citizen. Just as sin derives from human frailties, so corruption, ignorance and obfuscation rear their ugly heads in the scientific disciplines because of the nature of man. Just as high priests can twist the words of their predecessors to fit any situation or context, Karl Pearson and the biometricians boldly declared themselves to be disciples of Francis Galton though they explicitly rejected his opinions on the motive force behind evolution. William Bateson may not have comprehended the mathematical wizardry of Pearson with any level of understanding, but he had faith in his experiments and the system of Mendelian genetics, next to which “Galton’s” Law of Ancestral Heredity was thin gruel indeed.

Science is a culture. Whatever philosopher of science you bend the knee to, whether it be Popper, Imre Lakatos or Paul Feyerabend, know that idealized systems rarely find perfect execution in the world of man. Yet though Kuhn was likely right about his idea of periods of stasis between short bursts of paradigm shifts, some of his interpreters confuse the difficulty of knowledge acquisition with the impossibility of knowledge acquisition. Just because the signal-to-noise ratio is low does not mean that a detectable signal isn’t calling out in the din, attempting to make its voice heard. When I reflect on the conflict between the Mendelians and the biometricians I am appalled and sickened: so many years wasted, so many scientists who allowed their days to be swallowed by its ignominious acrimony. Walter Weldon, Pearson’s right-hand man, spent the last days of his life going through horse breeding record books to disprove some of C.C. Hurst’s Mendelian enthusiasms. And yet the reality is that the data drove the science before it, and egos were eventually crushed and Truth as we know it won the day. Eventually R.A. Fisher’s stature rose and Karl Pearson became a shadow at his own university. The work of Fisher, Wright and Haldane set the stage for the culmination of the Neo-Darwinian Synthesis. Darwinism and evolution became so synonymous once more that those who work within evolution do not call themselves “Darwinists” because that would seem redundant; we are all Darwinists today.

1 – This can be illustrated by the simple prediction equation (“breeder’s equation”), R = h²S, where R is the response to selection (the change in the population mean from one generation to the next), h² is heritability, the proportion of phenotypic variation attributable to additive genetic variance, and S is the selection differential, the deviation of the mean of the breeding parents from the mean of the total original population. If h² is 0.5, then the offspring of parents who deviate from the mean by one unit would regress halfway back to the previous generation’s mean, but each successive generation would have a higher mean. This process has resulted in populations in artificial breeding experiments with means many standard deviations beyond the wild type mean.

2 – Bateson received extensive tutoring to pass his mathematics entrance examination requirements. Though he was correct in many of his scientific instincts in regards to the validity of Mendelian genetics, it seems clear that he also did not understand the arguments made by his antagonists because he lacked the mathematical skill.

3 – ANOVA is due to Fisher, ergo, AMOVA too.

4 – Even if the further exploration of the mathematics can get gnarly, the basic concept is really simple.

5 – The 3:1 ratio only occurs in a situation where the alleles are both at a frequency of 50%. Otherwise, the ratio is governed by the famous Hardy-Weinberg equation, p² + 2pq + q² = 1, where p and q are the frequencies of the two alleles, the first and last terms represent the homozygote genotype frequencies and the middle term the heterozygote frequency.
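The arithmetic in footnotes 1 and 5 is easy to check directly. Below is a minimal Python sketch; the function names are mine. It makes two simplifying assumptions that should be read as such: heritability and the selection differential stay constant across generations (real selection experiments eventually deplete additive variance), and the population mates at random with complete dominance.

```python
def response_to_selection(h2, selection_differential):
    """Breeder's equation from footnote 1: R = h2 * S."""
    return h2 * selection_differential

def means_under_selection(start_mean, h2, selection_differential, generations):
    """Population mean per generation when the breeding parents always
    sit S units above the current mean (h2 and S assumed constant)."""
    means = [start_mean]
    for _ in range(generations):
        means.append(means[-1] + response_to_selection(h2, selection_differential))
    return means

def hardy_weinberg(p):
    """Genotype frequencies (p^2, 2pq, q^2) for allele frequency p."""
    q = 1 - p
    return p * p, 2 * p * q, q * q

def dominant_to_recessive_ratio(p):
    """Ratio of dominant to recessive phenotypes when the allele at
    frequency p is completely dominant (footnote 5)."""
    hom_dominant, het, hom_recessive = hardy_weinberg(p)
    return (hom_dominant + het) / hom_recessive

print(means_under_selection(0.0, h2=0.5, selection_differential=1.0, generations=10))
for p in (0.5, 0.8):
    print(p, dominant_to_recessive_ratio(p))
```

With h² = 0.5 each generation keeps half of the parents' one-unit advantage, so the mean climbs steadily rather than snapping back to the ancestral value. The dominant:recessive ratio comes out at 3:1 only when p = 0.5; at p = 0.8 it is far higher, which is why a 3:1 ratio in every data set should raise eyebrows.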

References:

The Origins of Theoretical Population Genetics, Will Provine, 2nd edition 2001

The Correlation between Relatives on the Supposition of Mendelian Inheritance, Transactions of the Royal Society of Edinburgh, R.A. Fisher, 1918

Principles of Population Genetics, Daniel Hartl and Andrew Clark, 3rd edition 1997

Comments

  1. #1 Steve Sailer
    January 28, 2006

    Razib writes:

    “[Galton] seemed to posit some sort of innate stabilizing factor within a population which kept it around a species typical mean, bounded by its range and characterized by a particular variance.”

    Reading N.W. Gillham’s biography of Galton, I was struck by how reminiscent Galton’s 19th Century thinking on the stability of species was to Stephen Jay Gould’s famous theory of “punctuated equilibria.”

  2. #2 Steve Sailer
    January 28, 2006

    This history reminds me of my seven-year-long argument with U. of Chicago economist Steven D. Levitt of “Freakonomics” fame over his theory that legalizing abortion cut the crime rate dramatically in America.

    Since 1999, Levitt has been playing the Karl Pearson role of the mathematically sophisticated insider by waving away my empirical criticism of his theory by claiming that his complex econometric modeling of state level abortion and crime data proves its validity. And I’ve been playing the William Bateson role of the mathematically simple-minded outsider who keeps pointing out that if you look at the national level data, the opposite of Levitt’s theory actually happened — the first cohort born after the legalization of abortion had triple the teen homicide rate of the last cohort born before legalization.

    Well, when two econometricians finally went through Levitt’s model in detail last year, it turned out that he had two technical errors that were fatal to his theory. Of course, Levitt’s theory remains the new conventional wisdom …

  3. #3 razib
    January 28, 2006

    I was struck by how reminiscent Galton’s 19th Century thinking on the stability of species was to Stephen Jay Gould’s famous theory of “punctuated equilibria.”

    yes, you aren’t the first.

  4. #4 Mark
    January 28, 2006

    Fascinating. Readers may also enjoy:
    Evolution by Jumps: Francis Galton and William Bateson and the Mechanism of Evolutionary Change, by Nicholas W. Gillham (Duke University)
    http://www.genetics.org/cgi/content/full/159/4/1383

    On “evolvability” and “facilitated variation”: New book, The Plausibility of Life: Resolving Darwin’s Dilemma by Marc W. Kirschner, John C. Gerhart (Yale UP October 2005). For review and discussion of “facilitated variation”:
    http://www.harvardmagazine.com/on-line/110512.html

    “Evolvability” Gerhart + Kirschner 1998
    http://www.pnas.org/cgi/content/abstract/95/15/8420

    On genetic robustness, environmental robustness, canalization, epistasis, evolvability, etc., see:
    Mutational Robustness, Modularity and Evolvability:
    Walter Fontana, Andreas Wagner.
    http://www.santafe.edu/sfi/research/focus/robustness/projects/mutationalRobustness.html

    Continuity in Evolution: On the Nature of Transitions: Fontana + Schuster
    http://www.santafe.edu/~walter/Papers/science1.pdf

    On “sin” and scientific knowledge, see “Saintly Resonances,” review by science historian Lorraine Daston of Dying to Know: Scientific Epistemology and Narrative in Victorian England by George Levine
    http://www.lrb.co.uk/v24/n21/dast01_.html

  5. #5 Amit
    January 28, 2006

    Nice write up, Razib. I attended a seminar given by Will Provine recently. He actually brought in some of Sewall Wright’s original lab notebooks!

  6. #6 Matt McIntosh
    January 28, 2006

    Great post. Regarding the whole Popper/Kuhn thing I just see them as operating at different levels. Popper sketches out the logical/epistemic skeleton, Kuhn (and Polanyi) are more interested in describing the sociological/psychological meat. (Lakatos is mostly just Popper with a few questionable tweaks, and Feyerabend is the anti-philosopher who relishes demolishing standards and leaving only anarchy in his wake.) I see these two angles as complements rather than substitutes when viewed properly, and find the messy sociological aspects of scientific inquiry just as interesting as the cleaner epistemic structure of it. Hence, Provine is now appended to my ever-growing reading list…

  7. #7 Steve Sailer
    January 28, 2006

    1. I believe that Bateson suggested the correct answer — that Mendelian genetics worked if you assume a lot of different genes — quite early in the controversy, but it somehow got dropped.

    2. The biometric approach is analog, the Mendelian approach is digital. Most things in the real world look analog, and it was quite difficult for scientists in the first half of the 20th Century to realize that the seemingly smooth curves of analog reality can be based on a microscopic digital reality. So, we should have some sympathy for those who couldn’t make the leap.

    3. By analogy, Pearson’s failure is somewhat reminiscent of Einstein’s distaste for quantum mechanics.

    4. At a macro level, the real world is often more like the statistical vision of Pearson than that of the earliest Mendelians. For example, the only thing most college graduates remember about genetics is the blue eye-brown eye model of dominant and recessive genes, which ill-equips them for thinking about more common issues in real world genetics, where a probabilistic perspective is more realistic.

  8. #8 David Boxenhorn
    January 29, 2006

    You don’t need a lot of variables to get a normal distribution, just a few variables and a little noise.

  9. #9 razib
    January 29, 2006

    You don’t need a lot of variables to get a normal distribution

    i have read in the pop genetics lit that anything with 4 or more loci is operationally normal and polygenic in terms of many experimental contexts. the power is simply too weak to detect even such large (presumed) discrete transitions.

  10. #10 Matthew Cromer
    January 30, 2006

    The great scientific sin is that we insist on certainty. Therefore we make our models more and more rigid in our minds, and throw out ideas and facts that do not fit within them. Lacunae develop in our vision of the world. And so we miss the importance of Helicobacter pylori, we miss the fact that moving plates generate mountains, we miss the fact that rocks really do fall from the sky. Because they just don’t fit in.

    The solution is a new kind of skepticism: skepticism towards our own models of reality. And the more ingrained and unquestioned the assumptions of that model are, the more skepticism we will need to muster about them.

    I like Rupert Sheldrake’s formulation of this:

    “I am skeptical of people who believe they know what is possible and what is not. This belief leads to dogmatism, and to the dismissal of ideas and evidence that do not fit in. Genuine skepticism involves an attitude of open-minded enquiry into what we do not understand, and this is the approach I try to follow.”

    Sheldrake was, of course, a full-fledged cleric of the scientific priesthood with publication credits for properly reductionistic chemo-morphological research in Nature, Science News and scores of more specialized journals. But his theory explaining how wholes (atoms, molecules, cells, organisms) were more than the sum of their constituent parts led to a sentence of anathema and a judgement that his book deserved burning. Was that condemnation an example of science working correctly, or did the collective enterprise of science just ignore the cure for ulcers yet again?

    Here is the collection of Rupert Sheldrake’s published research, both on conventional and more controversial topics:

    http://www.sheldrake.org/papers/
