Data & theory, then, now, and forever

In 10 Questions for A.W.F. Edwards, the mathematical geneticist was asked:

Like Fisher you have worked in both statistics and genetics. How do you see the relationship between them, both in your own work and more generally?

Edwards responded in part:

Genetical statistics has changed fundamentally too: our problem was the paucity of data, especially for man, leading to an emphasis on elucidating correct principles of statistical inference. Modern practitioners have too much data and are engaged in a theory-free reduction of it under the neologism 'bioinformatics'.

This elicited a strong response from 'godless capitalist,' a computational biologist himself:

In other words, they did a lot of math that was unconnected to reality, aka "it is a capital mistake to theorize in the absence of data". You can see the results in the pages of the journal Genetics today, or in something like Gillespie's book -- written in 2004! -- which doesn't even mention genome sequencing.

This issue re: theorizing in the absence of data is particularly salient in population genetics, where basic phenomena like recombination (and its impact on evolution) could not be well modeled because of the sheer extent of fine-scale recombination variation -- an extent which has only recently been apprehended and quantified.

This reminds me of what Richard Lewontin stated in 1974 about the evolutionary genetics which Fisher, Wright and Haldane created in the 1930s:

...rich and powerful theory with virtually no suitable facts on which to operate. It was like a complex and exquisite machine, designed to process a raw material that no one had succeeded in mining.

Finally, in my 10 questions with him, famed evolutionary geneticist James F. Crow stated:

It is true that the elegant theory of Fisher, Wright, Haldane, Kimura, and Malécot was less useful than might have been expected, because of lack of good data to which the theory was applicable. But that is no longer true. Molecular evolution has provided an abundance of data and the theory now has plenty of important applications. In particular, the neutral theory of molecular evolution has had great heuristic and predictive value, and it owes a great deal to Kimura's earlier theoretical work, which built on the foundations of the pioneers. Lynn might change her mind if she looked at some of the striking results gotten by combining molecular measurements with population genetics theory.

Whatever the details, one thing seems clear: Fisher, Wright and Haldane, and their successors, generated a theoretical system in advance of the ability to test all their conjectures or inferences. Is this useful? All things in moderation! Science is haphazard; some might call it a memetic form of stochastic hill climbing. Of course, if theory outruns data by too much, you might get stuck in a ravine with steep sides and never climb back out.

Before Origin the science of biology was one of discovery and classification. Darwin's theory of evolution gave it a paradigmatic lens through which to comprehend the diversity of life. But Darwin himself ran ahead of the data: he had no good mechanism of genetic transmission! With the rise of Mendelianism this gap was closed to some extent, and it was finally sealed tightly by Watson and Crick's exposition of the structure of DNA.

And yet just as Mendelianism put Darwinian evolutionary theory on firmer ground, R.A. Fisher and Sewall Wright began to force the theoretical territory far ahead of what the data could arbitrate. If the data had been available, I doubt that the "Wright-Fisher controversies" would have been as heated. What is the role of gene-gene interactions? Population substructure? Effective population size? After the basics of the Wright-Fisher models were elucidated, the rest was rhetorical shadow boxing over fine axiomatic points which resulted in wildly variant inferences. Will Provine has argued that Wright's central contentions, in contrast with Fisher's, were misunderstood by his acolytes, suggesting that the controversy did lead science into ravines. But that is the nature of science: theory runs ahead of data and transforms into ideology, data smashes ideology and reshapes it into a theory, which gives rise to new systematic structures and paradigms. Schumpeter would be proud!
