I want to draw your attention to a somewhat dense and possibly inconclusive (but important) paper, accompanied by a very informative overview, in PLoS Biology concerning mutations in the human genome.
Mutation rates and patterns of mutation are important for a number of reasons. For one thing, the genome itself is a data set that is both broad and deep. There is a lot of information in any individual genome (a haploid set of genes from a person, for instance), but there is also a wide range of variation in that information from one genome to the next. So, inferences or assertions regarding the nature and distribution of genes or their variants cannot really refer to a single version of the genome; they must take into account the variation in DNA sequences.
A very obvious area where variation is important is in reconstructing phylogenies. “Family trees” of populations or species can be reconstructed by estimating the genetic difference between pairs of samples, and from this, estimating the amount of time that has passed between a Last Common Ancestor and each of two later populations. These dyads (or triads, depending on how you count them) can then be pieced together to get a phylogeny … a graph representing the historical divergence of populations or species … that tells us a particular version of history. Obviously, the rate of mutation must be known or assumed to make this work. If that rate varies across the genome, across a population, or across the branches of the family tree itself, and the variation is not accounted for, the inferred dates and relationships can be wrong.
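To make the dependence on the mutation rate concrete, here is a minimal sketch of the simplest dating calculation, assuming a strict molecular clock in which differences accumulate along both lineages at the same rate, so that time = distance / (2 × rate). The function name and all the numbers are hypothetical illustrations, not values from the paper.

```python
def divergence_time(distance_per_site, rate_per_site_per_year):
    """Years since the last common ancestor under a strict molecular clock.

    Differences accumulate on both lineages, hence the factor of 2.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("mutation rate must be positive")
    return distance_per_site / (2.0 * rate_per_site_per_year)


if __name__ == "__main__":
    # Hypothetical pairwise distance: 1.2% of sites differ.
    distance = 0.012
    # Three hypothetical assumed rates (per site per year).
    for rate in (0.5e-9, 1.0e-9, 2.0e-9):
        years = divergence_time(distance, rate)
        print(f"rate={rate:.1e}  estimated split: {years / 1e6:.1f} million years")
```

The same observed distance gives a date that scales inversely with the assumed rate, so halving the rate doubles the estimated age of the split.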
The research paper is “Cryptic Variation in the Human Mutation Rate” by Hodgkinson et al. Here’s the key finding:
The mutation rate is known to vary between adjacent sites within the human genome as a consequence of context, the most well-studied example being the influence of CpG dinucleotides.
… a cytosine followed by a guanine is 10 times more likely to mutate than a cytosine not followed by a guanine.
We investigated whether there is additional variation by testing whether there is an excess of sites at which both humans and chimpanzees have a single-nucleotide polymorphism (SNP). We found a highly significant excess of such sites, and we demonstrated that this excess is not due to neighbouring nucleotide effects, ancestral polymorphism, or natural selection. We therefore infer that there is cryptic variation in the mutation rate.
“Cryptic” means “We dunno.”
However, although this variation in the mutation rate is not associated with the adjacent nucleotides, we show that there are highly nonrandom patterns of nucleotides that extend ∼80 base pairs on either side of sites with coincident SNPs, suggesting that there are extensive and complex context effects. Finally, we estimate the level of variation needed to produce the excess of coincident SNPs and show that there is a similar, or higher, level of variation in the mutation rate associated with this cryptic process than there is associated with adjacent nucleotides, including the CpG effect. We conclude that there is substantial variation in the mutation that has, until now, been hidden from view.
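The logic of the "excess of coincident SNPs" test can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the authors' analysis (which also controls for neighbouring nucleotides, ancestral polymorphism, and selection). If every site had the same chance of carrying a SNP, and the two species were independent, the expected number of shared sites would be (human SNPs × chimp SNPs) / total sites. The function names, the two-class "hot site" model, and every parameter value below are hypothetical.

```python
import random


def expected_coincident(n_sites, n_human_snps, n_chimp_snps):
    """Expected number of sites with a SNP in both species if the
    per-site SNP probability is uniform and the species are independent."""
    return n_human_snps * n_chimp_snps / n_sites


def simulate_coincident(n_sites, mean_prob, hot_fraction, hot_multiplier, seed=0):
    """Toy model: a fraction of sites is 'hot' (more mutable) in BOTH species.

    Returns (human SNP count, chimp SNP count, coincident SNP count).
    """
    rng = random.Random(seed)
    # Scale the baseline so the average per-site probability equals mean_prob.
    base = mean_prob / (1.0 - hot_fraction + hot_fraction * hot_multiplier)
    human = chimp = both = 0
    for _ in range(n_sites):
        p = base * hot_multiplier if rng.random() < hot_fraction else base
        h = rng.random() < p  # SNP in the human sample at this site?
        c = rng.random() < p  # SNP in the chimp sample at this site?
        human += h
        chimp += c
        both += h and c
    return human, chimp, both


if __name__ == "__main__":
    n_sites = 200_000
    human, chimp, both = simulate_coincident(
        n_sites, mean_prob=0.02, hot_fraction=0.1, hot_multiplier=10
    )
    expected = expected_coincident(n_sites, human, chimp)
    print(f"observed coincident SNPs: {both}")
    print(f"expected under a uniform rate: {expected:.1f}")
    print(f"observed / expected: {both / expected:.2f}")
```

In this toy model the hidden hot sites inflate the number of shared SNP positions well above the uniform-rate expectation, which is the kind of signal the authors interpret as cryptic variation in the mutation rate.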
You can go and read this paper if you want, but as an alternative I recommend the overview by Laurent Duret, “Mutation Patterns in the Human Genome: More Variable Than Expected,” which is much more accessible and meaningful to the average person.
Alan Hodgkinson, Emmanuel Ladoukakis, Adam Eyre-Walker (2009). Cryptic Variation in the Human Mutation Rate. PLoS Biology, 7 (2). DOI: 10.1371/journal.pbio.1000027
Laurent Duret (2009). Mutation Patterns in the Human Genome: More Variable Than Expected. PLoS Biology, 7 (2). DOI: 10.1371/journal.pbio.1000028