Where the variation comes from.
Evolution proceeds by the action of many different evolutionary forces on heritable variation. Natural selection increases the frequency of variation that allows individuals to produce more offspring who, themselves, produce offspring. Genetic drift changes the frequency of variation through random sampling of individuals from one generation to the next. Population subdivision divides the variation into isolated groups where other forces (selection, drift, etc.) act upon it. But where does all this variation come from?
Given the title of the post, the subtitle of the post, and your general understanding of biology, it should be pretty obvious that the variation comes from mutation. The purpose of this essay is to explain the different types of mutations that can contribute to heritable variation in populations. We will also explore how evolutionary forces act upon the different mutations.
Genomes are divided into chromosomes. Each chromosome contains a unique set of DNA sequences. Some organisms contain two copies of each chromosome (one from their mother and one from their father), and others only have one copy of each chromosome (inherited from a single parent). Each chromosome is made up of sequences that perform specific functions (some encode proteins, some determine when the protein coding sequences are expressed, and others encode other functional elements) and non-functional sequences (junk DNA). We will not concern ourselves much with the specific categories the DNA sequences can fall into.
For the purpose of this treatment of mutations, we will divide the types of mutations into four classes:
Substitutions: changing the information in the genome.
Rearrangements: rearranging the information in the genome.
Insertions: increasing the amount of information in the genome.
Deletions: decreasing the amount of information in the genome.
Substitutions, also known as point mutations, result in the change of a nucleotide into another nucleotide. DNA sequences are made up of a series of four different nucleotides: adenine, thymine, guanine, and cytosine (symbolized by the letters A, T, G, and C, respectively). A genome sequence consists of an arrangement of these nucleotides in a specified order (think about it as a book written in a language consisting of four letters). If one of the nucleotides is changed, a substitution or point mutation has occurred.
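To make the book analogy concrete, here is a minimal Python sketch that treats a DNA sequence as a string over the four letters and applies a substitution to it. The function names and the example sequence are made up for illustration, and choosing the position and the new base uniformly at random is a simplifying assumption, not a claim about how mutation works in real genomes.

```python
import random

NUCLEOTIDES = "ATGC"

def substitute(sequence, position, new_base):
    """Return a copy of sequence with the base at position replaced by new_base."""
    if new_base not in NUCLEOTIDES:
        raise ValueError(f"{new_base!r} is not one of {NUCLEOTIDES}")
    if sequence[position] == new_base:
        raise ValueError("a substitution must change the nucleotide")
    return sequence[:position] + new_base + sequence[position + 1:]

def random_substitution(sequence, rng):
    """Apply one substitution at a uniformly chosen position (an assumption)."""
    position = rng.randrange(len(sequence))
    alternatives = [base for base in NUCLEOTIDES if base != sequence[position]]
    return substitute(sequence, position, rng.choice(alternatives))

original = "ATGGCCTTA"                  # hypothetical example sequence
mutant = substitute(original, 3, "A")   # the G at index 3 becomes an A
print(original, "->", mutant)           # ATGGCCTTA -> ATGACCTTA

# A seeded generator keeps the random example reproducible.
random_mutant = random_substitution(original, random.Random(1))
print(original, "->", random_mutant)
```

Note that the sequence length never changes: a substitution alters the information at one position without adding or removing any.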
The effects of a substitution depend on the type of sequence in which the mutation occurs. If it occurs in a DNA sequence that lacks any function (so-called junk DNA), it will be neutral. Point mutations can also occur in sequences that encode proteins or sequences that regulate the expression of protein coding sequences. A substitution within a protein coding sequence may alter the protein that the sequence encodes. If so, the protein may be rendered nonfunctional, in which case the individual harboring that mutation will be less fit than other individuals. There is also the possibility that the mutation renders the individual more fit because the new protein sequence is better than the old one. Similar scenarios could be imagined for point mutations in other functional DNA sequences.
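Whether a substitution in a protein coding sequence alters the protein at all depends on the genetic code, which reads DNA three nucleotides (one codon) at a time. The sketch below is a rough illustration using only a small slice of the standard genetic code: the codon GAA and the nine codons one substitution away from it. The labels "synonymous", "missense", and "nonsense" are the conventional terms for no amino acid change, a changed amino acid, and a premature stop; the function name is hypothetical.

```python
# Partial codon table (DNA coding strand): GAA and its nine one-step neighbours.
# These assignments come from the standard genetic code; all other codons are omitted.
CODON_TABLE = {
    "GAA": "Glu", "GAG": "Glu",
    "GAT": "Asp", "GAC": "Asp",
    "AAA": "Lys", "CAA": "Gln", "TAA": "STOP",
    "GCA": "Ala", "GGA": "Gly", "GTA": "Val",
}

def classify_substitution(codon, position, new_base):
    """Classify a single-nucleotide change by its effect on the encoded amino acid."""
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if after == before:
        return "synonymous"   # protein sequence unchanged
    if after == "STOP":
        return "nonsense"     # protein is truncated at this codon
    return "missense"         # one amino acid is swapped for another

# Tally the effects of all nine possible substitutions in the codon GAA.
counts = {"synonymous": 0, "missense": 0, "nonsense": 0}
for position in range(3):
    for new_base in "ATGC":
        if new_base != "GAA"[position]:
            counts[classify_substitution("GAA", position, new_base)] += 1

print(counts)  # {'synonymous': 1, 'missense': 7, 'nonsense': 1}
```

This classification only says whether the protein sequence changes. It does not say whether the change is harmful, neutral, or beneficial, which depends on what the altered protein does in the organism.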
Genomic rearrangements include events such as fusions and fissions of chromosomes, inversions, and translocations (see figure below). The scale of these events can range from a region as small as a gene (a small part of a chromosome) to large portions of chromosomes to entire chromosomes. Fusion events, such as the one that occurred in the human genome after the divergence from chimpanzees, join together two complete chromosomes, whereas translocations occur when part of a chromosome is moved to another part of the same chromosome or to a different chromosome.
Inversions rearrange the genetic content within a single chromosome. They can also play an important role in speciation or contain alleles that confer fitness benefits. One reason is that inversions suppress recombination between different arrangements (chromosomes carrying different inversions do not easily exchange alleles). But it’s important to understand that rearrangements occur at a much lower frequency than point mutations. Comparing fitness effects of rearrangements and substitutions is a bit trickier, and we won’t deal with that here.
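The rearrangements described above can be sketched by treating a chromosome as an ordered list of gene labels. This is a deliberately coarse model with made-up function names and labels: it tracks only gene order, ignores the orientation of the DNA strands, and handles only translocations between two different chromosomes.

```python
def invert(chromosome, start, end):
    """Reverse the order of the genes in chromosome[start:end]."""
    return chromosome[:start] + chromosome[start:end][::-1] + chromosome[end:]

def fuse(first, second):
    """Join two complete chromosomes end to end."""
    return first + second

def split(chromosome, breakpoint):
    """Fission: break one chromosome into two at breakpoint."""
    return chromosome[:breakpoint], chromosome[breakpoint:]

def translocate(source, start, end, target, insert_at):
    """Move source[start:end] into a different chromosome at insert_at."""
    segment = source[start:end]
    new_source = source[:start] + source[end:]
    new_target = target[:insert_at] + segment + target[insert_at:]
    return new_source, new_target

chrom_1 = ["a", "b", "c", "d", "e"]   # hypothetical gene labels
chrom_2 = ["x", "y"]

print(invert(chrom_1, 1, 4))                    # ['a', 'd', 'c', 'b', 'e']
print(fuse(chrom_1, chrom_2))                   # ['a', 'b', 'c', 'd', 'e', 'x', 'y']
print(split(chrom_1, 2))                        # (['a', 'b'], ['c', 'd', 'e'])
print(translocate(chrom_1, 1, 3, chrom_2, 1))   # (['a', 'd', 'e'], ['x', 'b', 'c', 'y'])
```

In every case the total set of genes is unchanged. Only their order or their assignment to chromosomes differs, which is what separates rearrangements from insertions and deletions.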
The next two classes of mutation result in the addition or loss of genetic material in the genome. Insertions increase the net content of a genome. There are multiple sources of the genetic material that enters the genome. It may arise de novo (synthesis of novel sequence), it may come from outside the genome, or it may be a duplicated copy of something from within the genome of interest. Insertions of novel sequence tend to be small (on the order of a few nucleotides), whereas insertions of material from outside or within the genome can be as large as a single gene or even multiple genes. We will focus on the latter two types of insertions.
The ability of material from outside of a genome to insert into the genome in question (and our understanding of such events) depends on what types of sequence we are studying. In eukaryotes (plants, animals, fungi, and various microbes), viral sequences often move from individual to individual, inserting themselves into genomes. This horizontal transmission of genetic information is even more common in bacteria due to biological properties of these organisms. The uptake of genetic material from extragenomic sources has the potential to cause disease (e.g., viruses) or may lead to novel genes which allow an organism to perform a new function (e.g., bacteria picking up genes for antibiotic resistance from other individuals or from the environment).
Genomic information can also be duplicated within a genome. This can occur via various mechanisms, sometimes even aided by virus-like sequences moving within a single genome. Entire blocks of genetic material can be duplicated via mechanisms that we’re still working to understand. Another common mechanism occurs when DNA sequences are transcribed to RNA, then reverse transcribed back into DNA and inserted back into the genome. Duplications allow genes or other DNA sequences to explore mutational space that would be inaccessible if they only existed in a single copy. That’s because many point mutations are deleterious, but if there is a copy of a sequence that maintains the original function, a duplicate copy can accumulate mutations that interfere with the original function. Many of these mutations will lead to a non-functional duplicate copy, but some may lead to a sequence with a new function that would not be possible with a single copy because the single copy must maintain the original function.
In addition to accumulating new content, a genome can also lose content. These events can occur on various scales, ranging from a few nucleotides to large chunks of chromosomes. Larger deletions tend to be more deleterious than smaller ones, but the content of the deleted sequence affects the fitness cost as well. For example, the deletion of a small region containing an essential gene will be far more deleterious than a large deletion of non-functional sequence. Not all deletions will be deleterious, and it’s possible that they may even confer a fitness benefit if they delete a sequence that is deleterious. We can gain a fair bit of understanding of the evolutionary dynamics by exploring the frequencies of deleted sequences in natural populations. Three recent studies (reviewed here, here, and here) have performed such analyses and found many common deletions within human populations.
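Insertions and deletions differ from the other classes in that they change the length of the genome. The sketch below shows this on a short string. The function names and sequences are made up for illustration, and placing the duplicate directly next to the original (a tandem duplication) is just one convenient assumption, since duplicated copies can also land elsewhere in the genome.

```python
def insert_fragment(sequence, position, fragment):
    """Insert fragment (from any source) into sequence before position."""
    return sequence[:position] + fragment + sequence[position:]

def duplicate_in_tandem(sequence, start, end):
    """Copy sequence[start:end] and place the copy right after the original."""
    return sequence[:end] + sequence[start:end] + sequence[end:]

def delete_segment(sequence, start, end):
    """Remove sequence[start:end] from the sequence."""
    return sequence[:start] + sequence[end:]

genome = "ATGGCCTTA"   # hypothetical 9-nucleotide genome

with_insertion = insert_fragment(genome, 3, "AAA")
with_duplication = duplicate_in_tandem(genome, 3, 6)
with_deletion = delete_segment(genome, 3, 6)

print(with_insertion, len(with_insertion))       # ATGAAAGCCTTA 12
print(with_duplication, len(with_duplication))   # ATGGCCGCCTTA 12
print(with_deletion, len(with_deletion))         # ATGTTA 6
```

The length change says nothing about the fitness effect by itself. As noted above, what matters is which sequence is gained or lost, not just how much.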
The different types of mutations vary in the frequency at which they occur (point mutations are more common than rearrangements, insertions, and deletions), but there is also variation within the classes. For example, different sizes and types of rearrangements, insertions, and deletions occur at different frequencies. Additionally, certain substitutions are more frequent than others (see here for more details). And the fitness costs of these mutations depend on multiple factors, including the size of the events, the types of sequence in which they occur or which they contain, and what other mutations are associated with them.