Tandem repeats and morphological variation

All of us mammals have pretty much the same set of genes, yet obviously there have to be some significant differences to differentiate a man from a mouse. What we currently think is a major source of morphological diversity is in the cis regulatory regions; that is, stretches of DNA outside the actual coding region of the gene that are responsible for switching the gene on and off. We might all have hair, but where we differ is when and where mice and men grow it on their bodies, and that is under the control of these regulatory elements.

A new paper by Fondon and Garner suggests that there is another source of variation between individuals: tandem repeats. Tandem repeats are short lengths of DNA that are repeated multiple times within a gene, anywhere from a handful of copies to more than a hundred. They are also called VNTRs, or variable number tandem repeats, because different individuals within a population may have different numbers of repeats. These VNTRs are relatively easy to detect with molecular tools, and we know that populations (humans included) may carry a large reservoir of different numbers of repeats, but what exactly the differences do has never been clear. One person might carry 3 tandem repeats in a particular gene, while her neighbor might bear 15, with no obvious differences between them that can be traced to that particular gene. So the question is what, if anything, does having a different number of tandem repeats do to an organism?
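To make "number of repeats" concrete, here is a minimal sketch of counting copies of a repeat unit in a DNA string. The function name `longest_tandem_run` and the toy sequences are my own illustrations, not anything from the paper, and real VNTR genotyping is done with lab techniques such as PCR rather than string matching.

```python
import re


def longest_tandem_run(sequence, unit):
    """Return the copy number of the longest uninterrupted run of `unit`."""
    # Match one or more back-to-back copies of the unit.
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [match.group(0) for match in pattern.finditer(sequence)]
    if not runs:
        return 0
    return max(len(run) for run in runs) // len(unit)


# Two hypothetical individuals carrying 3 and 15 copies of the same unit.
person_a = "AAT" + "GTAC" * 3 + "CCG"
person_b = "AAT" + "GTAC" * 15 + "CCG"
print(longest_tandem_run(person_a, "GTAC"), longest_tandem_run(person_b, "GTAC"))
```

The flanking sequence is identical in both toy individuals; only the copy number differs, which is the kind of variation a VNTR represents.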

Fondon and Garner address this question by first looking for populations that exhibit large and obvious morphological differences between individuals, and then looking within their genomes to see if those differences can be correlated with the number of tandem repeats present. The population they are investigating is domestic dogs. Dogs are not only diverse, but dog breeders are notoriously picky about shape and character, and purebred dogs have been under intense selection for specific attributes. Once a range of morphologies in a particular character, such as the shape of the snout, has been identified, one can ask whether that trait is reflected in the number of repeats in any genes.

The authors examined 142 dogs from 92 different breeds, and looked at 37 different tandem repeats in 17 genes in each. The genes selected were developmentally significant transcription factors that were at least suspected of playing a role in the formation of specific morphologies. 15 of the 17 genes turned out to have multiple alleles varying in the length of their repeats.

That there would be this substantial amount of genetic variation in tandem repeat number isn't at all surprising. Tandem repeats are subject to very high mutation rates, up to 100,000 times higher than the rate of point mutations, because they are prone to a kind of error called slipped-strand mispairing. Because they contain many copies of the same short sequence over and over, it is easy for the two strands of DNA to get misaligned in this local region: the GTAC on one strand could base-pair with the first CATG in the other strand, or the second, or the third. If the strands are mispaired, the replicating enzymes can err and either clip off some of the repeats or add extra repeats. It's a special kind of error, in that the DNA changes aren't to random nucleotides, but instead produce only different numbers of repeats.
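The key property, that slippage changes the number of repeats without scrambling the sequence, can be sketched with a toy simulation. Everything here is an assumption made for illustration: the function names, the one-unit-per-slip rule, and the `slip_prob` value are invented and are not measured rates from the paper.

```python
import random


def slip(repeat_count, rng, slip_prob=0.1):
    """Return the repeat count after one round of replication.

    Assumption: a slip adds or removes exactly one whole unit, and a
    single remaining unit cannot mispair, so it is left alone.
    """
    if repeat_count > 1 and rng.random() < slip_prob:
        return repeat_count + rng.choice([-1, 1])
    return repeat_count


def replicate(unit, repeat_count, generations, seed=0, slip_prob=0.1):
    """Copy a tandem repeat for many generations and return the final tract."""
    rng = random.Random(seed)
    for _ in range(generations):
        repeat_count = slip(repeat_count, rng, slip_prob)
    return unit * repeat_count


tract = replicate("GTAC", repeat_count=10, generations=1000)
# The tract length drifts, but it is always a whole number of GTAC units.
print(len(tract) // 4, tract == "GTAC" * (len(tract) // 4))
```

Contrast this with a point mutation, which would change a single letter somewhere in the tract and leave the length untouched.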

Note that this lack of fidelity in copying tandem repeats means that they are only going to be found in regions of genes that can tolerate some variability in the length of the resulting protein. That's interesting in itself, since it says these proteins are capable of functioning with ±30 or more amino acids in their final length.

Also, slipped-strand mispairing can be foiled by point mutations, even to synonymous codons, within the tandem repeat. A small change in the sequence gives the replication machinery a local difference that is used to properly align the two strands, and a stable tandem repeat will accumulate these small changes and lose its repeated character. On the other hand, a deletion caused by slipped-strand mispairing can remove the point difference, and subsequent mispairing can then expand the sequence, producing a repeat free of imperfections. One measure of how much selection for variation has been going on within a tandem repeat is its purity: if there are few interruptions in the perfection of the repeat, there has been much deletion and expansion going on within the sequence in its history. If there are multiple deviations from perfect repetition, then the sequence has not undergone much length variation in the recent past.

The purity of the sequence is therefore a measure of how much selection for new variants has been going on in the lineage. The authors compared the same repeat loci in humans and dogs, and found that dog repeats were purer in 29 of 36 cases, and of the same purity in 7 cases. This strongly suggests that the variations in dogs aren't just random, neutral changes, but are the outcome of recent selection at these loci.
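One simple way to put a number on purity is the fraction of repeat units that match the most common unit in the tract. This is only a sketch under that assumption; the paper may define purity differently, and `repeat_purity` is a name I made up for this example.

```python
from collections import Counter


def repeat_purity(tract, unit_len=3):
    """Fraction of units in the tract that match the most common unit."""
    units = [tract[i:i + unit_len]
             for i in range(0, len(tract) - unit_len + 1, unit_len)]
    if not units:
        raise ValueError("tract is shorter than one repeat unit")
    most_common_count = Counter(units).most_common(1)[0][1]
    return most_common_count / len(units)


# A perfect run of ten CAG (glutamine) codons.
pure = "CAG" * 10
# The same run with one synonymous CAA codon interrupting it.
interrupted = "CAG" * 4 + "CAA" + "CAG" * 5
print(repeat_purity(pure), repeat_purity(interrupted))
```

Both toy tracts encode ten glutamines, so the protein is the same; only the DNA-level purity differs, which is why a synonymous change can still stabilize a repeat.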

OK, already, so there are these interesting kinds of gene variants in dogs, and they have apparently undergone selection. What effect do the repeats have?

I'll describe the two main examples from the paper. The first is a gene called Runx-2 (runt-related transcription factor 2), which is related to the Drosophila pair-rule gene (a gene that is involved in segmentation), runt. In vertebrates, one of the functions of Runx-2 is to regulate the differentiation of osteoblasts, the cells responsible for laying down bone. Runx-2 contains two repeats, one coding for 18-20 glutamines (the poly-Q region), and another coding for 12-17 alanines (the poly-A region). A statistical comparison of the total repeat length (Q+A) with various parameters of the skull size revealed a correlation with the midface length, and a property called clinorhynchy, or dorsoventral nose bend. What's clinorhynchy? If you've seen a bull terrier, you know what's distinctive about them: that long nose with a downward droop.

Bull terrier

Bull terriers tend to have a short pair of tandem repeats, and they have long midfaces and pronounced downturn of the snout. They have been intentionally selected for this, and museum specimens over the last 70 years show increased prominence of this feature.

Rapid and sustained evolution of breeds. Purebred bull terrier skulls from 1931 (Top), 1950 (Middle), and 1976 (Bottom). Despite the lack of genetic diversity caused by population structure and history, these breeds are able to continually create new and more extreme morphological variations at a rapid and sustained pace. Analysis of the Runx-2 repeats in the 1931 bull terrier reveals a more intermediate allele (Q19A14) than is present in the modern bull terrier (Q19A13).
The original figure with additional skulls from St. Bernards and Newfoundlands is here.

This is cool stuff so far, but I have to tell you, it gets a little more complicated. It's not as simple as short repeat length→downturned snout. One of the ways transcription factor activity is regulated is by binding to one another; chains of amino acids can affect how the transcription factors interact. It turns out that polyglutamine can increase the rate of transcription, while polyalanine reduces it, and the Runx-2 protein has both a polyglutamine (poly-Q) and polyalanine (poly-A) chain. What might matter more in a situation where two competing components modulate activity is the ratio of poly-Q to poly-A, and lo, the poly-Q/poly-A ratio shows an even stronger correlation with clinorhynchy than does poly-Q+poly-A.

Tandem repeat length in a developmental gene is quantitatively correlated with continuous morphological features. (A and B) Reported effects on transcription of polyglutamine and polyalanine repeats suggested that these two domains may be involved in competitive activities and that the relative lengths of these domains may be more instructive than their aggregate length. A Pearson correlation test of this hypothesis revealed a significant correlation between Runx-2 polyglutamine to polyalanine ratio and clinorhynchy (D/V nose bend, P = 0.0001, Pearson one-sided significance, n = 27, A) and midface length (P = 0.0002, n = 27, B). The nature and direction of these correlations is indicative of longer relative Runx-2 glutamine repeats resulting in increased midface growth, consistent with observations from human cleidocranial dysplasia patients. Published studies indicate that amino acid repeat length-function relationships are typically nonlinear; however, fitting a quadratic or exponential to the clinorhynchy data (A) does not provide sufficient improvement in residuals to support the use of a nonlinear function over a simple line.
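To see how the sum and the ratio can tell different stories, here is a small sketch that computes both for the two bull terrier alleles quoted in the skull figure caption (Q19A14 from 1931, Q19A13 in the modern breed). The helper names are hypothetical, and this is just arithmetic on the allele names, not the authors' statistical analysis.

```python
import re


def parse_runx2_allele(name):
    """Split an allele name like 'Q19A14' into (glutamines, alanines)."""
    match = re.fullmatch(r"Q(\d+)A(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized allele name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def total_and_ratio(name):
    """Return (Q + A, Q / A) for a Runx-2 allele name."""
    q, a = parse_runx2_allele(name)
    return q + a, q / a


for allele in ("Q19A14", "Q19A13"):
    total, ratio = total_and_ratio(allele)
    print(allele, total, round(ratio, 3))
```

Losing a single alanine shortens the total from 33 to 32 while raising the Q/A ratio, so the two measures move in opposite directions for this pair of alleles.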

The second gene example is Alx-4 (aristaless-like homeobox 4). Alx-4 is also related to a transcription factor found in Drosophila, and knocking out the gene in mice produces six-toed mice. One specific allele of this gene, Alx-4Δ51, was found in only one breed of dog, the Great Pyrenees. One peculiarity of this breed is hindlimb polydactyly: purebreds are supposed to have a double dewclaw, for a total of six digits on the hindleg. The Alx-4Δ51 allele is a deletion that removes 51 nucleotides from the tandem repeat, for a loss of 17 amino acids. All of the Great Pyrenees with polydactyly have this 17aa deletion; one Great Pyrenees without polydactyly had the full-length tandem repeat.


Large magnitude repeat length mutations can result in gross morphological change. (A) Alx-4−/− mice exhibit a duplication of the first digit (arrowhead). (B) A radiograph of the rear paw of a Great Pyrenees shows the typical double dewclaw phenotype specified in the breed standard (arrowhead). (C) Polydactylous Great Pyrenees are homozygous for a 51-nucleotide repeat contraction in the Alx-4 gene. PCR amplification of the repeat-containing regions of Alx-4 from 89 dog breeds reveals that this deletion is unique to the Great Pyrenees breed (arrow). Phenotypically normal basset hounds, flat-coated retrievers, and harriers were heterozygous for distinct two amino acid insertions (doublets). (D) DNA sequencing reveals that the deletion is caused by a contraction of the PQn repeat that results in the removal of 17 aa within the repeat.
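The 51-nucleotide figure matters because it is a multiple of three: the deletion removes whole codons and leaves the rest of the protein in frame. A quick sketch of that arithmetic, with a function name I invented for the purpose:

```python
def deletion_in_amino_acids(nucleotides_deleted):
    """Return (whole codons removed, whether the reading frame is preserved)."""
    codons, remainder = divmod(nucleotides_deleted, 3)
    return codons, remainder == 0


codons_lost, in_frame = deletion_in_amino_acids(51)
print(codons_lost, in_frame)
```

A deletion of 50 or 52 nucleotides would instead shift the reading frame and garble everything downstream, which would be a much more drastic change than trimming 17 amino acids from a repeat.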

The good news about all of this is that it represents a demonstration of another mode of relatively rapid addition of morphological diversity to a population, and that we have another mechanism for fine-tuning evolution. These tandem repeats are common in the vertebrate genome, so this could clearly be a reservoir of variation and a robust and flexible way to add new variations to a population.

There are some limitations to this study, though. First, it's focused on an extreme case: purebred dogs that have been experiencing very strong selection for specific and, in some cases, outright deleterious characters. We simply don't know how important this mode of evolutionary change is under less artificial conditions. Second, so far we're just seeing correlations, not experimental perturbations. They're darned convincing correlations, but at some point down the road it would be good to see direct manipulation of the Q/A ratio of the Runx-2 gene in a collie, for instance, to give it the downturned nose of a bull terrier. And finally, it may just be me, but I'd like to see developmental studies of the patterns of Runx-2 and Alx-4 gene expression in dog embryos to see exactly how these variations play out.

Still, it's got me wondering. I've got this knobby nose that I can see to varying degrees in my father and paternal grandmother. I wonder if it can be traced to differences in tandem repeat length in some transcription factor?


Fondon JW, Garner HR (2004) Molecular origins of rapid and continuous morphological evolution. PNAS 101(52):18058-18063.

I wonder if it can be traced to differences in tandem repeat length in some transcription factor?

I long for the day when a coupon in a cereal box and a SASE get you your DNA sequenced in 6-8 weeks.

"Uh-oh, Mom. I just got my DNA sequenced and I've got some good news and bad news. First, the good news: no need to put money in my college fund...."

I long for the day when teenagers can glance at a 3 billion base pair sequence and actually interpret it.

I bet they'll have flying cars by then, for sure.

I long for the day when teenagers can glance at a 3 billion base pair sequence and actually interpret it.
I bet they'll have flying cars by then, for sure.

Probably no flying cars, but I'll put down money that there'll be a Facebook app to compare your sequence with those of your friends.

I thought of the nose business, too. Maybe my pug nose is not from my father's genes after all. Maybe it's from his tandem repeats.

By hoary puccoon (not verified) on 01 Oct 2007 #permalink

All this 'stuff', other than DNA coding for proteins, - is it still correct to call it 'genetic' or is there a more appropriate term?

I'm always impressed by how much the state of the art has changed since I was studying genetics and biochemistry 35 years ago. We were learning about evo, but devo didn't really exist, at least at the undergraduate level. One of the experiments we did that I thought was just too cool was converting polyuracil into polyphenylalanine. (I hope I got that right.)

Things were so much simpler in the old days (uphill, both ways, etc).

That's interesting in itself, since it says these proteins are capable of functioning with ±30 or more amino acids in their final length.

That's easiest if it's near a terminus. As you almost certainly know from experience, you can often fuse the entire GFP to a terminus of a protein, and nothing happens to the protein, except that it shines green under UV.

On another note, the bullterrier from 1976 has a canine with two roots! That hasn't existed since the Cretaceous, has it?

converting polyuracil into polyphenylalanine. (I hope I got that right.)

Yes :-)

By David Marjanović (not verified) on 01 Oct 2007 #permalink

All this 'stuff', other than DNA coding for proteins, - is it still correct to call it 'genetic' or is there a more appropriate term?

All the different genes with different numbers of tandem repeats are genes, that is, DNA coding for proteins. A gene with, say, 12 tandem repeats is a different allele than the version with 17 tandem repeats, which is a different allele than the version with 16...and so on.

Are people confused because they're not being referred to as big A and little a?

Quote:I bet they'll have flying cars by then, for sure.

The day all of the cars can fly is the day I stay on the ground.

Very good stuff PZ, thank you kindly for continuing my ejudimuhcation.

Quoth the PZ: "I long for the day when teenagers can glance at a 3 billion base pair sequence and actually interpret it.
I bet they'll have flying cars by then, for sure."

-on that note, erudite masses, it's almost 2008:
where's my goddamn jetpack?

By lithopithecus (not verified) on 01 Oct 2007 #permalink

Any work on how this affects (if at all) homologous recombination? For example, could this possibly be a limiter for successful reproduction between Furries and Klingons?

By Charles Soto (not verified) on 02 Oct 2007 #permalink

Hi,
I have a question, because I have a small bull terrier; he is 4 months old now.
Is this a bull terrier skull?
And at what age does the face start to turn down?
Thanks,
eliel

I don't understand a damn thing, ha