A Question about Retroposition

Duplicated genes can arise via various mechanisms -- polyploidization, chromosomal duplication, segmental duplication, and retroposition -- and we can usually distinguish the various mechanisms as each has distinct signatures. For example, retroposed duplicates arise when an RNA transcript is reverse transcribed back into DNA and re-inserted into the genome. This is how many transposable elements (TEs) and viruses propagate throughout genomes, but the reverse transcriptase encoded by TE and viral genomes can be used on endogenous transcripts as well. Because they arise via the reverse transcription of mRNA, retroposed duplicates lack introns and often have a poly A tail.

When searching for retroposed genes in sequenced genomes, researchers will look for pairs of duplicates in which one copy is missing introns. They may also try to identify a poly A tail, but that signature will decay more rapidly than the missing introns (i.e., a run of As will only be useful for identifying recent retroposition events, whereas missing introns are useful for identifying both new and old events). Because they give us more power to characterize duplications, structural signatures (presence or absence of types of sequences, which part of the gene is duplicated, etc.) are more reliable than sequence signatures.
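To make that screen concrete, here is a minimal Python sketch of the two checks described above. The function names, the idea of passing in pre-computed intron counts, and the 12-base threshold for a run of As are all my own illustrative assumptions, not values taken from any published pipeline.

```python
import re
from typing import Optional


def has_poly_a_tail(downstream_seq: str, min_run: int = 12) -> bool:
    """Return True if the sequence 3' of the copy contains a run of
    at least `min_run` A's. The threshold is an arbitrary assumption."""
    return re.search("A{%d,}" % min_run, downstream_seq.upper()) is not None


def retrocopy_candidate(intron_count_a: int, intron_count_b: int) -> Optional[str]:
    """Given intron counts for two duplicates, return "a" or "b" for the
    intronless copy when the other copy has introns; otherwise None."""
    if intron_count_a == 0 and intron_count_b > 0:
        return "a"
    if intron_count_b == 0 and intron_count_a > 0:
        return "b"
    # Both intronless, or both with introns: this signature is uninformative.
    return None
```

Note that a pair in which both copies lack introns returns None: the missing-intron signature simply says nothing in that case.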

Often, however, the gene duplicated via retroposition has no introns to begin with, or only an exonic portion of a gene is duplicated, and we would like to determine whether the copy arose via retroposition or some other mechanism. With that in mind, I was wondering if anyone has any information about the reliability of a second structural signature (missing introns being the first): which part of a gene (5', 3', or middle) gets duplicated.

Reverse transcription proceeds from the 3' to the 5' end of the transcript (creating a cDNA product in the 5' to 3' direction). If the reverse transcriptase terminates prior to reaching the 5' end of the transcript (where the start codon is located), only the 3' end of the gene (where the stop codon is located) will be duplicated. We see evidence for this when looking at human retroposed duplications (the 3' ends are overrepresented in the genome relative to the 5' ends). Can we use this logic to decide whether a duplication missing other signatures (no intron signature and no poly A tail) is retroposed or not? If a duplicated gene only contains a portion of the terminal exon or the 3' end of a single exon gene, is this sufficient evidence that the gene arose via a retroposition event (assuming that a poly A signal is not present)?
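For what it's worth, here is a rough Python sketch of how one might record which part of the parent transcript a duplicate covers. It assumes the duplicate has already been aligned to the parent transcript and that coordinates are given in transcript orientation (0 is the 5' end). The function name and the 5% end tolerance are hypothetical choices of mine.

```python
def covered_region(start: int, end: int, transcript_len: int,
                   margin: float = 0.05) -> str:
    """Classify the parent-transcript interval [start, end) covered by a
    duplicate as "full", "5prime", "3prime", or "middle".

    Coordinates are in transcript orientation (0 = 5' end).
    `margin` is an assumed tolerance, as a fraction of transcript length,
    for counting an alignment as reaching an end.
    """
    if not (0 <= start < end <= transcript_len):
        raise ValueError("expected 0 <= start < end <= transcript_len")
    slack = margin * transcript_len
    reaches_5 = start <= slack
    reaches_3 = (transcript_len - end) <= slack
    if reaches_5 and reaches_3:
        return "full"
    if reaches_3:
        return "3prime"
    if reaches_5:
        return "5prime"
    return "middle"
```

A "3prime" label only describes what is covered. It is what truncated reverse transcription would produce, but the label by itself does not say which mechanism made the copy.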

I don't know the answer, but there are a few things that I would do to approach the question. First, I would look at the structure of genes that are heavily transcribed; it seems likely that they would be good candidates for retrotransposition. In fact, histone genes, I think, produce lots of transcripts and lack poly A tails; perhaps they lost them through this kind of mechanism. You can ask if they have the signature characteristics that you're investigating.

Second, I would look at bits of DNA that are known to be moved around by trans-acting transposase genes, like the elements in corn that Barbara McClintock studied.

Third, we have a friend who wrote a program that you might find helpful for identifying repeats in diverse organisms so that you can look at the structure. The program is called "RepeatMasker" and you can find the info and links to the web server here: Repeatmasker.

I've used RepeatMasker. The libraries are heavy on mammalian repeats, although I've created my own libraries for the repeats I was interested in. It's a good program, but it's essentially a BLAST search.

"Second, I would look at bits of DNA that are known to be moved around by trans-acting transposase genes, like the elements in corn that Barbara McClintock studied."

Well, we know that the 3' ends of LINEs and SINEs are overrepresented in the human genome, suggesting my screening procedure would be appropriate.

I'm not trying to identify all the retroposed elements in a genome. What I'm interested in is EXCLUDING retroposed genes from a dataset made up of duplicated genes. So I'm trying to find diagnostic characters that tell me if something is retroposed or, better yet, to tell me that something is definitely not retroposed.

I wasn't thinking about LINEs or SINEs; I was referring to Ac and Ds.

Anyway, as far as RepeatMasker goes, it's more sensitive than BLAST. RepeatMasker is derived from cross_match, a more sensitive search algorithm (phrap uses cross_match), and it uses a database of repetitive elements. How well it works depends, in part, on whether or not you're using the most recent version of the GIRI database.

Nevertheless, there is a way to create a database on the fly and this might be helpful for you. You can read about it in the presentation that I referenced below. I put a link to the pdf doc at the bottom.

Development of a software program to dynamically create repetitive element databases for RepeatMasker

Arian Smit1, Robert Hubley1, Todd M. Smith2.
1. Institute for Systems Biology,
2. Geospiza, Inc. Seattle WA.

2005 Plant and Animal Genomics Conference

Go here. The presentation is at the top of the page.

Sandy, you're right about RepeatMasker. I was wrong.

Your proposed solution is more than I was planning on doing. It sounds like a good project, but I'm trying to find someone who has already determined some of the signatures of retroposed duplications. I'd be willing to try it (when/if I have some free time), but it's not something I'd set as high priority. I'm also betting that it's been done already, which would decrease the incentive to spend time on it.

"If a duplicated gene only contains a portion of the terminal exon or the 3' end of a single exon gene, is this sufficient evidence that the gene arose via a retroposition event (assuming that a poly A signal is not present)?"

I would say no, don't count on small 3' fragments as being necessarily retrotransposed. There are too many examples of single exons being duplicated by typical tandem duplication events within gene family clusters, where they can be readily identified. Such "orphan" exons can start out as partial tandem duplicates, then later be isolated by inversions or transpositions.

By genotypical (not verified) on 13 Feb 2007

RPM,

Arian S. (the author of RepeatMasker) might have done this kind of thing. If you want an e-mail introduction, or if you want me to e-mail him and ask him if he knows, let me know.

Could have been created by tandem duplication, then displaced later. Example -- pseudogenes from the human cytochrome P450 3A subfamily on chromosome 6.

By genotypical (not verified) on 14 Feb 2007