A Question about Retroposition

By evolgen on February 13, 2007.

Duplicated genes can arise via various mechanisms -- polyploidization, chromosomal duplication, segmental duplication, and retroposition -- and we can usually distinguish the various mechanisms as each has distinct signatures. For example, retroposed duplicates arise when an RNA transcript is reverse transcribed back into DNA and re-inserted into the genome. This is how many transposable elements (TEs) and viruses propagate throughout genomes, but the reverse transcriptase encoded by TE and viral genomes can be used on endogenous transcripts as well. Because they arise via the reverse transcription of mRNA, retroposed duplicates lack introns and often have a poly A tail.

When searching for retroposed genes in sequenced genomes, researchers look for pairs of duplicates in which one copy is missing introns. They may also try to identify a poly A tail, but that signature decays more rapidly than the missing introns (i.e., a run of A's is only useful for identifying recent retroposition events, whereas missing introns are useful for identifying both new and old events). Because they give us more power to characterize duplications, structural signatures (presence or absence of types of sequences, which part of the gene is duplicated, etc.) are more reliable than sequence signatures.
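To make the two signatures concrete, here is a minimal Python sketch of how one might score a single duplicate pair. The function names, the intron counts as inputs, and the 10-base threshold for a run of A's are all assumptions made for illustration; they are not taken from any published pipeline.

```python
import re


def longest_a_run(seq: str) -> int:
    """Length of the longest uninterrupted run of A's in seq (case-insensitive)."""
    runs = re.findall(r"A+", seq.upper())
    return max((len(r) for r in runs), default=0)


def retroposition_signatures(parent_introns: int, copy_introns: int,
                             downstream_flank: str, min_a_run: int = 10) -> dict:
    """Report which of the two signatures a duplicate pair shows.

    parent_introns / copy_introns: intron counts in each copy (hypothetical inputs).
    downstream_flank: sequence just 3' of the copy, on the copy's strand.
    min_a_run: assumed cutoff for calling a poly A tail; tune for your data.
    """
    return {
        # Intron loss is only informative if the parent has introns to lose.
        "intron_loss": parent_introns > 0 and copy_introns == 0,
        # A decayed tail will fall below the cutoff, so this misses old events.
        "poly_a_tail": longest_a_run(downstream_flank) >= min_a_run,
    }


flank = "TTGC" + "A" * 12 + "GGT"
print(retroposition_signatures(4, 0, flank))
```

Requiring an uninterrupted run is deliberately strict: a single substitution in the tail breaks the run, which mirrors the point that the poly A signature fades faster than the missing introns.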

Often, however, the duplicated gene has no introns to begin with, or only an exonic portion of a gene is duplicated. In either case the missing-intron signature is uninformative, and we would still like to determine whether the duplicate arose via retroposition or some other mechanism. With that in mind, I was wondering if anyone has any information about the reliability of a second structural signature (missing introns being the first): which part of a gene (5', 3', or middle) gets duplicated.

Reverse transcription proceeds from the 3' to the 5' end of the transcript (creating a cDNA product in the 5' to 3' direction). If the reverse transcriptase terminates prior to reaching the 5' end of the transcript (where the start codon is located), only the 3' end of the gene (where the stop codon is located) will be duplicated. We see evidence for this when looking at human retroposed duplications (the 3' ends are overrepresented in the genome relative to the 5' ends). Can we use this logic to decide whether a duplication missing other signatures (no intron signature and no poly A tail) is retroposed or not? If a duplicated gene only contains a portion of the terminal exon or the 3' end of a single exon gene, is this sufficient evidence that the gene arose via a retroposition event (assuming that a poly A signal is not present)?
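The 3' bias can be checked mechanically once a partial duplicate has been aligned to its parent transcript. The sketch below is only an illustration: the function name, the coordinate convention, and the 20-base tolerance are assumptions, and a "3_prime" result is merely consistent with premature termination of reverse transcription. Whether it is sufficient evidence is exactly the open question.

```python
def classify_partial_duplicate(parent_len: int, aln_start: int, aln_end: int,
                               tolerance: int = 20) -> str:
    """Classify which part of the parent transcript a duplicate covers.

    Coordinates are 0-based, half-open, on the parent transcript in its
    5'->3' orientation. tolerance (an assumed value) is the slack, in bases,
    allowed at each end before that end counts as missing.
    """
    if not 0 <= aln_start < aln_end <= parent_len:
        raise ValueError("alignment must lie within the parent transcript")
    reaches_5p = aln_start <= tolerance
    reaches_3p = parent_len - aln_end <= tolerance
    if reaches_5p and reaches_3p:
        return "full"
    if reaches_3p:
        return "3_prime"   # pattern expected if reverse transcription stopped early
    if reaches_5p:
        return "5_prime"
    return "internal"


# A copy covering only the last ~400 bases of a 1,000-base transcript.
print(classify_partial_duplicate(1000, 600, 995))
```

Tallying these labels across all partial duplicates in a genome would show whether 3' fragments are overrepresented, but it would not by itself say how any single fragment arose.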


I don't know the answer, but there are a few things that I would do to approach the question. First, I would look at the structure of genes that are heavily transcribed; it seems likely that they would be good candidates for retrotransposition. In fact, histone genes, I think, produce lots of transcripts and lack poly A tails; perhaps they lost them through this kind of mechanism. You can ask if they have the signature characteristics that you're investigating.

Second, I would look at bits of DNA that are known to be moved around by trans-acting transposase genes, like the elements in corn that Barbara McClintock studied.

Third, we have a friend who wrote a program that you might find helpful for identifying repeats in diverse organisms so that you can look at the structure. The program is called "RepeatMasker" and you can find the info and links to the web server here: RepeatMasker.

I've used RepeatMasker. The libraries are heavy on mammalian repeats, although I've created my own libraries for the repeats I was interested in. It's a good program, but it's essentially a BLAST search.

"Second, I would look at bits of DNA that are known to be moved around by trans-acting transposase genes, like the elements in corn that Barbara McClintock studied."

Well, we know that LINEs and SINEs in the human genome are overrepresented for their 3' ends, suggesting my screening procedure would be appropriate.

I'm not trying to identify all the retroposed elements in a genome. What I'm interested in is EXCLUDING retroposed genes from a dataset made up of duplicated genes. So I'm trying to find diagnostic characters that tell me if something is retroposed or, better yet, to tell me that something is definitely not retroposed.
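As a rough illustration of that kind of exclusion filter, here is a Python sketch that sorts duplicate pairs into three buckets. Everything in it is hypothetical: the record keys, the bucket names, and especially the rule that a copy retaining introns could not have passed through a spliced mRNA intermediate, which is an assumption rather than an established diagnostic.

```python
from typing import Iterable


def triage_duplicates(pairs: Iterable[dict]) -> dict:
    """Split duplicate pairs into keep / exclude / ambiguous buckets.

    Each pair is a dict with hypothetical keys:
      'id', 'parent_introns', 'copy_introns', 'poly_a_tail' (bool).
    """
    buckets = {"keep": [], "exclude": [], "ambiguous": []}
    for p in pairs:
        if p["copy_introns"] > 0:
            # Retained introns: assumed to rule out a spliced-mRNA intermediate.
            buckets["keep"].append(p["id"])
        elif p["parent_introns"] > 0 or p["poly_a_tail"]:
            # Intron loss or a poly A tail: treated as evidence of retroposition.
            buckets["exclude"].append(p["id"])
        else:
            # Intronless parent and no tail: neither signature is informative.
            buckets["ambiguous"].append(p["id"])
    return buckets


pairs = [
    {"id": "dupA", "parent_introns": 5, "copy_introns": 5, "poly_a_tail": False},
    {"id": "dupB", "parent_introns": 5, "copy_introns": 0, "poly_a_tail": True},
    {"id": "dupC", "parent_introns": 0, "copy_introns": 0, "poly_a_tail": False},
]
print(triage_duplicates(pairs))
```

The "ambiguous" bucket is where the intronless and exon-fragment cases land, and it is those cases that would need an additional signature such as which end of the gene was copied.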

I wasn't thinking about LINEs or SINEs; I was referring to Ac and Ds.

Anyway, as far as RepeatMasker goes, it's more sensitive than BLAST. RepeatMasker is derived from cross_match, a more sensitive search algorithm (phrap uses cross_match), and it uses a database of repetitive elements. How well it works depends, in part, on whether or not you're using the most recent version of the GIRI database.

Nevertheless, there is a way to create a database on the fly and this might be helpful for you. You can read about it in the presentation that I referenced below. I put a link to the pdf doc at the bottom.

Development of a software program to dynamically create repetitive element databases for RepeatMasker

Arian Smit (1), Robert Hubley (1), Todd M. Smith (2)
(1) Institute for Systems Biology
(2) Geospiza, Inc., Seattle, WA

2005 Plant and Animal Genomics Conference

Go here. The presentation is at the top of the page.

Sandy, you're right about RepeatMasker. I was wrong.

Your proposed solution is more than I was planning on doing. It sounds like a good project, but I'm trying to find someone who has already determined some of the signatures of retroposed duplications. I'd be willing to try it (when/if I have some free time), but it's not something I'd set as high priority. I'm also betting that it's been done already, which would decrease the incentive to spend time on it.

"If a duplicated gene only contains a portion of the terminal exon or the 3' end of a single exon gene, is this sufficient evidence that the gene arose via a retroposition event (assuming that a poly A signal is not present)?"

I would say no, don't count on small 3' fragments as being necessarily retrotransposed. There are too many examples of single exons being duplicated by typical tandem duplication events within gene family clusters, where they can be readily identified. Such "orphan" exons can start out as partial tandem duplicates, then later be isolated by inversions or transpositions.

RPM,

Arian S. (the author of RepeatMasker) might have done this kind of thing. If you want an e-mail introduction, or if you want me to e-mail him and ask him if he knows, let me know.

genotypical,

What if they're not tandem?

They could have been created by tandem duplication, then displaced later. Example: pseudogenes from the human cytochrome P450 3A subfamily on chromosome 7.
