Duplicated genes can arise via several mechanisms (polyploidization, chromosomal duplication, segmental duplication, and retroposition), and we can usually tell these mechanisms apart because each leaves a distinct signature. For example, retroposed duplicates arise when an RNA transcript is reverse transcribed back into DNA and re-inserted into the genome. This is how many transposable elements (TEs) and viruses propagate throughout genomes, but the reverse transcriptase encoded by TE and viral genomes can act on endogenous transcripts as well. Because they arise via the reverse transcription of mRNA, retroposed duplicates lack introns and often have a poly A tail.
When searching for retroposed genes in sequenced genomes, researchers look for pairs of duplicates in which one copy is missing introns. They may also try to identify a poly A tail, but that signature decays more rapidly than the missing introns (i.e., a run of A’s is only useful for identifying recent retroposition events, whereas missing introns are useful for identifying both new and old events). Because they persist longer and so give us more power to characterize duplications, structural signatures (presence or absence of types of sequences, which part of the gene is duplicated, etc.) are more reliable than sequence signatures.
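To make the two signatures concrete, here is a minimal Python sketch of how one might score a parent/copy pair. Everything in it is an assumption for illustration: the `GeneCopy` record, the function names, and especially the `min_a_run=10` cutoff, which is an arbitrary placeholder rather than an established threshold.

```python
from dataclasses import dataclass


@dataclass
class GeneCopy:
    """Hypothetical record for one annotated copy of a gene."""
    name: str
    exon_count: int            # number of exons annotated for this copy
    downstream_seq: str = ""   # genomic sequence just 3' of the copy


def longest_a_run(seq: str) -> int:
    """Return the length of the longest uninterrupted run of A's."""
    best = run = 0
    for base in seq.upper():
        run = run + 1 if base == "A" else 0
        best = max(best, run)
    return best


def retroposition_signatures(parent: GeneCopy, copy: GeneCopy,
                             min_a_run: int = 10) -> dict:
    """Report which retroposition signatures a parent/copy pair shows.

    min_a_run is an assumed, arbitrary cutoff for calling a poly A tail.
    """
    # Missing introns are only informative when the parent has introns.
    missing_introns = parent.exon_count > 1 and copy.exon_count == 1
    # A poly A tail should appear as a run of A's just 3' of the copy.
    poly_a_tail = longest_a_run(copy.downstream_seq) >= min_a_run
    return {"missing_introns": missing_introns, "poly_a_tail": poly_a_tail}
```

A recent retrocopy would be expected to show both signatures, while an old one may keep only the missing introns once the run of A's has decayed by mutation. If the parent has a single exon, `missing_introns` is always false, so that signature says nothing either way.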
Oftentimes, however, the gene that is duplicated via retroposition has no introns to begin with, or only an exonic portion of a gene is duplicated and we would like to determine whether it arose via retroposition or some other mechanism. In both cases the missing-intron signature is uninformative. With that in mind, I was wondering if anyone has any information about the reliability of a second structural signature (missing introns being the first): which part of a gene (5′, 3′, or middle) gets duplicated.
Reverse transcription proceeds from the 3′ to the 5′ end of the transcript (creating a cDNA product in the 5′ to 3′ direction). If the reverse transcriptase terminates prior to reaching the 5′ end of the transcript (where the start codon is located), only the 3′ end of the gene (where the stop codon is located) will be duplicated. We see evidence for this when looking at human retroposed duplications (the 3′ ends are overrepresented in the genome relative to the 5′ ends). Can we use this logic to decide whether a duplication missing the other signatures (no intron signature and no poly A tail) is retroposed or not? If a duplicated gene contains only a portion of the terminal exon, or only the 3′ end of a single-exon gene, is this sufficient evidence that the gene arose via a retroposition event (assuming that a poly A signal is not present)?
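To pin down what I mean by "which part of the gene is duplicated", here is a rough Python sketch that classifies a partial duplicate by where it falls on the parent transcript. The function name, the coordinate convention (0-based, half-open, in 5′ to 3′ transcript orientation), and the 5% `tolerance` are all my own assumptions for illustration, not an established method.

```python
def duplicated_region(transcript_len: int, dup_start: int, dup_end: int,
                      tolerance: float = 0.05) -> str:
    """Classify which part of the parent transcript a duplicate covers.

    Coordinates are 0-based, half-open, and given in transcript
    (5' to 3') orientation. tolerance is an assumed fraction of the
    transcript length within which the duplicate is treated as
    reaching an end of the transcript.
    """
    if not 0 <= dup_start < dup_end <= transcript_len:
        raise ValueError("duplicate must lie within the parent transcript")

    slack = tolerance * transcript_len
    reaches_5_end = dup_start <= slack
    reaches_3_end = transcript_len - dup_end <= slack

    if reaches_5_end and reaches_3_end:
        return "full-length"
    if reaches_3_end:
        return "3' end"
    if reaches_5_end:
        return "5' end"
    return "middle"
```

For example, `duplicated_region(2000, 1400, 2000)` returns `"3' end"`, the pattern expected if reverse transcription stopped early. A `"3' end"` call is only consistent with retroposition, though; whether it is sufficient evidence on its own is exactly what I am asking.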