Introns are parts of a gene that do not contain coding information; they have to be spliced out of precursor RNA to form mature messenger RNA (mRNA). But ask most biologists and they'll tell you that in "higher eukaryotes" all genes have introns. All? They may reply, "well, not quite". The most famous examples of intronless genes are the histone genes. Many tRNA genes are also intronless. But just how many intronless genes are there in the human genome? Well, I just stumbled onto this site: Genome SEGE - Intronless Genes in Eukaryotes.
Here's a couple of graphs from the SEGE (Single Exonic Genes in Eukaryotes) website:
These stats were compiled in 2004; I'll have to do some searches in PubMed/NCBI to see what the current count is.
{update}
Here is the reference for the original paper:
Meena Kishore Sakharkar and Pandjassarame Kangueane
Genome SEGE: A database for 'intronless' genes in eukaryotic genomes
BMC Bioinformatics (2004) 5: 67
Here's an explanation for why we see different amounts (and sizes) of introns in different taxa.
From the website you link to:
1. Single exon genes are identified using the CDS FEATURE definition in the genome data.
It seems like they're ignoring introns located in the 5' and 3' UTRs.
Also:
2. Potential pseudogenes are identified by scanning for a polyA tail within 1000 nt from the stop codon and are preceded by a polyA signal.
I think they're trying to say that they excluded recently retroposed genes from their data. I don't know why they refer to them as pseudogenes. Pseudogenes don't necessarily need to be retroposed and retroposed genes aren't always pseudogenes.
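To make the two criteria quoted above concrete, here is a rough Python sketch of how they could be applied. This is my own guess at the logic, not SEGE's actual code. The 1000 nt window comes from the quoted text, and AATAAA is the canonical polyA signal; the function names, the minimum tail length (`min_tail`) and the maximum signal-to-tail distance (`max_gap`) are assumptions made up for illustration. Note that the first check only looks at the CDS location, so an intron in a UTR goes unnoticed.

```python
import re

POLYA_SIGNAL = "AATAAA"  # canonical polyadenylation signal, written as DNA


def is_single_exon_cds(location: str) -> bool:
    """Criterion 1: a GenBank-style CDS location made of one segment.

    Multi-exon CDS features are written as join(...) or order(...).
    UTR introns are invisible to this check.
    """
    return "join(" not in location and "order(" not in location


def looks_retroposed(downstream: str, window: int = 1000,
                     min_tail: int = 15, max_gap: int = 40) -> bool:
    """Criterion 2: a polyA tail within `window` nt of the stop codon,
    preceded by a polyA signal.

    `downstream` is the genomic sequence right after the stop codon.
    `min_tail` and `max_gap` are illustrative assumptions, not values
    taken from the SEGE site.
    """
    region = downstream[:window].upper()
    # Look for runs of at least `min_tail` consecutive A's.
    for tail in re.finditer(r"A{%d,}" % min_tail, region):
        # Search for the signal in the stretch just upstream of the tail.
        start = max(0, tail.start() - max_gap)
        if POLYA_SIGNAL in region[start:tail.start()]:
            return True
    return False
```

A gene would be flagged as a likely retroposed copy when `is_single_exon_cds` is true for its CDS and `looks_retroposed` is true for the sequence after its stop codon.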
check your timestamp doug! this is a bio-blog, not a physics one, no time travelling!
Sorry about the time travelling - was going to post this tomorrow. I forgot to change the "published" mode to "scheduled" mode.
RPM,
I'm sure that the real number must be lower than 10% of all genes. Also, the total number of genes they survey is close to 30,000 ... I'll have to check whether many of these "intronless genes" have recently been classified as pseudogenes. The latest count of the number of genes in the human genome (according to Eric Lander) is ~19,000.
RPM,
Lots of math in that paper! Funnily enough, Lynch mentions that there are too many introns to be explained by neutral mechanisms. The surplus of introns was thus likely caused by positive selection for introns. Why? You need introns for NMD (nonsense-mediated decay), which weeds out transcripts of genes that accumulate premature stop codons (incidentally, there are very few genes with introns in the 3' UTR ... transcripts of such genes would be effectively destroyed by the NMD machinery). In contrast to vertebrates, NMD is intron independent in yeast and flies ... it is thought that NMD was intron dependent in the earliest eukaryotic ancestor, as plants use introns to detect NMD substrates. Another point that he doesn't address is that in higher eukaryotes, the act of splicing promotes nuclear export. Thus introns serve to enhance expression. This may have evolved as a defence against the expression of genes derived from foreign pathogens such as viruses, which have compact genomes. With these two phenomena it becomes harder to see how there could be so many intronless genes in the human genome.
^
Are you sure about this? I assume that it is more likely that the majority of premature stop codons are generated by faulty splicing. Thus, NMD would serve to eliminate wrongly spliced transcripts rather than act as a control reaction against PTCs fixed in the genome.
In addition, there was a recent paper (unfortunately I can't remember the authors) claiming that alternative splicing is a means to downregulate the majority of transcripts via NMD as a reaction to stress.
A reference to my claim
(emphasis mine)
I may be wrong about the stress issue. Still, there are hypotheses coupling NMD with regulation of gene expression rather than elimination of genetically fixed PTCs:
(emphasis mine)
Good point, sparc. Thanks for the refs. I must say that if a transcript is targeted for degradation by NMD, it is likely that it will be destroyed before researchers can isolate the cDNA. Despite what I wrote (that PTCs accumulate genetically), I would suspect that most PTCs are not generated by genetic mutations but by aberrant mRNA transcription and aberrant mRNA processing. Any mistakes in transcription or splicing that cause frame shifts will generate PTCs. NMD will destroy these transcripts - so I think of it as a surveillance mechanism. Now, the cell may exploit NMD to regulate genes under certain conditions such as stress (in fact, parts of the NMD machinery accumulate in what are called stress granules), but as I stated before, most NMD substrates will be destroyed rapidly and will be hard to isolate without inhibiting NMD itself.
I would like to understand the topic of intronless pseudogenes in relation to RNA and cDNA. Please explain.
thank you