Are Transcription Factors Enriched in Every Dataset?

By evolgen on November 13, 2007.

Is it just me or does every analysis that looks for over-represented gene ontology (GO) terms turn up transcription factors? It doesn't matter if the study is looking for genes under positive selection or something else. It just seems like transcription factors are enriched in every dataset.

Rich,

Every protein that I've ever worked on (be it a cytoskeletal, RNA-binding, or membrane-associated protein) has been described as a transcription factor in some crappy paper. In fact, I've been contemplating writing a post entitled "Why Does Every Freakin' Protein Have a Night Job as a Transcription Factor?"

Okay, so there are a bunch of proteins that are misannotated as transcription factors. Is it logical to assume that they are distributed randomly amongst all proteins? If so, this shouldn't lead to the over-representation of transcription factors in various datasets.

Is it just me or does every analysis that looks for over-represented gene ontology (GO) terms turn up transcription factors?

I think it might just be you :)

which papers are you thinking of?

A lot of GO enrichment analyses are biased by gene length. For example, if you're looking for enrichment in genes that have, say, some miRNA binding site or other sequence motif, then longer genes are more likely to have such binding sites by chance. If you just use a hypergeometric distribution (treating every gene as an equivalent "ball in a bag") to look for a GO enrichment, as is very common, the significance of long genes will be amplified. I am not sure if this applies to transcription factors, but metazoan nervous system genes tend to have long UTRs and come up (questionably) in these analyses all the time. Of course, one could argue that the longer UTRs might reflect the biology of more complex regulation and shouldn't be argued away.
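For anyone who hasn't seen the "ball in a bag" test written out, here is a minimal sketch of the plain hypergeometric enrichment calculation described above. The function name and the gene counts are hypothetical, chosen only for illustration. Note that it treats every gene as equally likely to be drawn, which is exactly the assumption that gene-length bias violates.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, hits):
    """Upper-tail hypergeometric p-value, P(X >= hits).

    population: total number of genes in the background ("balls in the bag")
    annotated:  background genes carrying the GO term of interest
    sample:     number of genes in the selected set
    hits:       selected genes that carry the GO term

    Assumes every gene is equally likely to be selected; it does NOT
    correct for gene length or any other selection bias.
    """
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability of seeing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 genes, 1,500 annotated as transcription
    # factors, 200 genes in the selected set, 30 of which are annotated.
    p = hypergeom_enrichment_pvalue(20000, 1500, 200, 30)
    print(f"p-value = {p:.3g}")
```

If longer genes are more likely to land in the selected set, a small p-value from this test may reflect gene length rather than function, so it is worth comparing against a background matched on length before reading much into it.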

I love reading papers from people who are too purely computational and get excited about enrichments in "macromolecular biosynthesis" or "cytosol". Thanks for narrowing it down for us :)

Heh! The most common annotation in GO data is "unknown". Summary: truth be told, we know sod all about what most genes do.

The last GO analysis I ran showed enrichment for "unknown function", "unknown component" and "unknown process". Conclusion: we know less than sod all about my particular system...

Another possibility is that genomics folk tend to report enrichment of transcription factors as often as they possibly can, in part because it's one of the easiest types of overrepresentation to weave into a story about how your set of upregulated genes is mechanistically involved in subject X.

Also: Peter, I'm totally with you on the "unknown" genes. They're routinely the most prevalent in my own GO-type analysis. Either I'm really breaking ground or completely barking up the wrong tree... :-)

Here is another dataset.
