The series of interviews with some of the participants of the 2008 Science Blogging Conference was quite popular, so I decided to do the same thing again this year, posting interviews with some of the people who attended ScienceOnline’09 back in January.
This is also the first in what I hope will be a long series of interviews with researchers in my field of Chronobiology.
Today, I asked John Hogenesch, my chronobiologist colleague who moderated the ‘Community intelligence applied to gene annotation’ session at ScienceOnline’09, to answer a few questions.
Welcome to A Blog Around The Clock. Would you, please, tell my readers a little bit more about yourself? Who are you? What is your (scientific) background?
I’m an Associate Professor at the University of Pennsylvania in the Department of Pharmacology. Our lab works on clocks, but also on functional genomics in mammals. I did my graduate work in neuroscience at Northwestern University’s Chicago campus with Chris Bradfield. In Chris’s lab, I worked on identifying and characterizing new members of the bHLH-PAS class of transcription factors — several of these orphan PAS domain proteins turned out to be Bmal1, its paralog Bmal2, and Npas2, core components of the E-box machinery of the clock. For my postdoctoral training, I joined the lab of Steve Kay at the then-developing Genomics Institute of the Novartis Research Foundation (GNF). Later, I started my own lab there focusing on functional genomics and became Director of Genomics. These research projects included circadian clock research, but also other areas of biology that were of interest to me or GNF.
How did it happen that you became a scientist? How did you end up in chronobiology?
I’m a second-generation scientist; my dad is a professor of chemistry at the University of Southern California. My mom also teaches at USC, and my brother is a political science professor at Cal State Northridge. So, you could say that science/academia runs in the family.
I ended up interested in chronobiology largely because of a lecture by Joe Takahashi during my first year of graduate school. In the fall of 1992, Joe gave this fabulous lecture covering the progress of the Drosophila clock field, and I was hooked.
What is your Real Life job? What do you want to do/be when (and if ever) you grow up?
My real life job is the complicated life of academic science. Teaching, mentoring, sitting on study section, running a research group, being involved in graduate groups, sitting on committees, writing grants, and, time permitting, writing papers. (Oh yeah, I have twin one-year-old boys and a five-year-old to occupy my remaining day and night.)
I’m not sure what I will be when I grow up. I view science as a career of continual development. I started my research career mining genome data for new bHLH-PAS proteins (informatics). Then I cloned and characterized them — molecular and cellular biology. Then I became a genomicist, and learned a lot more about bioinformatics. I’m not really sure what will come next, but I hope to continue to learn how to do new things and apply them to subjects I’m interested in, such as the clock.
Can you explain to my lay audience, what your research is all about?
Our research involves learning how the clock works. In humans, the clock is actually your whole body, as clocks are everywhere, not just in your brain. There are really three facets of circadian clock function — synchronizing with your environment, keeping time, and regulating physiology and behavior. We are working on all three of these issues to various extents.
What aspect of science communication and/or particular use of the Web in science interests you the most?
I call myself a first-generation Atari American — I’ve spent most of my life around computers. Because of that, I probably see more opportunities than most people to exploit information technology, bioinformatics, and the Web. I use it to manage my own personal communications — I’m big into Gmail and Google Voice. I use tools such as Basecamp, project management software, to keep track of how things are going in the lab and to collaborate with other laboratories. We dabble in computational approaches in the lab, occasionally more than dabble.
About 10 years ago, I listened to David Botstein when he began advocating for open science and the release of large data sets. It was obvious — if you collect information on 10,000 genes but only follow up on one or a few, it’s really a shame to let the remaining data lie fallow. It occurred to me that a good way to avoid this would be to publish the data and make it available. Again, pretty obvious. However, depositing data in a database is not enough to ensure that it is used. In 1998, Rusty Thomas, a colleague in the Bradfield lab, and I put together a gene expression database to enable end-users, not just card-carrying computational biologists, to explore large toxicology data sets. When I got to GNF and had the resources to do something like this at scale, I jumped at the chance. With an extremely talented graduate student, Andy Su (now director of computational biology at GNF), we built the Gene Atlas/SymAtlas, a repository of multi-tissue expression data for human and mouse genes. This resource has been widely used by the research community. I thought, if this works for tissue-specific gene expression, which was a peripheral interest of mine, it should work just as well for circadian data. So we built the first circadian expression databases. Now we’re putting up other large-scale data sets, such as siRNA screens. When tens, dozens, or hundreds of labs are using your resources, good things will come of it.
The Web and technology to exploit it have changed, but the basic principle of open science has not. Papers associated with these databases are read more, the data sets are used more, the papers are cited more, it’s win-win.
You are involved in a number of initiatives involving Wikipedia and gene annotation online – can you tell us more about these?
My foray into gene annotation efforts really began with the Gene Atlas. The Web isn’t static, though, and other opportunities emerged. One of these was Wikipedia. We noticed that the canonical gene annotation efforts at NCBI were understaffed — one person ran LocusLink. Andy thought, why not apply community intelligence, which had generated a resource to rival the Encyclopaedia Britannica, to gene annotation efforts? I agreed. Again, if something like this is going to happen on a genome scale, I’ll do my best to make sure that the circadian clock community benefits first. Now, if you go to Google and search for a clock gene such as Bmal1, the first link that comes up is the Arntl entry in Wikipedia. When a clock rookie looks up a circadian gene, they go to its Wikipedia page, an archive and evolving review paper, to learn about it. That’s a fact.
The second recent development is BioGPS, a descendant of the Gene Atlas and SymAtlas. It handles gene synonyms, but more importantly, it allows one to use lightweight, URL-based methods to aggregate and visualize gene-based data sets. This is the technology we used to build the siRNA screening database. We put our data in there, but also linked this cell-based screening data to gene expression data sets (circadian and multiple-tissue expression), annotation efforts at NCBI and Wikipedia, and the UCSC Genome Browser. The really cool aspect, though, is that it’s customizable. If you want to add a new data set, you can, or you can link to one of the 100+ plug-in data sets with a couple of mouse clicks. A customized gene portal with your favorite data, in a couple of minutes, for free.
How does (if it does) blogging figure in your work? How about social networks, e.g., Twitter, FriendFeed and Facebook?
I don’t currently blog. It’s not that I’m opposed to it, it’s just that I have my hands full. I do use Facebook, but mostly to keep up with friends and family.
When and how did you discover science blogs? What are some of your favourites? Have you discovered any new cool science blogs while at the Conference?
Is there anything that happened at this Conference – a session, something someone said or did or wrote – that will change the way you think about science communication, or something that you will take with you to your job, blog-reading and blog-writing?
Andy and I had a long discussion with Deepak Singh from Amazon about their Web Services platform (AWS). It offers two things, storage and compute power, and you buy what you need when you need it. I came back to Penn and began advocating for testing these platforms on our own data. Even big institutions such as ours have problems with access to compute clusters. We are already exploiting AWS for proteomics work, and are beginning to do the same for genomic data — Steven Salzberg at the University of Maryland has pioneered some of these ideas.
My summary: at this point, if you use north of 70% of your CPU cycles, you’re probably better off buying your own hardware. If you use less than 50%, AWS already makes sense, and much below that, I would argue it’s a no-brainer. There are some problems — you have to code in a particular way, and data transfer costs can add up — but these can be mitigated, and Amazon is working hard to do so. Why buy expensive hardware, maintain and service it, and compete for high-priced IT talent, when Amazon already does that better than academia ever will?
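The utilization rule of thumb above boils down to a simple break-even calculation: owning hardware is a flat amortized cost, while renting scales with how much you actually compute. A minimal sketch in Python, where every dollar figure is a hypothetical placeholder rather than a real AWS or hardware price:

```python
# Break-even sketch for "buy vs. rent" compute. All prices are
# hypothetical placeholders, chosen only to illustrate the rule of thumb.

def monthly_cost_owned(cluster_price, lifetime_months, upkeep_per_month):
    """Amortized monthly cost of owning a cluster (purchase spread over
    its lifetime, plus power/admin upkeep). Flat regardless of usage."""
    return cluster_price / lifetime_months + upkeep_per_month

def monthly_cost_cloud(utilization, full_time_rate):
    """Cloud cost scales with the fraction of the month you actually
    run jobs (utilization in [0, 1])."""
    return utilization * full_time_rate

# Hypothetical numbers: a $60,000 cluster amortized over 36 months with
# $1,000/month upkeep, vs. $4,000/month to rent equivalent capacity full time.
owned = monthly_cost_owned(60_000, 36, 1_000)  # flat cost, ~$2,667/month
for utilization in (0.3, 0.5, 0.7, 0.9):
    cloud = monthly_cost_cloud(utilization, 4_000)
    cheaper = "cloud" if cloud < owned else "own hardware"
    print(f"{utilization:.0%} utilization: "
          f"cloud ${cloud:,.0f} vs owned ${owned:,.0f} -> {cheaper}")
```

Under these made-up prices the crossover lands between 50% and 70% utilization, matching the heuristic in the interview: low-utilization groups come out ahead renting, while groups that keep their cluster busy are better off buying.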
For the molecular biologists in the audience, it’s sort of like buying a polyacrylamide gel rather than pouring one. I told people in the lab about eight years ago: no more pouring PAGE gels. It’s time-consuming, which means money-consuming, and I would rather have them do something else.
It was so nice to see you again, and thank you for the interview. I hope to see you again next January.