Making the Data Public: Interview With Xan Gregg

By clock on March 5, 2008.

Xan Gregg has also attended both the first Science Blogging Conference and the second one in January, where he co-moderated a session on Public Scientific Data. He blogs on FORTH GO.

Welcome to A Blog Around The Clock. Would you, please, tell my readers a little bit more about yourself? Who are you? What is your scientific background? What is your Real Life job?

I'm a software engineer working at SAS Institute on a desktop "statistical discovery" application called JMP. (Yes, we have a blog, and I sometimes post to it.) My primary interest is data visualization, and in 2006 I won a data visualization competition judged by author Stephen Few. My background is in math and computer science, and I use both fields as a team member at Project Euler, which is a site full of challenging math problems that usually require writing programs to solve.

When and how did you discover science blogs? What are some of your favourites? Have you discovered any new cool science blogs while at the Conference?

It wasn't until I attended the first Science Blogging Conference that I knew about so much science blogging going on. Now I have trouble keeping up. I can hardly read as fast as you can blog! I like those blogs that provide good summaries of recent research, such as Cognitive Daily, Statistical Modeling, and one I discovered at the conference, ThankYouBrain by attendee Bill Klemm.

How did you get interested in public data?

Having a focus on data visualization, I'm always analyzing graphs and trying to think of ways to make them better. To really make a point, I need to actually produce a better visualization from the same data, and I have been disappointed to find that the data is not often readily available. I can sometimes resort to programs like GraphClick that can scrape data from standard graphs, but even that doesn't work for summary graphs where the real data is invisible.

Why should scientists make their raw data public? What are the pros and cons?

The more I researched the subject, the more I found a disconnect between what scientists say and what they do. Almost every authority extols the principles of public data, but few scientists practice it openly. I've found it to be primarily a question of when. Full open science labs like Jean-Claude Bradley's UsefulChem publish data as it's generated, but that model isn't for everyone. I'd be happy to see data published with papers, as with the policy of the American Economic Review, but the usual answer to the question of when is "when somebody asks for it nicely enough."

The pros and cons depend on your goals. If you're trying to further public knowledge, then sharing data supports that goal. If you're in a competitive situation, then sharing data could weaken your position. I guess that's a philosophical issue on the nature of scientific research and the public good. In practical terms, publishing data encourages better review and new derivative research, and the only con is with confidential data that can't be effectively anonymized.

Are there disciplinary differences?

The main disciplinary difference I've seen regards the quantity of data. Fields like astronomy and genetics have tons of data, which encourages central data repositories for archiving data.

How would you go about persuading a scientist to make his/her data public?

The idea is there already, so I'd focus on showing how easy it is to share data in a minimal way. Of course, most scientists take their cues from journals and funders, and we need more of them to require data. Some governments, including the US government, are moving in that direction for publicly funded research. It'd be nice to see PLoS adopt something like the data policy of American Economic Review. I'd be happy to work with someone on setting up a data repository site.

How should the raw data be presented online?

Any way you can. Just a CSV (comma-separated values) file sitting on a web server is fine. Better is an independent site, such as Swivel or Google Docs. The important thing is to remember to include a description of the data fields and sources. Then use the URL of your data as a citation point.
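As a rough illustration of the minimal approach described above, the sketch below writes a small CSV file along with a plain-text description of its fields and source. It is only one possible way to do this: the field names, values, file names, and source string are all hypothetical placeholders, not data or conventions from the interview.

```python
import csv
import io

# Hypothetical fields and measurements; names and values are illustrative only.
FIELDS = {
    "sample_id": "Identifier assigned to each sample (text)",
    "temperature_c": "Temperature at measurement time, degrees Celsius",
    "mass_g": "Measured mass, grams",
}
ROWS = [
    {"sample_id": "A1", "temperature_c": 21.5, "mass_g": 3.02},
    {"sample_id": "A2", "temperature_c": 22.0, "mass_g": 2.97},
]
SOURCE = "Example lab notebook (placeholder source)"


def build_csv(rows, fields):
    """Return the rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_description(fields, source):
    """Return a plain-text description of each field plus the data source."""
    lines = [f"Source: {source}", "Fields:"]
    lines += [f"  {name}: {meaning}" for name, meaning in fields.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names; upload both files side by side.
    with open("data.csv", "w", newline="") as f:
        f.write(build_csv(ROWS, FIELDS))
    with open("data_README.txt", "w") as f:
        f.write(build_description(FIELDS, SOURCE))
```

Keeping the description in a separate text file next to the CSV means the data file itself stays easy to load in any tool, while readers can still find out what each column means and where the numbers came from.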

Is there anything that happened at this Conference - a session, something someone said or did or wrote - that will change the way you think about science communication, or something that you will take with you to your job, blog-reading and blog-writing?

The whole conference makes me temporarily depressed. I find out that for every good idea I've had, not only has someone else already had it, but three sites are already implementing it!

It was so nice to see you again and thank you for the interview.

Thank you, Bora. Keep on tickin'.

============================

Check out all the interviews in this series.
