From the archives: A Structural Exploration of the Science Blogosphere: Director's Cut

By cpikas on March 10, 2010.

This was originally posted 1/9/2009 on my old blog.

Due to popular demand (well 3 requests :) ), this is a commentary and additional information for my conference paper and presentation:
Pikas, C. K. (2008). Detecting Communities in Science Blogs. Paper presented at eScience '08. IEEE Fourth International Conference on eScience, 2008. Indianapolis. 95-102. doi:10.1109/eScience.2008.30 (available in IEEE Xplore to institutional subscribers) [also self-archived - free!- here]

The presentation is embedded in another blog post, and is available online at SlideShare. The video of me talking about it [was?] available on the conference site, but I haven't gotten it to load.

Context:
I'm interested in scholarly communication in science, engineering, and math. Specifically, informal scholarly communication and how information and communication technologies, in particular social computing technologies, can/do/might impact informal scholarly communication in science/math/engineering. I'm also interested in knowledge production and public communication of science, two sub-areas of STS (this acronym has several translations - the most common probably science and technology studies).

As a blogger, a 2-time (soon to be 3) attendee of what was the NC Science Blogging Conference, and a reader of science blogs, I became curious about how and why scientists use blogs and whether their use is: a) similar to how non-scientists use blogs; b) for informal scholarly communication (to other scientists about their work); c) for public communication of science; d) for personal information management; e) maybe for team collaboration(?)... The first way I looked at this was a study using content analysis and interviews of chemists and physicists (this has not been published yet, but maybe someday; these things aren't as perishable as writings in other fields, I hope). The second study swings all the way to a structural analysis of the science blogosphere - and that's what was reported here.

In social network analysis (SNA), you look at the link structure, not the attributes of the actors or nodes. The idea is that links show evidence of potential information flows or influence. You can pick out prestigious or central actors, and groups which are more tightly connected to each other than to the rest of the network.
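
To make "central actors" a bit more concrete, here is a minimal Python sketch of one common measure, in-degree centrality: the share of other blogs that link to a given blog. It is an illustration only, not the software used for the paper, and the blog names and links are made up. Sorting by the score puts the most-linked-to blog first.

```python
from collections import defaultdict

# Hypothetical blogroll links as (blog with the blogroll, blog it lists).
# These names are invented for illustration; they are not data from the study.
edges = [
    ("blog_a", "blog_b"),
    ("blog_a", "blog_c"),
    ("blog_b", "blog_c"),
    ("blog_d", "blog_c"),
    ("blog_c", "blog_a"),
]

def in_degree_centrality(edges):
    """Fraction of the other blogs in the network that link to each blog."""
    nodes = {name for edge in edges for name in edge}
    incoming = defaultdict(set)
    for source, target in edges:
        if source != target:              # ignore self-links
            incoming[target].add(source)
    denom = max(len(nodes) - 1, 1)        # number of possible linkers
    return {name: len(incoming[name]) / denom for name in nodes}

centrality = in_degree_centrality(edges)

# Rank from most to least central, breaking ties alphabetically.
ranked = sorted(centrality, key=lambda name: (-centrality[name], name))
```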

The first major problem was locating science blogs - and even drawing any sort of boundary as to what a science blog was or wasn't. Given that I'm interested in how these things contribute to science, I drew the line thusly:

Blogs maintained by scientists that deal with any aspect of being a scientist
Blogs about scientific topics by non-scientists

Omitted were:

Primarily political speech
Ones maintained by corporations
Non-English language

(you could definitely draw the line somewhere else, but this is what I did!)

Also, given that I'm a great searcher but almost not a coder at all, I did this by search, snowball, and any hook or crook to get as big a set as possible. I went to each of these blogs and copied the URLs from the blogrolls. (To answer a question from a Scibling: if you had a rotating list that showed up in JavaScript in the page source, I probably got it; if you have a second page with a list of 300 blogs (cough - Bora - cough), I probably got it; likewise if the list was generated by something like Google Reader.) So this was incredibly tedious, and I probably missed a few, but it's probably pretty accurate. So that was the first network.
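
The blogroll step was done by hand, but it could in principle be scripted. Here is a minimal Python sketch of the idea, using only the standard library: pull every link out of a chunk of blogroll HTML and turn it into (source blog, target blog) pairs. The host names and the sample HTML are hypothetical, and a static parse like this would miss blogrolls that are built by JavaScript after the page loads.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def blogroll_edges(source_host, blogroll_html):
    """Return sorted (source_host, target_host) pairs, one per distinct outside host."""
    parser = LinkCollector()
    parser.feed(blogroll_html)
    targets = set()
    for href in parser.hrefs:
        host = urlparse(href).netloc.lower()
        if host.startswith("www."):
            host = host[4:]               # treat www.x.org and x.org as one blog
        if host and host != source_host:  # skip relative links and self-links
            targets.add(host)
    return [(source_host, target) for target in sorted(targets)]

# Hypothetical blogroll markup, invented for illustration.
sample_html = """
<ul class="blogroll">
  <li><a href="http://www.example-astro.org/">Astro blog</a></li>
  <li><a href="http://example-chem.net/blog/">Chem blog</a></li>
  <li><a href="http://example-astro.org/about">Astro blog, again</a></li>
  <li><a href="/archive">Internal link</a></li>
</ul>
"""

edges = blogroll_edges("example-home.org", sample_html)
```

Collapsing links to the host name is a simplification: it would lump together separate blogs hosted on a shared domain, so a real run would need a per-platform rule for those.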

The second network - and I originally had a much grander scheme - took the "most interesting" (most central by common measures) blogs from the first network, and then used Perl scripts (the core script was developed by Jen Golbeck, and then I customized it to work for non-WordPress blogs, and for blogs where people changed their templates a lot - you all really could have made this easier, lol) to pull all of the commenter links off of the last 10 posts (this was done in like April).
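
The Perl scripts themselves aren't reproduced here. As a rough illustration of what happens after the scraping, this Python sketch turns a list of scraped comment records into weighted links. The records are invented, and pointing each link from the commenter's blog to the blog commented on is an assumption made for this sketch, not a detail taken from the paper.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical scraped records: (blog hosting the post, URL the commenter signed with).
comment_records = [
    ("example-physics.org", "http://example-astro.org/"),
    ("example-physics.org", "http://example-astro.org/2008/04/a-post"),
    ("example-physics.org", "http://example-physics.org/"),  # author replying on own blog
    ("example-astro.org", "http://www.example-chem.net/"),
    ("example-astro.org", ""),                                # unsigned comment
]

def host_of(url):
    """Lower-cased host name without a leading 'www.', or '' if there is none."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def commenter_edges(records):
    """Count comments per (commenter's blog, blog commented on) pair."""
    weights = Counter()
    for host_blog, commenter_url in records:
        commenter_blog = host_of(commenter_url)
        # Drop unsigned comments and bloggers commenting on their own posts.
        if commenter_blog and commenter_blog != host_blog:
            weights[(commenter_blog, host_blog)] += 1
    return weights

weights = commenter_edges(comment_records)
```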

Blogs have links between them a) in the content b) in the blogroll c) in signed comments... other studies have used basically any link on the page, but the fact is that it's not really saying much to link within a post (a little link love, but not a real endorsement). Blogrolls are some sort of endorsement, typically, and signing a comment means *something*.

So then I ran all the typical SNA things across it to look at central actors and to find cohesive subgroups. As far as centrality - no real surprises. As far as cohesive subgroups - a bit more tricky. Basically one large component - and not terribly clumpy, with the exception of the astro bloggers - they're pretty tight. Most of the community detection techniques use a binary split - or start with binary splits - none of these were at all effective in dividing up the hairball. Spin glass, OTOH, worked beautifully to return 7 clusters. So then I went back and looked at the blogs and figured out the commonality for each of the clusters (yes, I could have used some NLP to extract terms and automatically label the clusters, but there were 7 so...).
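
For readers who want to see the moving parts: spin glass community detection (Reichardt and Bornholdt) searches for the cluster assignment with the lowest "energy", and with its resolution parameter set to 1 that is equivalent to maximizing Newman's modularity. The Python sketch below does not do the search. It only checks whether a toy network is one component and scores a given partition with modularity, which is the quantity such a search would be trying to improve. The toy network and the partitions are made up.

```python
from collections import defaultdict, deque

# Hypothetical undirected toy network: two triangles joined by a single link.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

def adjacency(edges):
    """Neighbor sets for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def components(adj):
    """Connected components, found by breadth-first search."""
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        queue, comp = deque([start]), set()
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        result.append(comp)
    return result

def modularity(edges, membership):
    """Newman modularity Q for an undirected, unweighted graph
    with no self-loops and no duplicate edges."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)     # edges with both ends in the same cluster
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    degree_sum = defaultdict(int)   # total degree per cluster
    for node, d in degree.items():
        degree_sum[membership[node]] += d
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

adj = adjacency(edges)
n_components = len(components(adj))

two_clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_cluster = {node: 0 for node in adj}
q_two = modularity(edges, two_clusters)
q_one = modularity(edges, one_cluster)
```

On this toy network the two-triangle split scores higher than lumping everything together, which is the sense in which a partition is "better". Libraries such as igraph ship a spin glass implementation, so a real analysis would not need hand-rolled code like this.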

The single component isn't too surprising because we know from diffusion of innovations for ICTs that we would expect people to pick this up from other people and then probably link back. The power law degree distribution is also very typical when you're talking about the activities of people (whether Lotka, Zipf, Pareto, Bradford.... whatever law). The clusters were related to subject areas - very broad subject areas. One question in my mind was how much people would be outside of their home discipline in their reading/commenting... based on this network, certainly outside of their particular specialty, but still in the neighborhood, with the exception of a few "a-list" science bloggers who everyone reads.
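
A power law degree distribution means the number of blogs with k links falls off roughly as k raised to a negative exponent, so it looks like a straight line on log-log axes. Here is a minimal Python sketch that tallies a degree histogram and fits that line by ordinary least squares. The degree list is synthetic, built to follow 16 / k^2 exactly, and a least-squares fit on log-log counts is a crude estimate (maximum-likelihood fitting is generally preferred for real data).

```python
import math
from collections import Counter

def degree_histogram(degrees):
    """Map each positive degree k to the number of nodes that have it."""
    return Counter(d for d in degrees if d > 0)

def loglog_slope(histogram):
    """Least-squares slope of log(count) against log(degree)."""
    ks = sorted(histogram)
    xs = [math.log(k) for k in ks]
    ys = [math.log(histogram[k]) for k in ks]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic degrees (not study data): 16 nodes of degree 1, 4 of degree 2, 1 of degree 4.
degrees = [1] * 16 + [2] * 4 + [4] * 1

histogram = degree_histogram(degrees)
slope = loglog_slope(histogram)   # close to -2 for this synthetic data
```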

What was interesting - and most definitely worthy of further investigation - is this cluster of blogs written mostly by women, discussing the scientific life, etc. The degree distribution was much closer to uniform within the cluster, and there were many comment links between all of the nodes. This, to me, indicates other uses for the blogs and perhaps a real community (or Blanchard's virtual settlement).
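
One way to put numbers on "closer to uniform" and "many comment links between all of the nodes" is to compare clusters on two simple statistics: density (the share of possible directed links that exist) and the spread of degrees relative to their mean. The Python sketch below is an illustration under those assumptions, with invented clusters. It is not how the paper measured this.

```python
from statistics import mean, pstdev

def cluster_stats(edges, members):
    """Return (density, degree spread) for the directed links inside one cluster.

    density: links present / links possible among the members.
    spread:  population standard deviation of degree divided by mean degree
             (0.0 means every member has the same degree).
    """
    members = set(members)
    inside = {(u, v) for u, v in edges
              if u in members and v in members and u != v}
    n = len(members)
    density = len(inside) / (n * (n - 1)) if n > 1 else 0.0
    degree = {node: 0 for node in members}
    for u, v in inside:
        degree[u] += 1    # outgoing link
        degree[v] += 1    # incoming link
    values = list(degree.values())
    avg = mean(values)
    spread = pstdev(values) / avg if avg else 0.0
    return density, spread

# Hypothetical cluster where everyone comments on everyone else.
even_edges = [("x", "y"), ("y", "x"), ("x", "z"),
              ("z", "x"), ("y", "z"), ("z", "y")]
even_density, even_spread = cluster_stats(even_edges, ["x", "y", "z"])

# Hypothetical cluster where three blogs all comment on one hub.
hub_edges = [("i", "h"), ("j", "h"), ("k", "h")]
hub_density, hub_spread = cluster_stats(hub_edges, ["h", "i", "j", "k"])
```

The fully reciprocal cluster comes out dense with no spread in degree, while the hub-and-spoke cluster is sparse with degrees concentrated on one blog.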

Also, I picked out the troll very easily using the commenter network - so this method could be used to automate troll identification. (In the first study I talked about this guy with a physicist, and the physicist basically only reins the troll in when he's so out of bounds as to be gross... so ID-ing a troll doesn't necessarily mean banning.)

I'm quickly running out of steam in this blog post - but this might end up being a pilot for my dissertation, so I'm definitely more than happy to talk about it either in the comments here, or on slideshare, or on friendfeed... or twitter or... just look for cpikas :)
