Now on ScienceBlogs: HeartlandGate: Anti-Science Institute's Insider Reveals Secrets

ScienceBlogs Book Club: Inside the Outbreaks

Christina's LIS Rant

This is my blog on library and information science.

Profile

Christina Pikas Christina K. Pikas is a science and engineering librarian in a special library as well as a doctoral student in information studies.
Any opinions expressed here may not even be her own and certainly do not represent those of any organization willing to be affiliated with her.

Search

Recent Posts

Recent Comments

Archives

Geography

Locations of visitors to this page

Where am I?

N 39 W 76

Research Blogging Awards 2010 Finalist

License

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0.

« Hands of Lead and Like to Help People? | Main | It's, um, fine... I guess »

Comps readings: community detection

Category: Information Sciencecompsonline communities
Posted on: June 2, 2009 9:08 PM, by Christina Pikas

Last set of comps readings, I talked about sense of community: belonging, having influence, fulfillment of needs, and emotional support.  Now, let's talk about the physics version of "community" - cohesive subgroups.  In a graph, these are groups of nodes in a graph that are more connected to each other than to other parts of the graph. Clumpy spots.  If you read old Wasserman and Faust, you'll probably think of cliques, cores, and lambda sets... some how these didn't do it for me - literally, when I was trying to locate communities in science blog networks, it didn't work..  If you have a computer science or maybe even sociology background you'll probably just look at some sort of clustering (agglomerative or divisive).  The hot thing for the past few years comes from physicists and that's what's covered here.  I did other posts on SNA articles, so those are mostly elsewhere. (BTW - if you ever take stats for the social sciences and can substitute R for stata, do so and take the time to learn it. The igraph package for R has all of the coolest community detection thingies in it) (note, too, that these readings are not necessarily for the dabbler in bibliometrics or social network analysis!)

Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E (Statistical, Nonlinear, and Soft Matter Physics), 69(2), 26113-21. (just go here)
This article, like the ones from Barabasi, sort of kicked off this flurry of research.  They use a divisive clustering technique - so they start with the whole network, and break the connections with the highest betweeness.  See figure. bowtie.png See how if you remove that one line, how you completely break up the thing? That line has high betweenness. So they calculate that for all of the lines using whatever method, then take the line with the highest out, then re-calculate and remove, and again. They then go on to talk about the actual algorithm to use to efficiently do all of this betweenness calculating and give some examples.  There's a lot in this article, though, because they next talk about how to figure out when you're done and if you've got decent communities. This measure is modularity (see the article for the definition), but basically it's 0 if random and 1 is the maximum. If you calculate Q at each step, then you can stop when it's highest. Note that any given node can only be in one community, unfortunately. (in real life, people are nearly always in multiple communities)

Reichardt, J., & Bornholdt, S. (2006). When are networks truly modular? Physica D, 224(1-2), 20-26. doi: 10.1016/j.physd.2006.09.009 (or look here)
They review Newman and Girvan and suggest a new way that groups connected nodes and separates non-connected nodes.  They go through a process and end up with an equation that's apparently like a Hamiltonian for a q-state Potts spin glass (dunno, ask a physicist if you need more info on that!).  This method allows for overlapping communities because there could be times when you could move a node from one community to the next without increasing the energy.  They compared it for some standard graphs and it did better than N-G. Instead of just stopping by minimizing modularity, they compare the modularity to a random graph with the same degree distribution.

Reichardt, J., & Bornholdt, S. (2007). Clustering of sparse data via network communities-a prototype study of a large online market. Journal of Statistical Mechanics: Theory and Experiment, P06016. doi:10.1088/1742-5468/2007/06/P06016
In this one they test the spin glass community detection method against the German version of ebay to look for market segmentation. The network has bidders as nodes, and if they bid on the same item there is an edge.  The spin glass method was successful at pulling out clusters and using odds ratios, the authors showed that these clusters corresponded to groupings of subject categories. The Q was much higher than it would be for a random graph.
Share on Facebook
Share on StumbleUpon
Share on Facebook
Find more posts in: Information Science

TrackBacks

TrackBack URL for this entry: http://scienceblogs.com/mt/pings/111498

ScienceBlogs

Search ScienceBlogs:

Go to:

Advertisement
Follow ScienceBlogs on Twitter

© 2006-2011 ScienceBlogs LLC. ScienceBlogs is a registered trademark of ScienceBlogs LLC. All rights reserved.