Last set of comps readings, I talked about sense of community:

belonging, having influence, fulfillment of needs, and emotional

support. Now, let’s talk about the physics version of

“community” – cohesive subgroups. In a graph, these are

groups of nodes in a graph that are more connected to each other than

to other parts of the graph. Clumpy spots. If you read old
href="http://www.worldcat.org/oclc/30594217">Wasserman and

Faust, you’ll probably think of cliques, cores, and lambda

sets… some how these didn’t do it for me – literally, when I was

trying to
href="http://terpconnect.umd.edu/%7Ecpikas/ScienceBlogging/PikasEScience08.pdf">locate

communities in science blog networks, it didn’t work..

If you have a computer science or maybe even sociology

background you’ll probably

just look at some sort of clustering (agglomerative or divisive).

The hot thing for the

past few years comes from physicists and that’s what’s covered here.

I did other posts on SNA

articles, so those are mostly

elsewhere. (BTW – if you ever take stats for the social sciences and

can substitute R for stata, do so and take the time to learn it. The

igraph package for R has all of the coolest community detection

thingies in it) (note, too, that these readings are not necessarily for

the dabbler in bibliometrics or social network analysis!)

Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating

community structure in networks. Physical Review E (Statistical,

Nonlinear, and Soft Matter Physics), 69(2), 26113-21.

(just go here)

This article, like the ones from Barabasi, sort of kicked off this

flurry of research. They use a divisive clustering technique

– so they start with the whole network, and break the connections with

the highest betweeness. See figure.

See how if you remove

that one line, how you completely break up the thing? That line has

high betweenness. So they calculate that for all of the lines using

whatever method, then take the line with the highest out, then

re-calculate and remove, and again. They then go on to talk about the

actual algorithm to use to efficiently do all of this betweenness

calculating and give some examples. There’s a lot in this

article, though, because they next talk about how to figure out when

you’re done and if you’ve got decent communities. This measure is

modularity (see the article for the definition), but basically it’s 0

if random and 1 is the maximum. If you calculate Q at each step, then

you can stop when it’s highest. Note that any given node can only be in

one community, unfortunately. (in real life, people are nearly always

in multiple communities)

Reichardt, J., & Bornholdt, S. (2006). When are networks truly

modular? Physica D, 224(1-2), 20-26. doi: 10.1016/j.physd.2006.09.009

(or look here)

They review Newman and Girvan and suggest a new way that groups

connected nodes and separates non-connected

nodes. They go through a process and end up

with an equation that’s apparently like a Hamiltonian

for a q-state Potts spin glass (dunno, ask a physicist if you need more

info on that!). This method allows for overlapping

communities because there could be times when you could move a node

from one community to the next without increasing the energy.

They compared it for some standard graphs and it did better

than N-G. Instead of just stopping by minimizing modularity, they

compare the modularity to a random graph with the same degree

distribution.

Reichardt, J., & Bornholdt, S. (2007). Clustering of sparse

data via network communities-a prototype study of a large online

market. Journal of Statistical Mechanics: Theory and

Experiment, P06016. doi:10.1088/1742-5468/2007/06/P06016

In this one they test the spin glass community detection method against

the German version of ebay to look for market segmentation. The network

has bidders as nodes, and if they bid on the same item there is an

edge. The spin glass method was successful at pulling out

clusters and using odds ratios, the authors showed that these clusters

corresponded to groupings of subject categories. The Q was much higher

than it would be for a random graph.