Using Twitter to probe audience share of microbloggers

A conversation cropped up on Twitter the other day about shared audiences. Specifically, Ed Yong and Alice Bell used this tool to compare the overlap in their followers.

So we science nerds wondered, how does that overlap look when you start adding in more bloggers? What is the shared audience between five, 10, 20 of the most prominent writers? This is very interesting to me, because I suspect that, even within a portal like ScienceBlogs, there is in fact very little sharing of audiences. Perhaps that's a reflection of the number of blogs people can reasonably follow. Maybe it's the long tail effect of having 80 blogs where interests don't overlap as much as you'd expect. Measuring this share on ScienceBlogs would be tricky; measuring it on science blogs across the world would be very, very hard. But luckily on Twitter, the data on who is following whom is freely available! It's basically a big set of subscriber lists that we can compare.

My first instinct was to Venn the shit out of that, but the problem is that Venn diagrams are lovely for two sets and workable at three, while the number of different overlapping areas needed grows exponentially as you increase the number of sets you are comparing. Just look at this six-set Venn!

[Image: Edwards-Venn diagram for six sets]
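To put a number on that growth: a Venn diagram of n sets needs one region for every non-empty combination of sets, which is 2^n - 1. The snippet below is only a quick illustration of that count; the function name is my own.

```python
def venn_regions(n_sets: int) -> int:
    """Count the overlap regions in a Venn diagram of n sets.

    Every non-empty combination of sets gets its own region, so the
    count is 2**n - 1 (the area outside all sets is not counted).
    """
    return 2 ** n_sets - 1


if __name__ == "__main__":
    for n in (2, 3, 6, 10):
        print(f"{n} sets -> {venn_regions(n)} regions")
    # 6 sets -> 63 regions, 10 sets -> 1023 regions
```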

So how best to show the overlap amongst 10 people on Twitter? I sketched a lot of charts, and none seemed right, until I realised that instead of focussing on bloggers, I should focus on the audience. Good advice for any day, really.

Instead of asking, who do I share?, we should ask: who are the power users, who is following many writers? Who is following just a few? In this way, I imagined pooling all the followers of several writers on Twitter. I then imagined assigning them a value based on the number of writers in my set that they follow. Finally, the followers would be grouped according to their score. A bit like this:

[Image: sketch of followers grouped by the number of writers they follow]

Here the people in the top left follow six out of six writers that I am comparing. The next bracket holds all those who follow five - any five - of the six writers I am comparing. And so on. The relative size of the brackets will tell us the overlap. What it won't immediately tell us is the quality of the overlap (i.e. which combination of bloggers they follow). But this could be added in as shading, or as a mouseover text. The final frame would show the unique audience, who only follow one of the writers each, and it would be easy and worthwhile to separate these into groups. It would also be worthwhile to add the functionality to highlight a single writer's followers across all the groups, making comparisons between two or three writers within the dataset easy.
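The grouping itself is easy to sketch in code. What follows is a rough illustration rather than a working tool: the follower lists are made-up data, the function names are my own, and it assumes you have already fetched each writer's follower list from somewhere as a plain set of usernames.

```python
from collections import defaultdict

# Hypothetical data: writer -> set of follower usernames (all invented).
followers = {
    "writer_a": {"ann", "bob", "cat", "dan"},
    "writer_b": {"bob", "cat", "eve"},
    "writer_c": {"cat", "dan", "eve", "fay"},
}


def writers_followed(followers):
    """Invert the lists: follower -> set of writers they follow."""
    followed = defaultdict(set)
    for writer, audience in followers.items():
        for person in audience:
            followed[person].add(writer)
    return dict(followed)


def group_by_score(followed):
    """Bracket followers by how many of the writers they follow."""
    buckets = defaultdict(set)
    for person, writers in followed.items():
        buckets[len(writers)].add(person)
    return dict(buckets)


def group_by_combination(followed):
    """Bracket followers by exactly which writers they follow."""
    combos = defaultdict(set)
    for person, writers in followed.items():
        combos[frozenset(writers)].add(person)
    return dict(combos)


def highlight(buckets, audience):
    """Pick out one writer's followers within every bracket."""
    return {score: people & audience for score, people in buckets.items()}


if __name__ == "__main__":
    followed = writers_followed(followers)
    buckets = group_by_score(followed)
    for score in sorted(buckets, reverse=True):
        print(score, sorted(buckets[score]))
    print(highlight(buckets, followers["writer_a"]))
```

The score brackets give the relative sizes, the combination groups carry the "quality" of the overlap that shading or mouseover text would show, and `highlight` is the single-writer view.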

Why is this important? Well, it tells us several things. For one, if there is a large amount of overlap between my audience and Ed Yong's, there's little value in me retweeting something he's already said. (As a follower, I already exploit this, and will happily not follow certain people because I know that anything they say of interest to me will be retweeted by someone I DO follow - in effect, treating the middle man as an editor.) You might also use this tool in deciding who to work with to further collective action - for example, putting both Martin Robbins and Ed Yong on a panel together might not double the draw of your event if they appeal to a very similar audience. Not that both aren't worth seeing in their own right! But I'd guess there's a law of diminishing returns at work in how many people you reach for the investment you make. To really reach out, we need to pair people with little overlap and large numbers of unique followers, else we risk preaching to the choir. What else could we do with this info? And will someone make the tool a reality (maybe it already exists)?
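The panel example comes down to the size of the combined audience, not the sum of the two follower counts. Here is a minimal sketch of that idea, again with invented follower sets and function names of my own choosing:

```python
from itertools import combinations

# Hypothetical data: writer -> set of follower IDs (all invented).
followers = {
    "writer_a": {1, 2, 3, 4, 5, 6},
    "writer_b": {1, 2, 3, 4, 7},   # overlaps heavily with writer_a
    "writer_c": {8, 9, 10},        # small but entirely unique audience
}


def combined_reach(followers, writers):
    """Number of distinct people reached by a group of writers."""
    return len(set().union(*(followers[w] for w in writers)))


def best_pair(followers):
    """Pair of writers whose combined audience is the largest."""
    pairs = combinations(sorted(followers), 2)
    return max(pairs, key=lambda pair: combined_reach(followers, pair))


if __name__ == "__main__":
    for pair in combinations(sorted(followers), 2):
        print(pair, combined_reach(followers, pair))
    print("best pair:", best_pair(followers))
```

In this toy data the two biggest accounts together reach 7 people, while pairing the biggest with the smallest reaches 9, which is the diminishing-returns point in miniature.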

Nice.

I think all of this is just a probe though, and as you say useful in giving some sense of what's out there. A lot of blog readers aren't on Twitter, and many will do what you do (i.e. follow people who RT the famous people). Also, the fact that lots of people follow both Martin and Ed is surely a sign that they provide different content, as well as appealing to a similar demographic.

Sorry, that's just adding a note of context, which you already flag up. As I say, nice.

RE: shared audience as proof that two writers provide something different - that's a good point, I didn't consider that.

This is one of those things that starts off simple, and then makes your head spin around and around...

First of all, while I like your idea in theory, I think in practice the shading you talk about is going to get tricky quite fast, with hundreds of permutations and no obvious model of what similarity of shade or hue would mean.

Also, great comment from Alice. I predict that if you could plot a graph of 'similarity' vs. mutual followers, you would find a hump in the graph: as similarity rises so does the number of mutual followers, until at some optimum similarity the number of mutual followers peaks, then declines. Let's call it Robbins' Law!

Very cool.
As a new follower of scienceblogs (and possibly one of the younger readers? YOB=1989) I have to say that the "last 24 hours" tab has a strong influence on what I am reading rather than a particular author. I do not know if this is the same for more established readers, but certainly the quality and quantity of threads posted each day affect what I will read.

I like this in principle, but I wouldn't regard my Twitter followers as my "audience". My audience are my blog readers. I use Twitter for engaging with other professionals within the science writing/blogging/comms community as much as with a broader audience. I use my blog to engage with the latter.

Yes, I've thrown the net a bit wide here as it was a quick post and I didn't think too much. Measuring Twitter followers isn't a great reflection of a blog, and Twitter can be used not as a microblog but a networking tool. I make no apologies though, it's just a thought experiment!

As for Robbins Law - there's a slight hitch in that it predicts % shared followers using a metric of similarity... ...which is measured by % of shared followers! Haha, but if you found another way of measuring similarity you'd be on to something...

Also - yes, using shading to denote groups will only work as far as about seven shades, which happens to be the number needed to compare three people. Not a lot.

Frank said: "As for Robbins Law - there's a slight hitch in that it predicts % shared followers using a metric of similarity... ...which is measured by % of shared followers! Haha, but if you found another way of measuring similarity you'd be on to something..."

Already thought of that - my thought was to measure three things - the number of RTs between the two, the number of links that they've both posted, and you could probably also do a comparison of shared key words. The advantage is that you could automate the analysis of all three of those metrics.
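For the shared-links and shared-keywords parts, one simple way to automate this would be a Jaccard index: the size of the overlap divided by the size of the combined set. That choice is only an assumption for illustration, not something anyone in this thread specified, and the links and keywords below are made up. The RT count would need its own measure.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap of two sets as a fraction of their union (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical data for two writers (all invented).
links_x = {"u1", "u2", "u3"}
links_y = {"u2", "u3", "u4"}
keywords_x = {"venn", "twitter", "science"}
keywords_y = {"science", "policy"}

if __name__ == "__main__":
    print("shared links:", jaccard(links_x, links_y))
    print("shared keywords:", jaccard(keywords_x, keywords_y))
```

Because neither score uses follower lists, it sidesteps the circularity of measuring similarity by shared followers.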

I feel I should say I only know about http://twtrfrnd.com/ because Scott Keir was using it to work out who the scicom_bot is.

Unless you can work that out (and the identity of yakawow) your tools are useless. Useless, I tell you :)

http://tweepdiff.com will let you compare the followers/following of more than two users. Performance will degrade with a large number of people with a lot of followers but it will show you the overlap. Not too visually interesting, but effective.

I played around with graphically showing overlap but it fell over with people that had more than a few hundred followers so I put it on the backburner. I was using this tool: http://birdeye.googlecode.com/svn/trunk/ravis/RaVisExamples/example-bin….

Brian