Ok, so this isn’t really physics as such, but it’s pretty fascinating. There’s a very large online community called Reddit in which users submit links which interest them. These links come with two little arrows beside them, and the users can vote the link up or down. Here’s a screenshot of how the website looks to me at the time of this writing:
As I visit on different days, or at different times on the same day, the links and their order change. This keeps the site fresh and news-y, at least if you like your news full of cat memes. It’s pretty clear that the ordering of these links is both a function of when they were submitted by the users and of the votes they receive, but how exactly does this work?
The algorithm itself is explained in this very informative post by Amir Salihefendic. In short, every post is assigned a number given by the function:
$$f(n, t) = \log_{10}(n) + \frac{t}{45000}$$
Here n is the net number of upvotes. For 10 votes up and 0 votes down, n = 10. For 50 votes up and 40 votes down, n also equals 10. Next, t is the time in seconds after an arbitrary moment that happens to be in 2005. The choice of that arbitrary moment doesn’t matter – what matters are the differences in scores. This function f(n, t) is calculated for each link, and they are sorted in order from the greatest to the least value of f. I’ve slightly simplified the equation by dropping a coefficient that makes no difference for positive n.
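To make the sorting concrete, here is a minimal Python sketch of the simplified score, f(n, t) = log10(n) + t/45000, and the ordering it produces. The function names and the three example posts are made up for illustration, and this is only the simplified positive-n form described above, not Reddit’s actual code.

```python
import math

SECONDS_PER_FACTOR_OF_TEN = 45000  # the time scale in the simplified formula


def simplified_score(net_votes: int, seconds_since_epoch: float) -> float:
    """Simplified f(n, t) = log10(n) + t / 45000, valid for positive n only."""
    if net_votes < 1:
        raise ValueError("the simplified form assumes a positive net vote count")
    return math.log10(net_votes) + seconds_since_epoch / SECONDS_PER_FACTOR_OF_TEN


def rank(posts):
    """Sort (title, net_votes, seconds_since_epoch) tuples, highest score first."""
    return sorted(posts, key=lambda p: simplified_score(p[1], p[2]), reverse=True)


# Hypothetical posts: (title, net upvotes n, submission time t in seconds)
posts = [
    ("old but popular", 1000, 0),
    ("newer, few votes", 10, 100_000),
    ("same age, more votes", 100, 100_000),
]
ranked_titles = [title for title, _, _ in rank(posts)]
print(ranked_titles)
```

Note that the newer post with only 10 net votes outranks the older one with 1000, because its head start of 100,000 seconds is worth more than two factors of ten in votes.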
Ok, great. Now what does this all mean? Amir’s post gives some examples, but I want to dig a little bit into the interpretation of this equation. In physics it’s very often the case that an equation isn’t just some abstract mathematical machine, but rather it’s a natural statement which has an intuitive interpretation we can understand. For instance, Gauss’s law, $\nabla \cdot \mathbf{E} = \rho / \varepsilon_0$, is an abstract vector calculus statement, but physicists see that equation and understand it intuitively as the idea that electric field lines diverge outward from sources of electric charge. That’s a more useful way of thinking of it than “Ok, now we have to solve some horrible partial differential equation before we can know anything at all.” Intuition gives us a qualitative picture, and from there we can do the hard work to get a numerical answer when required.
Since Reddit’s equation is just used to generate an ordering, an overall multiplicative factor doesn’t matter. If a score of 20 is ranked ahead of 15, then 200 will be ranked ahead of 150. So let’s multiply Reddit’s equation by 45000 seconds:
$$45000 \, f(n, t) = 45000 \log_{10}(n) + t$$
Effectively this just means the posts are sorted in order by t, the time they were posted. Newer posts are higher. But there’s that log(n) term – it moves the posts forward in time. Newer posts are listed first, and a post becomes even newer by getting votes. If n = 10, then log(10) = 1 and the post is moved forward 45000 seconds, or 12.5 hours. If n = 100, then log(100) = 2 and the post is moved forward 90000 seconds, or 25 hours. We can plot this for more and more net upvotes:
The returns are diminishing. Logarithms are slowly increasing functions, so each additional upvote moves the post forward in time by a smaller and smaller amount. Even with thousands of votes, a post has moved only about two days into the future, which is why posts never last more than a day or so on the front page. After that they get overtaken by new posts, even ones with few upvotes.
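The diminishing returns are easy to tabulate. This is a small Python sketch, assuming the rescaled form 45000 * log10(n) + t, with `time_shift_hours` as a made-up helper name:

```python
import math

SHIFT_PER_FACTOR_OF_TEN_S = 45000  # seconds gained per factor of 10 in net votes


def time_shift_hours(net_votes: int) -> float:
    """Hours a post is moved forward in time by having net_votes net upvotes."""
    return SHIFT_PER_FACTOR_OF_TEN_S * math.log10(net_votes) / 3600


for n in (1, 10, 100, 1000, 10_000):
    print(f"n = {n:>6}: moved forward {time_shift_hours(n):5.1f} hours")
```

Each extra factor of ten in net votes buys the same 12.5 hours, so going from 1,000 to 10,000 votes helps no more than going from 1 to 10.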
In politics we often hear that every vote counts. In Reddit, we can actually figure out how much each vote counts. If I upvote or downvote a post, how far does my individual vote move that post in time? For large n, it’s very accurate to approximate the change in log(n) (for each additional vote) by its derivative:
$$\frac{d}{dn} \log_{10}(n) = \frac{1}{n \ln 10} \approx \frac{0.434}{n}$$
Well that 0.434 is a little annoying but hey, I didn’t choose to use base 10 logarithms. (Had they used base e = 2.718… then it would just be 1/n.) What this means is that if a post has 10 votes, your upvote will add about 45000*0.434/10 = 1953 seconds, or about 33 minutes. A downvote would move it backwards by that same amount. If a post has 50 votes, your upvote (or downvote) will move it forward or backward by about 6.5 minutes. For a 4700 vote post like one of the ones in the screenshot above, each vote makes a mere 4.2 seconds difference.
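If you’d like to check the per-vote arithmetic, or see how good the derivative approximation is, here is a short Python sketch. It assumes the rescaled score 45000 * log10(n) + t, and the two function names are my own:

```python
import math

SHIFT_PER_FACTOR_OF_TEN_S = 45000  # seconds gained per factor of 10 in net votes


def exact_vote_shift_s(n: int) -> float:
    """Seconds gained when the net vote count goes from n to n + 1."""
    return SHIFT_PER_FACTOR_OF_TEN_S * (math.log10(n + 1) - math.log10(n))


def approx_vote_shift_s(n: int) -> float:
    """Derivative estimate of the same shift: 45000 / (n * ln 10)."""
    return SHIFT_PER_FACTOR_OF_TEN_S / (n * math.log(10))


for n in (10, 50, 4700):
    exact = exact_vote_shift_s(n)
    approx = approx_vote_shift_s(n)
    print(f"n = {n:>5}: exact {exact:8.1f} s, derivative estimate {approx:8.1f} s")
```

The derivative slightly overestimates the effect of an upvote at small n, and the two agree to well under a percent once n is in the thousands.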
This might suggest an improvement on the “subscribe” and “unsubscribe” system: if there’s a subreddit you’re interested in but not that interested in (/r/aww maybe?), you could give it a handicap by having Reddit subtract (say) a 6-hour penalty from every post in that subreddit. This would require a /r/aww post to get about 3 times as many votes to overtake an unpenalized post which was originally made at the same time. (Homework: given an h-hour penalty, how many times more votes does the penalized post require to overtake a simultaneously-posted unpenalized post?) Correspondingly, you could give a bonus for subreddits you want to see more of. Unfortunately this is probably not a feasible suggestion. Separately sorting huge lists for millions of users would probably melt the servers. But it would be a nice feature.
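For anyone who wants to check their homework numerically, here is a Python sketch. It assumes the penalty is simply subtracted from t in the rescaled score 45000 * log10(n) + t. The function name `vote_multiplier` is hypothetical, and no such feature exists on Reddit as far as I know.

```python
SHIFT_PER_FACTOR_OF_TEN_S = 45000  # seconds gained per factor of 10 in net votes


def vote_multiplier(penalty_hours: float) -> float:
    """Factor by which net votes must grow to cancel a time penalty.

    Setting 45000 * log10(n_penalized) - penalty = 45000 * log10(n)
    gives n_penalized / n = 10 ** (penalty_seconds / 45000).
    """
    penalty_seconds = penalty_hours * 3600
    return 10 ** (penalty_seconds / SHIFT_PER_FACTOR_OF_TEN_S)


for hours in (1, 6, 12.5, 24):
    print(f"{hours:>5} hour penalty: needs {vote_multiplier(hours):7.2f}x the votes")
```

The factor grows exponentially with the penalty, so a handicap of a day or more would effectively hide a subreddit.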
All right, better wrap this one up. As far as user-vote-based ranking goes, Reddit’s is unusually interesting from a mathematical standpoint. For what it’s worth, I give it my upvote.
[Update: Fixed a mistake in the calculations.]