What is a blogroll? How does one maintain a blog roll? Is there any way to combine the onerous task of blogroll maintenance, the intrigue of JavaScript and the Mysteries of the Linux Command Line to make blog rolls fun?
Of course there is!
What is a blog roll? A blog roll is a list of links that appears on a blog (usually in a side bar, not in a post) in which each link points to another blog. This is useful for readers: if you like the blog you are reading, there is a reasonable chance that the blogroll contains some other blogs you may like as well. Or, at least, the blogroll on a blog you like is more likely to include matches to your particular tastes than a similar number of randomly chosen blogs.
Some bloggers divide their blog roll into categories. This is a practice found most often among new bloggers. This is so much work (and so difficult given the nature of categorization) that most bloggers who start this practice abandon it eventually.
Some blog rolls are static .... they consist of a simple list of links to five, ten, thirty, whatever, blogs. However, as blog rolls get longer with time (as bloggers wish to add more and more to the blog roll), the blogger must either become increasingly selective ... a practice that some embrace and others abhor ... or they must find a different way.
The most common response to blog roll bloat is to implement a rolling blog roll. There are broadly speaking two different ways to do this. One is to engage the services (sometimes free, sometimes not) of a third party site. You keep your blog roll on a database somewhere on the internet, and that site will generate code that you can use on your site to have a fancy blog roll. For instance, you might be able to have the blog roll items listed by recency of content, so stale blogs are listed at the bottom or not at all.
I experimented with these third party services and found out two things: 1) Most of them do not produce the exact desired result, so one must accept less than ideal results and/or 2) the ones that do allow the most flexibility and best features are not free. By not free, I don't mean expensive. But still...
I use a method of cycling through my blog roll .... currently 30 entries at a time .... that I'll describe below.
What does a blog roll do for the blogger? Well, it allows the blogger a way to give and receive link love. Link love is not a form of online cybersex. It is in part a replacement for the one-on-one professional contact that occurs in Meatland. Entry on a blog roll is a nice thing to do for someone else. If a blog roll is very short, you are either looking at a new blogger or an asshole. (I'm talking about the total length of the blog roll, not the displayed portion.) I know of several bloggers who have been blogging for quite some time but have fewer than 40 or so sites on the blog roll, three or four of them being links to the blogger's own sites. Independent evidence suggests in many of these cases that the blogger is an asshole. The correlation is astoundingly strong.
There are plenty of exceptions, of course. If you are reading this, you are an exception, I assume.
A blog roll is also good for the blogger because it allows the blogger the opportunity to DELETE the entry of a fellow blogger who becomes too annoying. Maybe some of those short blog rolls are just people who are extra sensitive.
What is blog rolling? Blog rolling is a practice that I believe I first observed being used by Coturnix of A Blog Around the Clock. I don't know if he invented it or not, but since much of what I learned about blogging was from Coturnix, I'll credit him with at least a significant role in blog rolling (and more generally, blogging) method and theory.
Blog rolling is when you convert your blog roll into suitable code for presenting it in actual blog posts. This reminds people that you have a blog roll, gives readers a chance to explore a little around your corner of the blogosphere, and so on. One usually does this with part of one's blog at a time, over a number of days.
One of the most important uses of a blog roll, one that becomes especially important with blog rolling, is the effect of such things on blog ranking. Most blog ranking services can easily detect and thus devalue links in blog rolls or blog rolling posts, but the truth is that if I put a link to your blog on my site, you get an increase in Technorati ratings (and other ratings) .... and vice versa .... even if the link is in a blog rolling post. The ranking sites may devalue (depending) such links, but these links are not meaningless, so I assume that as ranking sites evolve over time, this is recognized for what it is.
(By the way, many bloggers claim that ranking is not important to them at all. Those would be the bloggers with low ranks.)
How do I roll?
If you look at the left side bar on my site, you'll see a blog roll with about 30 links. If you reload my site you'll see those links change each time. Go ahead. Press reload about 20 times and see the great diversity of sites on my blog roll.
Done? OK, press reload about ten more times then read on.
Obviously, this blog roll serves as a passive suggestion from me to you as to where on the internet you may like to explore. In addition, I use this blog roll when I do "blogospherics." I open up an instance of my site, and right-click : open-in-new-tab several items from my blog roll. I look at them to see if I still want them on my blog roll (and delete them if not, but that is actually rare). If the site has something new and cool or interesting, I'll add that to my blogospherics post.
That list is generated with a JavaScript code block that includes a database of the blog roll and instructions to select a certain number at random to display.
Following is the code with a subset of the database. This code includes the HTML necessary to format the output as a list in a div.
<div>
<h3><span>Blogroll</span></h3>
<ul class="linkList">
<script type="text/javascript" language="javascript">
//replace href with site urls, linktext with displayed text/flyover title - all in quotes
var sitepicks = new Array(
{href: "http://www.10000birds.com/" , linktext: "10,000 Birds"},
{href: "http://90percenttrue.com/" , linktext: "90% True"},
{href: "http://scienceblogs.com/aardvarchaeology/" , linktext: "Aardvarchaeology"},
{href: "http://scienceblogs.com/zooillogix/" , linktext: "Zooillogix"}
//no trailing comma!!!
);
// embed this where you want the links to appear
var site, rnd, HTML = "", numOfSites = Math.min(33, sitepicks.length); //adjust; capped so we never ask for more sites than the list holds
while (numOfSites--) {
rnd = Math.floor(Math.random() * sitepicks.length);
site = sitepicks[rnd];
HTML += '<li><a class="randlinks" href="' + site.href + '" title="' + site.linktext;
HTML += '">' + site.linktext + '</a></li>';
sitepicks.splice(rnd,1);
}
document.write(HTML);
</script>
</ul>
</div>
Well, that seems like a lot of work .... putting the blog roll into this format. Also, if you are going to do blog rolling, then you also have to have your blog roll in HTML format. That's a lot of work too. What's a blogger to do????
Well, I can show you some code that will do all this work for you. It is not pretty code, and it is certainly not the only way to do it.
But, this is the BEST way to do it. Or at least, that's what I think. Any coders out there think you got a better way, let's see it.
What I'm about to show you works in a bash shell on Linux, but it should work in most shells on most *nix machines, meaning you can do this on your Mac easily. As for Windows, I'm sorry that I can't give you details but I think there is a way to run this on Windows as well. You'll have to figure that out on your own.
Here is what I do. First, I keep a file named blogroll.master.txt in a special subdirectory (called blogroll) in which I enter the blog name and URL of any blog I want on my blog roll, one blog per line, with the name and URL separated by a tab. The file looks like this:
10,000 Birds[TAB]http://www.10000birds.com/
90% True[TAB]http://90percenttrue.com/
Aardvarchaeology[TAB]http://scienceblogs.com/aardvarchaeology/
A Blog Around The Clock[TAB]http://scienceblogs.com/clock/
...
I've put [TAB] in there where a tab goes, and the '...' means 'there's plenty more.'
My current blog roll is 295 items long.
I add sites to this file as I encounter them. I do not worry if sites are already on there. Duplicates are, in theory, removed by the method I'm about to describe. However, note that I say "in theory" which in vernacular English means "it is supposed to happen but is far from perfect" ... (which is an example of using the word "theory" in a very different way than it is generally used in science).
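Adding a site amounts to appending one name-tab-URL line to the master file. Here is a minimal sketch of a helper for that, assuming bash. The function name add_site is made up for illustration and is not part of the scripts in this post:

```shell
# Hypothetical helper (not one of the original scripts): append one
# "name<TAB>url" line to a blogroll master file.
add_site() {
    local name="$1" url="$2" file="${3:-blogroll.master.txt}"
    # printf writes a real tab character between the two fields
    printf '%s\t%s\n' "$name" "$url" >> "$file"
}
```

Something like add_site "Zooillogix" "http://scienceblogs.com/zooillogix/" would then append a line to blogroll.master.txt in the current directory. Duplicates are still left for the sort step to deal with.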
Now, the following method is designed to allow for the user to have more involvement than some may like with an 'automated' process. But this is the best way.
OK, here's the code:
#!/bin/bash
echo "standby ... generating blogroll"
echo "seeking blogroll java file (blogroll.from.java), translating to internal format to temp file (blogroll.temp.01)"
# {href: "http://www.10000birds.com/" , linktext: "10,000 Birds"},
sed 's@{href: "\(.*\)" , linktext: "\(.*\)"},@\2\t\1@g' blogroll.from.java > blogroll.temp.01
echo " "
echo " "
echo " "
echo "seeking blogroll.master.txt appending to temp file (blogroll.temp.01)"
cat blogroll.master.txt >> blogroll.temp.01
echo "sorting blogroll.temp.01 with unique values to temp file (blogroll.temp.02)"
sort -u blogroll.temp.01 > blogroll.temp.02
echo " "
echo " "
echo " "
echo " PLEASE INSPECT THE FILE AND THEN PROCESS WITH generate_java_blogroll"
gedit blogroll.temp.02
To make this work, you first open up the JavaScript file (shown above) and copy all the JavaScript-style sites into a new file you call "blogroll.from.java" ... and add that comma back to the last item. Then you run this program, which I call "generate_blogroll".
What does all that mean? Let's break it down.
We start with a shebang line to tell the computer this is a bash command script.
The echo commands tell the user what is going on and at the same time serve as documentation and debugging. I'm doing this here because, despite my (obvious, yes?) joking about how this is the best way to do it, the truth is I just slapped this script together and I have very little confidence in it. You knew that, right?
The first true comment (comments start with '#') is a line from the JavaScript file you made, indicating the layout of each line for reference, so that you can understand and debug the script more easily.
The next line, starting with "sed", is a sed command (sed is a stream editor whose small programming language can be run as one-liners from bash).
In this sed line, "@" is a delimiter. The action happens within the single quotes, and it happens to the contents of the file named after the second quote. The output of the action is sent (via the greater than sign) to the file named at the end. So, this is taking blogroll.from.java and messing with it, and putting the messed-with contents into blogroll.temp.01.
The way sed works is that it has an instruction (what to do with the following) followed by two sets of text: a regular expression to look for, and the text to put in its place. In this case the instruction is "s" for "substitute". So, the first set of text, if found in a file, is replaced with the second. The "g" at the end means "global" (replace every match on a line, not just the first) and is not important for present purposes.
Here is a pseudocode way of representing this sed command:
sed
'SUBSTITUTE @ INVARIANT-TEXT (VARIABLE-TEXT) INVARIANT-TEXT (VARIABLE-TEXT) INVARIANT-TEXT @ VARIABLE-2 [TAB] VARIABLE-1 @ EVERYWHERE'
USING THE FILE: blogroll.from.java
SEND THE OUTPUT -> blogroll.temp.01
The key, cool part here is that in sed, escaped parens '\(' and '\)' delimit bits of text that will be a variable, with the first instance being the first variable (known as '\1'), the second being the second variable (known as '\2'), etc. The use of the '\' tells sed "the character after the backslash is a code for you to use, not text in the file you are eating."
So what this does, in essence, is to take the name of the site and the URL of the site, extract it from the java script data, and make a new temporary file with the same format as blogroll.master.txt.
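To see the two "variables" swap places, the same substitution can be tried on a single sample line. This is a sketch only. The function name js_to_tsv is invented here, and the tab is spliced in with printf on the assumption that not every sed understands "\t" in the replacement (GNU sed does):

```shell
# Sketch: the substitution from generate_blogroll, wrapped in a
# function (hypothetical name) that reads lines on standard input.
js_to_tsv() {
    local tab
    tab=$(printf '\t')   # a literal tab character
    # \1 captures the URL, \2 captures the site name; the output is
    # "name<TAB>url", the same layout as blogroll.master.txt
    sed "s@{href: \"\(.*\)\" , linktext: \"\(.*\)\"},@\2${tab}\1@"
}

# prints: 10,000 Birds, then a tab, then http://www.10000birds.com/
echo '{href: "http://www.10000birds.com/" , linktext: "10,000 Birds"},' | js_to_tsv
```

Lines that do not match the pattern (such as a last entry still missing its comma) pass through unchanged, which is why the comma has to be added back first.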
The next bit (cat blogroll.master.txt etc.) simply reads (using cat) the master blogroll you have been maintaining and appends it (using the double greater than sign) to the temp file you've just created.
But "why, why so complex a thing?" you say!
Well, I find myself wanting to add to my blog roll when I'm not at the computer on which I keep the master list. So, I just go into the JavaScript code on the Movable Type site at Science Blogs dot com and add it right to the code. So over time there are new sites added to both files. This combines the files.
Of course, this also means that there are many duplicates in the file, including those that were on the master list and in the java code, as well as any I've 'added' to my blogroll that may have already been there or added previously. But the next step fixes this.
Using the same method of dumping results into temporary files, the next relevant line uses the Unix 'sort' command to sort the blogroll. With the '-u' option turned on, duplicates are removed ('u' = 'unique'). Now, if a site is in my database with two URLs one using 'www' and the other not, then that duplicate is not removed. Or, if one site is in my database with slightly different names, the duplicate is not removed.
This is the reason for the last echo statement followed by the last line of the code: tell the user to look at this file, and then display the file in my text editor (which happens to be called 'gedit').
So I can look through this file and remove bad sites ... sites that are not formatted properly or that are duplicates (as described above) or whatever.
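The www / no-www kind of duplicate can at least be flagged before the eyeball pass. What follows is a rough sketch, not part of the original scripts: a hypothetical find_near_dupes function that reads name-tab-URL lines and reports entries whose URLs match once the scheme, a leading "www." and a trailing slash are ignored. It assumes any POSIX-style awk, and it does nothing about sites listed under slightly different names.

```shell
# Sketch: report entries whose URLs collide after light normalization.
# Input: name<TAB>url lines on standard input.
find_near_dupes() {
    awk -F '\t' '
    {
        key = tolower($2)
        sub(/^https?:\/\//, "", key)   # ignore the scheme
        sub(/^www\./, "", key)         # ignore a leading www.
        sub(/\/$/, "", key)            # ignore a trailing slash
        if (key in seen) {
            print seen[key]            # the entry seen first
            print $0                   # the entry that collides with it
        } else {
            seen[key] = $0
        }
    }'
}
```

Running find_near_dupes < blogroll.temp.02 prints each suspicious pair, leaving the decision about which line to delete to the person at the keyboard.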
The echo statement instructs the user to run "generate_java_blogroll", which is another script. It looks like this:
echo "seeking blogroll.temp.02 converting to java format to temp file (blogroll.temp.03)"
# {href: "http://scienceblogs.com/intersection/" , linktext: "The Intersection"},
sed 's@\(.*\)\t\(.*\)@{href: "\2" , linktext: "\1"},@g' blogroll.temp.02 > blogroll.temp.03
echo "creating blogroll.java.final ... adding header to this file"
cat blogroll.java.header > blogroll.java.final
echo "appending java from blogroll.temp.03 to blogroll.java.final"
cat blogroll.temp.03 >> blogroll.java.final
echo "deleting extraneous final comma from blogroll.java.final"
echo "HERE YOU DO IT, MAN"
echo "appending footer to blogroll.java.final"
cat blogroll.java.footer >> blogroll.java.final
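echo "saving master blog roll"
cat blogroll.temp.02 > blogroll.master.txt
echo "generating html for blogrolling"
# one link per line; assumes plain <a href> anchors are the HTML wanted
sed 's@\(.*\)\t\(.*\)@<a href="\2">\1</a>@g' blogroll.temp.02 > blogroll.html
echo "opening blogroll.java.final, blogroll.html and blogroll.master.txt in Gedit for your convenience"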
gedit blogroll.java.final
gedit blogroll.html
gedit blogroll.master.txt
The first thing this code does is to convert the contents of the temp file previously created (which looks like the master blogroll in format) to the java script style.
I have a file in my blog roll subdirectory that contains the top part of the JavaScript code I showed you above (blogroll.java.header), and another file that contains the bottom part (blogroll.java.footer). The next thing this script does is to use the cat command to write the header of the JavaScript to a file called blogroll.java.final. Since we are using the single greater than sign ('>'), the file, if it already existed, is clobbered. (If you have "no clobber" turned on as an option on your computer, you are a wuss, and you must unset that option).
Then the contents of the temp file are appended (using the double greater than signs) to the java file, then the footer of the java file is stuck on to the end.
Note that the script claims that it will delete the trailing comma from the list of sites. JavaScript inconveniently cares about this comma (bad JavaScript!). However, this is actually not incredibly easy to program, so the script reconsiders and in all caps tells the user to do it.
The file is opened for the user to fix and inspect.
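For anyone who would rather not do it by hand, sed can restrict a command to the last line of its input with the "$" address, which is one way to drop that final comma. This is a sketch under the assumption that the comma is the last non-blank character of the last line. The function name strip_last_comma is hypothetical.

```shell
# Sketch: remove a trailing comma from the LAST line only.
# "$" is sed's address for the last line of input.
strip_last_comma() {
    sed '$s/,[[:space:]]*$//'
}
```

In generate_java_blogroll this could stand in for the plain cat of the temp file, as in strip_last_comma < blogroll.temp.03 >> blogroll.java.final, so the comma is gone before the footer is appended. If the file ends with a blank line, nothing is removed and the manual fix is still needed.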
Then the script converts, using another sed command, the pre-java version of the blogroll into an HTML format. Chunks of this can be copied into blog posts for blogrolling purposes.
This is also opened in the editor for the purpose of inspection and use.
Finally, the temp file which is the most current, most cleaned up version of the blog roll is copied over the master blog roll. However, the user must re-do this if the temp file is fixed.
Note that my most recent (as of this writing) version of blog rolling lists the links in posts separated by space-tilde-space delimiters. I'm thinking this will be more eye-appealing. I could program this into the script shown above if I end up liking the look. For now I just use a simple search and replace in gedit, changing all instances of \n to ' ~ '.
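If the tilde look sticks, the search and replace could be scripted. One possible sketch, using awk to join all input lines with ' ~ ' (the function name join_with_tilde is invented for illustration):

```shell
# Sketch: join every input line into one line, separated by " ~ ".
join_with_tilde() {
    awk 'NR > 1 { printf " ~ " } { printf "%s", $0 } END { print "" }'
}
```

For example, join_with_tilde < blogroll.html would turn the one-link-per-line file into a single run of links ready to paste into a post. Unlike a blanket replacement of every newline, this leaves no stray ' ~ ' after the last link.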








Comments
Of course, I do it all manually. My "Blogrolling for today" posts usually highlight the newest blogs that I still need to add to my big Blogroll, i.e., these are "the newest additions" in most cases. Which reminds me that I am woefully behind actually adding all those to the Blogroll.
Posted by: Coturnix | July 21, 2008 12:56 PM
"(By the way, many bloggers claim that ranking is not important to them at all. Those would be the bloggers with low ranks.)"
Or those would be bloggers like me, who have been around for over two years and still don't seem to merit much in the way of an official rank or "authority" on Technorati, despite oodles of links, regular postings, and a healthy constant stream of traffic. We have learned NOT to care, because Technorati clearly does not care about us. Not that we're bitter. :)
Seriously, after awhile, you learn that the metric used to build these arbitrary rankings isn't all that reliable... I'm not saying it's meaningless -- just that it's not really clear what it's supposed to mean.
Posted by: Jennifer Ouellette | July 21, 2008 9:05 PM
Jennifer: I just bought your book, as a present for my daughter. I hope that works out .... (if not I'll absorb it happily into my collection)....
You have to get on Technorati if you want them to be nice to your blog.
Posted by: Greg Laden | July 21, 2008 9:23 PM
...and all that may work perfectly well for bloggers who are not hosted at wordpress.com.
Bastards don't allow javascript.
:(
Someday, I'll get my own hosting and domain, and I'll be adding something like this for certain, so thank you. I'll be keeping this in my back pocket.
Posted by: JanieBelle | July 24, 2008 11:24 AM