I know all my fellow bloggers are jealous about my Linux calendar posts (like this one) and normally I don’t reveal my secrets. But this is so cool I have to share it.

The Linux calendar command (in the terminal window) puts out, by default, a listing of events, etc. from today and tomorrow. The listings come from “calendar” files that come with your Linux distribution (these are also on most Unix systems). You can also make your own calendar files. (I’m working on that now.)

But how do you take the output from Linux Calendar and turn it into a blog post?

Well, you can type “calendar” into the Linux terminal, copy and paste the output, then manually insert the HTML code and put that on your blog.

Or, you can do the same thing but use a search and replace function to make the manual insertion of code easier.

Or, you can use a simple bash shell script, which I will be happy to explain to you now.

I’m trying to convert each calendar line, which is simple text, into a particular HTML format. I want to go from this:

Dec 08 Mother’s Day in Panama

to this:

<DL>
<DT>Dec 08 </DT>
<DD>Mother’s Day in Panama</DD>

</DL>

The “DL” tags define the totality of a “definition list” which is a kind of HTML list. Each element in the definition list has two parts, a “term” defined by the “DT” tags and a definition, defined by the “DD” tags.

The script I do this conversion with is a bash script that invokes a perl one-liner and some other code, uses piping and redirection, and, within the perl one-liner, uses a substitution-type regular expression. At the end, I will give you a list of the books you need to read to really understand this.

Here is the script:

#!/bin/bash

echo '<DL>' > calendar.html
calendar | perl -pe \
's/^(........)(.*\n)/<dt>$1<\/dt>\n<dd>$2<\/dd>\n/g' \
>> calendar.html
echo '</DL>' >> calendar.html

What is this gobbledygook?

First, you have to know that the backslash (‘\’) at the very end of a line simply means “this line of code is wrapped onto the next line” … the shell glues the pieces back together and runs them as one command.

The first line (‘#!/bin/bash’) is always the first line in a Linux shell script that uses the “bash” shell.

The beginning and end “echo” statements send information (using ‘>’ or ‘>>’) to a file. A single ‘>’ sends the text to the file and destroys whatever was in the file to begin with. If the file did not exist, it is created. Two arrow-thingies (‘>>’) append the text to the end of the file without destroying it.

So the first echo statement makes a file that has the first HTML tag to define a “definition list”. The second echo statement, at the end of the script, appends the closing tag for a definition list to the end of the file.
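If you want to see the difference between ‘>’ and ‘>>’ for yourself, here is a minimal sketch. The scratch file made by mktemp is my own stand-in (it is not part of the calendar script), so nothing important gets clobbered.

```shell
# Hypothetical scratch file, used only for this demonstration.
demo_file=$(mktemp)

echo 'first'  > "$demo_file"    # '>' creates the file (or overwrites it)
echo 'second' > "$demo_file"    # overwrites again: 'first' is gone
echo 'third' >> "$demo_file"    # '>>' appends: 'second' survives

redirect_result=$(cat "$demo_file")
echo "$redirect_result"
# prints:
#   second
#   third

rm -f "$demo_file"
```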

The stuff in the middle is a little more complicated.

This part: ‘calendar | blablabla’ … invokes the calendar command and sends its output to blablabla, using a “pipe” (‘|’) rather than to where it would normally go, which would be the terminal screen. So whatever output would spew forth by typing “calendar” into the terminal is sent to “blablabla” as though it was being typed in or read from a file.

perl -pe is the opening of a one-line perl command. The word perl invokes the perl interpreter (perl is a programming language). The ‘e’ parameter tells perl that it is to run the following stuff, coming up soon in single quotes, don’t ask any questions, just do it. The ‘p’ parameter tells perl to do whatever is in this one-liner on every line of whatever input it is given, and to print each line once it is done with it.

The input, of course, will be the output of ‘calendar’ coming from the pipe (‘|’).

So, we have:

perl -pe 'blablabla' >> calendar.html

in the middle of this gobbledygook, which is going to do something to the output from calendar and append it to the file calendar.html.

The blablabla part is:

's/^(........)(.*\n)/<dt>$1<\/dt>\n<dd>$2<\/dd>\n/g'

This is a perl regular expression, and may require a certain amount of explanation.

A perl regular expression can take this form:

s/blablabla/foobar/g

The ‘s’ means that this is a substitution. blablabla is going to be substituted with foobar. The ‘g’ means that this is going to happen every time it is possible to happen. If you don’t put in the ‘g’ then the command will only work on the first instance it encounters. (Technically, you don’t need the ‘g’ here because there is only one instance per line for our particular problem, and the ‘p’ makes this work on each line, but I put it there anyway. It’s just my programming style, man.)

So now you can see that the gobbledygook has two parts: blablabla and foobar. Perl is looking for blablabla, and every time it finds that, it is going to replace it with foobar. So for instance, if we have this text:

“The creationist perspective differs from that of the evolutionist”

and we run this perl regular expression substitution command on it:

s/reation/Intelligent Design Proponent/g

… we would get this result:

“The cIntelligent Design Proponentist perspective differs from that of the evolutionist”

Hmmm….

OK, so here is the output of one line of calendar on my Linux machine:

“Dec 08 First Ph.D. awarded by Computer Science Dept, Univ. of Penna, 1965”

(You can’t see it, but there is a carriage return at the end of that line.)

And here is the regular expression I used:

s/^(........)(.*\n)/<dt>$1<\/dt>\n<dd>$2<\/dd>\n/g

So, I’m looking for this:

^(........)(.*\n)

and replacing it with this:

<dt>$1<\/dt>\n<dd>$2<\/dd>\n

The thing I’m looking for actually matches the entire line of output from the calendar program. Lines of output from the calendar program always start with three letters indicating the month, a space, two digits indicating the day, and then two spaces. (Sometimes one of those spaces is an asterisk for some reason, but that does not matter.) In a perl regular expression, a dot (‘.’) equals one thing, a letter, number, space, whatever, but just one of them. So ........ will match, for instance, ‘Dec 08  ’ or ‘Jan 21  ’ or whatever. Anything that is eight thingies long.

The caret (‘^’) says “This has to be at the beginning of the line.” Therefore, ‘^(........)’ simply matches the first eight letters/numbers/spaces at the beginning of the line.

Now, here is the kick-ass cool part. Since this bunch of dots is in parentheses, once it is found, the match itself (like, for instance, ‘Dec 08  ’) will be stored in a special variable called ‘$1’ … whatever is in parentheses gets stored in a variable. How cool is that?

$1 can then be used in the second part of the regular expression. So, for instance, if we had this text:

“many”

and ran this regular expression on it:

s/(many)/$1$1$1/

we would get:

manymanymany

The second half of the first part of our expression looks like this:

(.*\n)

Again, the parentheses. So, whatever is matched by a dot followed by an asterisk followed by a backslash-n will be stored in variable $2. The first thing in parens gets stored in variable $1, the second in variable $2, and so on.

A dot is any letter, number, or space, as you know. A star following anything means “any number, including zero, of whatever was just before me” (that’s the star talking). So, dot-star means pretty much whatever is there. The backslash-n means the “newline” … the carriage return at the end of the line where you hit the “enter” key.

So, the first part of the calendar output is “eaten” by the first thing in parens, and the second part … everything not including that first, already consumed part, up to and including the newline (carriage return) at the end, is stored in the second variable. So,

$1 = 'Dec 08  '
$2 = 'First Ph.D. awarded by Computer Science Dept, Univ. of Penna, 1965\n'

The second part of the regular expression looks like this:

<dt>$1<\/dt>\n<dd>$2<\/dd>\n

Reading from left to right this means:

The opening HTML tag for “definition term” [followed by] whatever is in variable $1 [followed by] the closing HTML tag for “definition term” [followed by] a newline [followed by] the opening HTML tag for “definition description” (the actual definition) [followed by] whatever is in variable $2 [followed by] the closing tag, [followed by] a newline.

Now, let’s say we named the script “calendar2html.sh” … To run it, if you are in the same directory as the script, you can type “bash calendar2html.sh”. But if you want to have this become part of your regular set of available commands, you will need to put the file in a directory that is on your path, and to change (chmod) the script file into an “executable” file. I’m not going to go into that stuff right now.

Also, my script always makes a file called “calendar.html” rather than allowing you to specify some other name. I find that very convenient and see no reason to complicate matters. But others will prefer to complicate matters.

There is more than one way to do this. For instance, there is another, more elegant way to specify “eight whatevers.” Also, the newlines that end up being in the code at the end do not really need to be there. All the HTML code in this case can be all smushed up, no problem. Also, I could have used a regular expression to put the definition list’s opening and closing tags around the whole thing, rather than the echo statements. But then, we wouldn’t get to talk about redirection that appends vs. overwrites a file.

Whenever I read this kind of thing written by someone else on the web, there is always a disclaimer. Obviously, if you try this on your own computer, and your computer blows up, then you’re some kind of chump for doing what I said to do, OK?

Also, this is all being done on a Linux computer, but the basic idea will work on any computer with perl, and that could in theory be a Mac or a Windows computer. However, I think the calendar program is confined to Linux/Unix systems.

Gentlemen/women, start your terminals….


Sources:

Classic Shell Scripting

Mastering Regular Expressions

Not necessary for this level of scripting, but useful:

Programming Perl (3rd Edition)

Comments

  1. #1 Rosie Redfield
    December 8, 2007

    That’s a lot of how, but why?

  2. #2 Doug Alder
    December 8, 2007

    Rosie

    Because he can :)

    Greg – next step is to automate the importation and posting of that file into your blog and set up a cron job to do it at regular intervals :D – hands free blogging LOL

  3. #3 Greg Laden
    December 8, 2007

    Rosie: Yea, what he said.

    Actually, this took me a half hour to make work, and an hour to write up the post. So I don’t count the post.

    That half hour will save me a minute for every post, but more importantly, I’ve learned something that I can use for other automation effects. So in ten or twenty months I’ll start to break even…

    Doug … Exactly. I have a Python script that will load text onto the Movable Type engine, so a cron job at one end and Python on the other… I’ll provide the option to calendar to only give one day of data instead of two, then “Greg Laden’s Blog” will be the GOTO BLOG for what’s happenin’.
