Taming Twitter with the Command Line

I thought I was done with the command line for the week, but then I did something cool that I thought I'd share with you. Linux users only ... others will think this is silly ... join me below the fold.

[Image: example a-tweet output, as rendered in a web browser]

OK, are we alone? Good. It's nice to be away from all those Windows Symps for a while. Oh, I see there's a couple of Mac OSX users lurking in the back of the room. Come right over and join us; this will probably work for you too!

Here is my problem. I use Twitter to promote my blog, and I therefore follow almost 3,000 people on Twitter. That means that every few seconds there is an update on my "people I follow" list, and it is almost always something I am not even slightly interested in.

This makes Twitter kind of useless for me. Just the other day, I did something that a lot of people apparently found terribly objectionable (I have no idea what it was) and later got an email from a friend that said "Hey, did you see what so and so and so and so and so and so said about you on Twitter?"

"Of course not" was my reply. "I can't use Twitter. Too much data, not enough good data. If something happens on Twitter there is no way for me to know it."

Then, yesterday, I heard: "Hey, did you hear that so and so and so and so are getting married? It was the talk on Twitter!"

"Of course not" was my reply... and so on and so forth, you get the point.

I know there are many possible solutions to this problem (feel free to mention them in the comments if you like), but a second Twitter account (one of the obvious solutions) with only my actual 'friends' on it was not something I wanted. Instead, I wanted to filter the data.

Yes, there are some software solutions out there to filter data, and maybe some day I'll use one of them, but I really felt that this was a case of too much mucking around for a simple solution. What I really wanted to do was no more than this:

  1. Produce a file of recent tweets by those I "follow."
  2. Manually maintain a file of the names of people whose tweets I want to actually see. An a-list, if you will.
  3. Write a script that would use these two data sources to come up with a list of tweets by the smaller list of people, culled from the larger fire hose list.
  4. Format that list minimally so it shows up in a web browser as a local page.
  5. Put a button on my Gnome toolbar that makes all this happen.

I call it a-tweet. Example results are depicted in the graphic above, and the script looks like this:

#! /bin/bash

twidge lsrecent -lsu > ~/twidge_data/data

awk 'NR==FNR {u[$1];next} ($2 in u)' ~/twidge_data/a-list ~/twidge_data/data > ~/twidge_data/a-tweets

cut -f2,4 ~/twidge_data/a-tweets |
sed 's/^\(.*\)\t\(.*\)/<DT>\1<\/DT><DD>\2<\/DD><br\/><br\/>/g' > ~/twidge_data/a-tweet.html

firefox ~/twidge_data/a-tweet.html

This script is not all squishy and one-liney like it could be. I use intermediate files and drawn-out constructs so that I can later mess with it more easily. This also makes it easier to explain.

The first line uses Twidge. Twidge is a great find. sudo apt-get install twidge will add it to your system.

After you install twidge you will run

twidge setup

which will ask you for your user name and password for the account you want to twidge around with.

Twidge lets you do cool things with Twitter on the command line. Learn about it here. Among other things, you can post tweets from the command line, or get a list of your followers or followees, and so on. I found it by just searching for "twitter" in my package manager.
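
To give a flavor of it, here are a couple of commands I've used (treat these as a sketch and verify against twidge --help on your version, since commands can vary):

twidge update "Posting this straight from the terminal"
twidge lsfollowers
twidge lsfollowing

The first posts a tweet; the other two list your followers and followees, which is handy raw material for the a-list file described below.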

So let's break it down.

twidge lsrecent -lsu > ~/twidge_data/data

Twidge always works by using 'twidge' followed by a command. In this case, lsrecent, which lists (ls) recent tweets. There are arguments one can use to make this list come out differently, but the -l argument results in a one-tweet-per-line, tab-delimited list of tweets, and is essential for this script to work. The -s and -u parameters make Twidge keep track of where it last looked and give you stuff only since then. Not shown but available is the -all option.

The -all option (not shown) is ... optional. If you have a couple of hundred followees and want to see the most recent among, say, a couple of dozen, you probably don't need it. However, I'm scanning for about twenty twits among nearly three thousand, and I may check only every few days. Twidge tends to be conservative, only fetching the most recent several dozen tweets. I believe the -all option may work better for the scenario I just described. In any event, you can play with it because it is your command line. You can even create different versions of the script to run under different circumstances.
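
For the record, a "catch up on everything" version of the fetch line would look something like this. I'm spelling the flag the way it appears above; double-check it against twidge lsrecent --help, since option spellings vary between versions:

twidge lsrecent -l -all > ~/twidge_data/data

Leaving off -s and -u means Twidge does not mark anything as seen, so repeated runs keep showing the same stuff.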

Here, I dump the output into a file in a special subdirectory where I keep the twidge-related data. The data file that holds these data is called, enigmatically, "data." After the guy on Star Trek.

Next line:

awk 'NR==FNR {u[$1];next} ($2 in u)' ~/twidge_data/a-list ~/twidge_data/data > ~/twidge_data/a-tweets

This line takes two files, a-list and data, and, using the programming language awk, filters the data list based on matches between the second field (which happens to be the user name) and the list of user names in the file a-list. The output, the filtered subset of tweets that are only by "a-list" twits (if that is what they are called), is then dumped into the file "a-tweets," with any old content in that file blotto'd out of existence.
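
To make that concrete: a-list is nothing more than one screen name per line (the names below are invented for illustration), and the same awk program, spread out with comments, reads like this:

# ~/twidge_data/a-list holds one screen name per line, e.g.:
#   alice_tweets
#   bob_science
#   carol_blogs

awk '
  NR==FNR { u[$1]; next }  # first file (a-list): stash each name as an array key
  ($2 in u)                # second file (data): print lines whose second field was stashed
' ~/twidge_data/a-list ~/twidge_data/data > ~/twidge_data/a-tweets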

The next line is:

cut -f2,4 ~/twidge_data/a-tweets |
sed 's/^\(.*\)\t\(.*\)/<DT>\1<\/DT><DD>\2<\/DD><br\/><br\/>/g' > ~/twidge_data/a-tweet.html

This is two commands (well, really more, but I'm simplifying slightly) connected with a pipe. The first uses cut to isolate the second and fourth fields from the data, which happen to be the user name and the tweet itself. Left off, then, are a blank field, an ID number, and the date/time stamp. One could argue for keeping the date/time stamp, but I chose not to.

This stream of data, consisting of one username/tweet pair per line, is then sent to sed, which inserts code to make an HTML definition list. This formats the data as I like it. It also inserts two HTML line breaks after each tweet.
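
To see it in action: a single line coming out of cut looks like this (a made-up tweet, with a tab between the two fields):

alice_tweets	Just saw a moose in the yard.

and sed wraps it up like this:

<DT>alice_tweets</DT><DD>Just saw a moose in the yard.</DD><br/><br/>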

This sed command could be used in a computer science textbook to illustrate all the strangeness of sed. I love sed.

This stream of HTML-formatted data is then dumped, unceremoniously, into a file which, if opened with a web browser, will be formatted as I want it. Since the filename is created with an 'html' extension, opening the file directly will likely open it in an appropriate app, depending on how your desktop is configured.

The last line simply opens up a Firefox instance or tab with the file. If there were no tweets from the a-list twits, then there is nothing in the file and you get blankness. Otherwise, you get something that looks like the picture posted above.

There is a lot that can be done to enhance this. URLs in the tweets could be identified (using sed) and packaged as links, for instance. A bit of code allowing responses or retweets could be added. Eventually, with enough mucking around, one could have a full-featured application (like those that may or may not be available) but made entirely from scratch.
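
For instance, the URL idea might start as one more sed stage spliced into the pipeline just before the redirect. This is an untested sketch, and a regex that handles every URL a tweet can contain needs far more care than this:

sed 's|\(https\?://[^ <]*\)|<a href="\1">\1</a>|g'

With that in place, any http or https link in a tweet gets wrapped in an anchor tag and becomes clickable in the browser.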

Suggestions for mods welcome!


Or, following up on Alcari's comment, you could use TwitHive, which is a web-based app similar to Tweetdeck.

http://www.twithive.com

And my husband could blog using any of a half-dozen already available blogging applications, but he wrote his own. He learned something from it, and he has the option to customize his blog to do exactly what he wants instead of asking for recommendations on the available software from people who may not want to do the same things.

This is delightfully geeky.

Alcari, Harlan:

Gather 'round while I explain a little somethin-somethin about the Unix mindset.

Unix at its most fundamental is about toolsmithing. Take a look at a program like K3B for example -- until I switched to Ubuntu a few weeks ago, K3B was my CD-burning program of choice. (In fact I made sure to install it on Ubuntu, but I haven't needed it yet, since GNOME Brasero seems to do most of what I need.) K3B uses a GUI front end based more or less on Nero, but the back end is a collection of command line programs, mostly cdrtools and growisofs, that are pretty standard across Unix implementations these days. That means that if I don't like what K3B does, I can put together my own GUI but I don't have to worry about interfacing with the drive, dealing with finalization, constructing ISO images, all that garbage -- the separation of front and back ends means that I have a solid, already-debugged backend that takes care of 90% of my problems and all I have to worry about is putting together a command line. Hell, the original incarnation of vi was similar to this, from what I understand -- just a simple cursor-addressable shell over ex.

Interestingly, and I have to get my digs in here as a Mac fan, this isn't too far off from the model Apple was trying to promote for Copland applications -- the GUI was to run in cooperative space, while the backend that actually manipulated the data ran in preemptive space, and the two communicated by AppleEvents. I'm not sure to what extent this still obtains under OS X/Cocoa, but as a side effect it made AppleScripting uncommonly easy. (But I always did think AppleScript was the Mac's ace in the hole over Windows.)

Gather round while I explain a little something about this approach to life. Get a hobby--this is a colossal waste of time. Not having to do things like this is the point of software.

Okay, so I'm not one to pass up a snarky joke involving underground cartoon characters.

What I was getting at is that the programs mentioned above don't separate front-end from back-end, which is a fine design strategy until you actually want to mix data. By separating the two, you wind up creating a way to use one back-end with multiple front ends. In fact, I've thought of two other examples -- Microsoft Access, which is based on the Jet engine (which can be swapped out for something more robust if you know what you're doing), and gcc, which pipes multiple front-end languages through the same optimizer and code generators. None of that would be possible without separating front from back ends.

I have messed around with the various twitter clients and I find them lacking. It took me five minutes to put this script together. Then in five more minutes I got a replacement part for the awk line (mine didn't work) and in one minute I had a script that worked. So 11 minutes development time. A hobby that produces a benefit like that is not to be snarked at.

I have further plans. Some of you may remember my old Linux Calendar posts. Well, what about a Today's Tweets post?

My twitter feed is a freakin' text file. Do you have any idea HOW POWERFUL THIS MAKES ME???????

MWAHAHAHAHAHAH!!!

Hey, I might Tweet that...

It all depends on the way bash carries out command substitution in this case... I might have to make it a back-tick.
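
(For anyone following along: backticks and $( ) do the same job in bash, so either of these, hypothetically, would post the newest a-list tweet:

twidge update "$(head -n 1 ~/twidge_data/a-tweets | cut -f4)"
twidge update "`head -n 1 ~/twidge_data/a-tweets | cut -f4`"

The $( ) form nests more cleanly, which is why it usually wins.)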

Nice. It would be nice to have retweet and reply links. That should not be difficult to code as one more filter (or two) in the flow.
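
A rough, untested sketch of the reply half: one more sed stage that rewrites each <DT> entry as a link. The status-prefill URL is my guess at how Twitter's web interface accepts it, so test before trusting it:

sed 's|<DT>\(.*\)</DT>|<DT><a href="http://twitter.com/home?status=@\1%20">\1</a></DT>|' ~/twidge_data/a-tweet.html > ~/twidge_data/a-tweet-links.html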

I'm finding that -all without -s works best.

Enoch: I think it may depend on the traffic amount and time since last checking.

I'm pretty sure you don't have to follow all those people to get your tweets to work, all you need is for them to follow you. But maybe I'm confused about what your problem is.

Drekab: technically you are correct, but there are other things going on in that regard, so unfollowing them is not an option.

sudo apt-get install twidge

apt-get works nicely with Ubuntu, but not in OSX. There isn't a package installer for twidge either. Now one could download the source and compile it, but meh. ;)

Warren, you can do this, you just need to install some software that will thereafter let you use the equivalent of synaptic. Check out the current issue of Linux Journal for an article on that (I'd give details but I'm not anywhere near my Linux Journal at the moment.)
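
Another possibility, offered as an untested guess: twidge is written in Haskell, so if it is on Hackage, a Mac with GHC installed might get away with:

cabal install twidge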

Funny thing is, now that I'm only seeing the tweets from my real people that I know and love, I'm not seeing much. And I know these people have more to offer. Maybe this "140 characters is all you need" philosophy is bogus.

You should check out gale (http://www.gale.org) and revel in the geeky gloriousness of an instant messaging system implemented correctly by very smart people. Sadly, Beta vs VHS applies. Though, it would make a hell of a good back-end for the next twitter to come along.

Greg - re: comment 20. I have been busy with a new Linux machine. My first (not counting the ones I did for the kids). I have been too busy learning the basics of Ubuntu to tweet much, but I'll try to feed you some tweets to satisfy you, soon.