Greg Laden's Blog

Evolution, Life Sciences, Science Education, Human Evolution, and Stuff

In the beginning there was cat ...

Posted on: August 4, 2008 10:51 PM, by Greg Laden

... and I'm not talking about Ceiling Cat. I'm talking about the Linux command cat.

Apropos the comment that one should not use cat to produce a stream of text for a command that takes a filename as an argument ... I say "Balderdash!"

If your plan is to process the text from a file with a single command and that's that .. no modifications will be needed ... then by all means, you will save time and computing resources (though immeasurably little of either) by using the filename as an argument to said command.

But if you are not quite sure how you are going to get to your eventual final results, you might find that starting with cat will give you a paws up when you need to start fiddling. The cat command will even take arguments that can help you.

In fact, I think the widespread conception that cat is 'merely a filter' that does nothing but pass along the contents of the file it is fed contributes to the belief that cat is always useless. Yet cat is a very powerful command.

Cat counts as a filter. A file goes in and comes out. When cat is issued with nothing other than a filename as an argument, it does nothing but stream out the contents of the file. This is convenient if you want to construct a complex command that starts with the contents of a file running into standard input. For instance:

cat mouse.txt

will simply 'print' to the terminal the contents of mouse.txt. But that's OK. You can do this and verify that mouse.txt exists and contains roughly what you thought it contained. Then you can add something to this such as

cat mouse.txt | grep mice

which gets me:

Three blind mice. Three blind mice.

As three blind mice.

I could have gone

grep mice mouse.txt

and gotten the same thing, but that would interfere with the poetry of

cat mouse.txt | grep mice | wc

which gets me: 2 10 66

which is, of course, either Daryl Johnston's or Rodney Anoa'i's birthday.
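As a rough sketch of the build-it-up-in-stages approach, the script below feeds the same file to grep three different ways and confirms that they agree. The temporary file and its four lines are stand-ins invented for illustration, not the real mouse.txt.

```shell
#!/bin/sh
# Hypothetical sample file standing in for mouse.txt (only four lines).
tmp=$(mktemp)
printf '%s\n' \
    'Three Blinde Mice,' \
    'Three blind mice. Three blind mice.' \
    'See how they run. See how they run.' \
    'As three blind mice.' > "$tmp"

# Three ways to hand the same text to grep.
via_cat=$(cat "$tmp" | grep mice)       # cat starts the pipeline
via_arg=$(grep mice "$tmp")             # filename as an argument
via_redirect=$(grep mice < "$tmp")      # shell redirection, no cat

# Extend the pipeline by one more stage: count the matching lines.
# tr strips the padding some wc implementations add.
count=$(cat "$tmp" | grep mice | wc -l | tr -d ' ')

echo "$via_cat"
echo "matching lines: $count"
rm -f "$tmp"
```

Because grep is case-sensitive, the "Mice" line is skipped, so all three forms return the same two lines.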

Here's another good one. A common question given to programmers looking for work, as part of their application, is this:

"Write a perl one liner that adds numbers to the lines in a file."

Answer:

Who needs perl? ... cat -n mouse.txt

which gets me this:


1 Three Blinde Mice,
2 three Blinde Mice,
3 Dame Iulian,
4 Dame Iulian,
5 The Miller and his merry olde Wife,
6 shee scrapte her tripe licke thou the knife.
7
8 Three blind mice. Three blind mice.
9 See how they run. See how they run.
10 They all ran after the farmer's wife
11 She cut off their tails with a carving knife.
12 Did you ever see such a thing in your life
13 As three blind mice.

or, cat -b mouse.txt

which is similar but only numbers non-blank lines.

or the -s option, one of the coolest, which does not allow more than one blank line at a time through the filter!

or -T which shows tabs, otherwise invisible.
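Here is a small sketch for trying those options side by side. The sample text is made up for the purpose, and -T is assumed to be the GNU coreutils spelling; on BSD-style systems the script falls back to -t, which also marks tabs as ^I.

```shell
#!/bin/sh
tmp=$(mktemp)
# Hypothetical sample: one line, two blank lines, then a line containing a tab.
printf 'first\n\n\nsecond\tline\n' > "$tmp"

# -n numbers every line, so all four lines carry a number.
numbered_all=$(cat -n "$tmp" | grep -c '[0-9]')

# -b numbers only the non-blank lines, so just two lines carry a number.
numbered_nonblank=$(cat -b "$tmp" | grep -c '[0-9]')

# -s squeezes the run of two blank lines down to one, leaving three lines.
squeezed_lines=$(cat -s "$tmp" | wc -l | tr -d ' ')

# -T (GNU) shows each tab as ^I; fall back to -t where -T is not accepted.
tabs_shown=$(cat -T "$tmp" 2>/dev/null || cat -t "$tmp")

echo "numbered by -n: $numbered_all"
echo "numbered by -b: $numbered_nonblank"
echo "lines after -s: $squeezed_lines"
echo "$tabs_shown"
rm -f "$tmp"
```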

Man cat. Try it, you'll like it.


Comments

1

cat foo | cmd

has the very distinct and real problem that if file foo cannot be opened, then the "exit status" of the pipe ($?) is still that of cmd, the last command, and so will typically be 0 (zero)--success. This is probably not what you want, especially in a Makefile, or in constructs such as
if cat foo | cmd; then ... fi

In contrast, when a command has a filename argument, it usually fails (exits non-0) if it cannot open a required/named input file. Even if the command just reads stdin (the standard input), crunch (<file) behaves better than using cat.
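A minimal sketch of the difference, using wc as a stand-in for cmd and a path that is simply assumed not to exist:

```shell
#!/bin/sh
missing=/nonexistent/foo   # hypothetical path, assumed not to exist

# Pipeline: cat fails, but $? reports the last command (wc), which succeeds.
cat "$missing" 2>/dev/null | wc -l >/dev/null
pipe_status=$?

# Redirection: the shell cannot open the file, so wc never runs
# and the status is non-zero.
{ wc -l < "$missing"; } >/dev/null 2>&1
redirect_status=$?

# Filename argument: wc itself reports that it cannot open the file.
wc -l "$missing" >/dev/null 2>&1
arg_status=$?

echo "cat | cmd : $pipe_status"
echo "cmd < file: $redirect_status"
echo "cmd file  : $arg_status"
```

Only the first form reports success when the input file is absent.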

cat has its place and its purpose. But the Useless Use of Cat™ award exists for multiple reasons.

Also look up Pike's paper, cat -v Considered Harmful.

Posted by: blf | August 5, 2008 1:28 AM

2

When I read the previous post that started this, I almost joined the "don't cat" responders. But on consideration, I think this is more like off-topic grammar correction.

For those on-the-fly, build-up-a-command-line-by-experimentation situations, the performance difference is insignificant and starting with cat probably allows quicker command editing. So Greg's original examples are fine.

Lifelong habit will probably limit how much *I* use cat, however, lol.

If I'm going to the trouble to write a script, I take more care in crafting the commands and handling edge / error conditions.

So, it may be beneficial for casual readers to be alerted to potential problems, but not in a dogmatic way. Google the recursive "'considered harmful' considered harmful".

In the spirit of TMTOWTDI, how about a contest to see how many ways to number lines of a file? I'll start with "nl foo.txt". (j/k)

Posted by: Larry | August 5, 2008 7:26 AM

3

Yes, this is all about constructing commands. Between each iteration of developing the command, you are using the history buttons to go back to older versions and then modifying.

cat mouse.txt issued by itself tells you that mouse.txt is in the working directory, reminds you what the contents of the file look like, etc.

However, one might seriously leave it this way when making a script because later modifications may involve copying the script to command line and playing around further. It depends on what you are trying to do.

Brian, the name of the paper is "Program Design in the UNIX Environment." "Cat -v...." is a related talk that Kernighan (who did not invent C) and Rob Pike gave on it (and, I think, the name of an organization). This is not really about the '-v' switch on cat. These are radical writings and are part of the kernel wars, not good programming practices. Pike, in particular, is to *nix, say, what PZ Myers is to the Catholic Church.

How to deal with the exit status: Again, in a command line building scenario, we do not see the exit status, so go figure. If I have three commands chained as shown above, the exit status of the second command is hidden. If one really wants exit stati out of a sequence of commands, I suppose one could tee the output to standard error to some place (not sure if that would work).
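For what it is worth, bash (not plain sh) does keep the status of every stage of the most recent pipeline in its PIPESTATUS array, and its pipefail option makes the pipeline as a whole fail when any stage fails. The sketch below assumes bash is installed and calls it explicitly so it also runs from a plain sh; the path is a made-up one that is assumed not to exist.

```shell
#!/bin/sh
# Ask bash for the exit status of each stage of a three-command pipeline.
# cat fails (no such file), grep finds nothing, wc succeeds.
statuses=$(bash -c '
    cat /nonexistent/foo 2>/dev/null | grep mice | wc -l >/dev/null
    echo "${PIPESTATUS[*]}"
')

# With pipefail, $? reflects the last stage that failed instead of the last stage.
with_pipefail=$(bash -c '
    set -o pipefail
    cat /nonexistent/foo 2>/dev/null | wc -l >/dev/null
    echo "$?"
')

echo "per-stage statuses: $statuses"
echo "status with pipefail: $with_pipefail"
```

The first value printed is one status per stage, left to right, so the hidden failure of cat becomes visible.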

I'm a little uncomfortable with non zero exit status being used to make a decision about doing something other than exiting with an error message anyway.

Posted by: Greg Laden | August 5, 2008 8:23 AM

4

There doesn't seem to be a "dog" command in Linux. This is obviously bigotry.

Posted by: Virgil Samms | August 5, 2008 9:04 AM

5
"I'm a little uncomfortable with non zero exit status being used to make a decision about doing something other than exiting with an error message anyway."

Consider a configure script, used to configure makefiles before building. It needs to test for the existence and behavior of many commands and libraries. Instead of exiting with an error message when it sees a non-zero exit status, it goes on to test alternatives and configure the make accordingly.
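A toy version of that pattern, not taken from any real configure script: each candidate is probed with command -v, and a non-zero status simply means "try the next one". The first candidate name is invented and assumed not to be installed.

```shell
#!/bin/sh
# Probe for a line-numbering tool, falling through on each non-zero status.
numberer=""
for candidate in hypothetical-fancy-numberer nl cat; do
    if command -v "$candidate" >/dev/null 2>&1; then
        numberer=$candidate    # first candidate that exists wins
        break
    fi
done

if [ -n "$numberer" ]; then
    echo "using: $numberer"
else
    echo "no line-numbering tool found" >&2
fi
```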

Posted by: llewelly | August 5, 2008 9:54 AM

6

If you're really concerned about cat trying to cat a nonexistent file, just use this instead:

[ -f mouse ] && cat mouse | cmd || echo "no mouse found"
-or-
if [ -f mouse ]; then
    cat mouse | cmd
else
    echo "no mouse found"
fi
Error management is an important part of any script.

Posted by: Ben Zvan | August 5, 2008 11:50 AM

7

llewelly: See, Ben's solution specifically checks for a particular condition. I know that a file operation is likely to fail because of a missing file, but a) it may fail for other reasons and b) what seems like an obvious OK alternative early in development may become a window into disaster later in development.

Virgil: Right, but did you ever notice what "dog" spelled backwards is? Not that this is relevant, but did you ever notice?

Posted by: Greg Laden | August 5, 2008 12:32 PM

8

So, Greg, you're suggesting that use of "dog" should be reserved for daemons?

Posted by: Stephanie Z | August 5, 2008 1:17 PM

9

It depends on the Linux distro.

Posted by: Greg Laden | August 5, 2008 1:47 PM

© 2006-2009 Seed Media Group LLC. ScienceBlogs is a registered trademark of Seed Media Group. All rights reserved.
