Greg Laden's Blog

Evolution, Life Sciences, Science Education, Human Evolution, and Stuff

The contents of Greg Laden's Blog are copyrighted by Greg Laden.


In the beginning there was cat ...

Posted on: August 4, 2008 10:51 PM, by Greg Laden

... and I'm not talking about Ceiling Cat. I'm talking about the Linux command cat.

Apropos the comment that one should not use cat to produce a stream of text for a command that takes a filename as an argument ... I say "Balderdash!"

If your plan is to process the text from a file with a single command and that's that .. no modifications will be needed ... then by all means, you will save time and computing resources (though immeasurably little of either) by using the filename as an argument to said command.

But if you are not quite sure how you are going to get to your eventual final results, you might find that starting with cat will give you a paws up when you need to start fiddling. The cat command will even take arguments that can help you.

I think the widespread conception that cat is 'merely a filter' that does nothing but pass along the contents of the file it is fed contributes to the belief that cat is always useless. In fact, cat is a very powerful command.

Cat counts as a filter. A file goes in and comes out. When cat is issued with nothing other than a filename as an argument, it does nothing but stream out the contents of the file. This is convenient if you want to construct a complex command that starts with the contents of a file running into standard input. For instance:

cat mouse.txt

will simply 'print' to the terminal the contents of mouse.txt. But that's OK. You can do this and verify that mouse.txt exists and contains roughly what you thought it contained. Then you can add something to this such as

cat mouse.txt | grep mice

which gets me:

Three blind mice. Three blind mice.
As three blind mice.

I could have gone

grep mice mouse.txt

and gotten the same thing, but that would interfere with the poetry of

cat mouse.txt | grep mice | wc

which gets me: 2 10 66

which is, of course, either Daryl Johnston's or Rodney Anoa'i's birthday.
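For comparison, here is a minimal sketch of the three spellings side by side: the cat pipeline, the filename argument, and a shell redirection. The sample file is a short stand-in created just for this illustration (it is not the mouse.txt from the post, so the counts differ), and the temporary directory and variable names are assumptions.

```shell
# Create a small stand-in for mouse.txt in a scratch directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/mouse.txt" <<'EOF'
Three blind mice. Three blind mice.
See how they run. See how they run.
As three blind mice.
EOF

# 1. The pipeline from the post: cat feeds grep, grep feeds wc.
via_cat=$(cat "$tmpdir/mouse.txt" | grep mice | wc -l)

# 2. grep opens the file itself, so there is one process fewer.
via_arg=$(grep mice "$tmpdir/mouse.txt" | wc -l)

# 3. The shell opens the file and hands it to grep on standard input.
via_redirect=$(grep mice < "$tmpdir/mouse.txt" | wc -l)

echo "$via_cat $via_arg $via_redirect"
rm -r "$tmpdir"
```

All three count the same matching lines. The cat form simply keeps the file name at the front of the line, which is handy when the part being edited is at the end.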

Here's another good one. A common question given to programmers looking for work, as part of their application, is this:

"Write a perl one liner that adds numbers to the lines in a file."

Answer:

Who needs perl? ... cat -n mouse.txt

which gets me this:


1 Three Blinde Mice,
2 three Blinde Mice,
3 Dame Iulian,
4 Dame Iulian,
5 The Miller and his merry olde Wife,
6 shee scrapte her tripe licke thou the knife.
7
8 Three blind mice. Three blind mice.
9 See how they run. See how they run.
10 They all ran after the farmer's wife
11 She cut off their tails with a carving knife.
12 Did you ever see such a thing in your life
13 As three blind mice.

or, cat -b mouse.txt

which is similar but only numbers non-blank lines.

or the -s option, one of the coolest, which does not allow more than one blank line at a time through the filter!

or -T which shows tabs, otherwise invisible.
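To try these options without touching a real file, something like the following sketch may help. It assumes GNU cat (the -T spelling in particular is GNU; other systems may use a different flag), and the sample file and variable names are made up for illustration.

```shell
tmpdir=$(mktemp -d)
sample="$tmpdir/sample.txt"

# Six lines: "first", three blank lines, "second", and a tab-indented line.
printf 'first\n\n\n\nsecond\n\tindented\n' > "$sample"

cat -n "$sample"    # number every line, blank or not
cat -b "$sample"    # number only the non-blank lines
cat -s "$sample"    # squeeze the run of blank lines down to one
cat -T "$sample"    # GNU cat: show each tab as ^I

# Capture a few facts about the output for checking.
last_n=$(cat -n "$sample" | tail -n 1 | awk '{ print $1 }')   # number on the last line with -n
last_b=$(cat -b "$sample" | tail -n 1 | awk '{ print $1 }')   # number on the last line with -b
squeezed_lines=$(cat -s "$sample" | grep -c '')               # line count after squeezing

rm -r "$tmpdir"
```

With -n the last line carries the total line count, with -b it carries the count of non-blank lines, and -s leaves two fewer lines than went in.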

Man cat. Try it, you'll like it.



Comments

1

cat foo | cmd

has the very distinct and real problem that if file foo cannot be opened, then the "exit status" of the pipe ($?) will be 0 (zero)--success. This is probably not what you want, especially in a Makefile, or in constructs such as
if cat foo | cmd; then ... fi

In contrast, when a command has a filename argument, it usually fails (exits non-0) if it cannot open a required/named input file. Even if the command just reads stdin (the standard input), crunch (<file) behaves better than using cat.
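The difference can be seen with a small sketch like the one below. It uses wc -l as a stand-in for cmd and a path inside a fresh temporary directory as the file that cannot be opened; both are assumptions for illustration, as are the variable names.

```shell
tmpdir=$(mktemp -d)
missing="$tmpdir/foo"    # deliberately never created

# 1. cat fails, but the pipeline reports the status of its last command (wc).
pipe_status=0
cat "$missing" 2>/dev/null | wc -l >/dev/null || pipe_status=$?

# 2. Filename argument: wc cannot open the file and exits non-zero.
arg_status=0
wc -l "$missing" >/dev/null 2>&1 || arg_status=$?

# 3. Redirection: the shell cannot open the file, so wc never runs.
redirect_status=0
{ wc -l < "$missing"; } >/dev/null 2>&1 || redirect_status=$?

echo "pipe=$pipe_status arg=$arg_status redirect=$redirect_status"
rmdir "$tmpdir"
```

Only the pipeline form reports success. In bash, set -o pipefail changes this so that a failing cat makes the whole pipeline fail, but that is a shell option rather than the default behavior.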

cat has its place and its purpose. But the Useless Use of Cat™ award exists for multiple reasons.

Also look up Pike's paper, cat -v Considered Harmful.

Posted by: blf | August 5, 2008 1:28 AM

2

When I read the previous post that started this, I almost joined the "don't cat" responders. But on consideration, I think this is more like off-topic grammar correction.

For those on-the-fly, build-up-a-command-line-by-experimentation situations, the performance difference is insignificant and probably allows quicker command editing. So Greg's original examples are fine.

Lifelong habit will probably limit how much *I* use cat, however, lol.

If I'm going to the trouble to write a script, I take more care in crafting the commands and handling edge / error conditions.

So, it may be beneficial for casual readers to be alerted to potential problems, but not in a dogmatic way. Google the recursive "'considered harmful' considered harmful".

In the spirit of TMTOWTDI, how about a contest to see how many ways to number lines of a file? I'll start with "nl foo.txt". (j/k)
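As a possible opening round for such a contest, here is a sketch of five ways using standard tools. The file name and contents are invented for the example, and each tool formats its numbers a little differently.

```shell
tmpdir=$(mktemp -d)
f="$tmpdir/foo.txt"
printf 'alpha\n\nbeta\n' > "$f"    # three lines, the middle one blank

by_cat=$(cat -n "$f")                      # numbers every line
by_nl=$(nl -ba "$f")                       # -ba: number all lines, blank ones too
by_awk=$(awk '{ print NR ": " $0 }' "$f")  # NR is awk's running line count
by_grep=$(grep -n '' "$f")                 # the empty pattern matches every line
by_sed=$(sed = "$f" | sed 'N; s/\n/ /')    # "sed =" prints each number on its own line; N joins the pairs

printf '%s\n' "$by_awk"
rm -r "$tmpdir"
```

Note that plain nl, without -ba, skips blank lines when numbering, which makes it closer to cat -b than to cat -n.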

Posted by: Larry | August 5, 2008 7:26 AM

3

Yes, this is all about constructing commands. Between each iteration of developing the command, you are using the history buttons to go back to older versions and then modifying.

cat mouse.txt issued by itself tells you that mouse.txt is in the working directory, reminds you what the contents of the file look like, etc.

However, one might seriously leave it this way when making a script because later modifications may involve copying the script to command line and playing around further. It depends on what you are trying to do.

Brian, the name of the paper is "Program design in the UNIX environment." "Cat -v...." is a related talk Kernighan (did not invent C) and Rob Pike gave on it (and I think the name of an organization). This is not really about the '-v' switch on cat. These are radical writings and are part of the kernel wars, not good programming practices. Pike, in particular, is to *nix, say, what PZ Myers is to the Catholic Church.

How to deal with the exit status: Again, in a command line building scenario, we do not see the exit status, so go figure. If I have three commands chained as shown above, the exit status of the second command is hidden. If one really wants exit stati out of a sequence of commands, I suppose one could tee the output to standard error to some place (not sure if that would work).

I'm a little uncomfortable with non zero exit status being used to make a decision about doing something other than exiting with an error message anyway.

Posted by: Greg Laden | August 5, 2008 8:23 AM

4

There doesn't seem to be a "dog" command in Linux. This is obviously bigotry.

Posted by: Virgil Samms | August 5, 2008 9:04 AM

5
"I'm a little uncomfortable with non zero exit status being used to make a decision about doing something other than exiting with an error message anyway."

Consider a configure script, used to configure makefiles before building. It needs to test for the existence and behavior of many commands and libraries. Instead of exiting with an error message when it sees a non-zero exit status, it goes on to test alternatives and configure the make accordingly.

Posted by: llewelly | August 5, 2008 9:54 AM

6

If you're really concerned about cat trying to cat a nonexistent file, just use this instead:

[ -f mouse ] && cat mouse | cmd || echo "no mouse found"
-or-
if [ -f mouse ]; then
    cat mouse | cmd
else
    echo "no mouse found"
fi
Error management is an important part of any script.

Posted by: Ben Zvan | August 5, 2008 11:50 AM

7

llewelly: See, Ben's solution specifically checks for a particular condition. I know that a file operation is likely to fail because of a missing file, but a) it may fail for other reasons and b) what seems like an obvious OK alternative early in development may become a window into disaster later in development.

Virgil: Right, but did you ever notice what "dog" spelled backwards is? Not that this is relevant, but did you ever notice?

Posted by: Greg Laden | August 5, 2008 12:32 PM

8

So, Greg, you're suggesting that use of "dog" should be reserved for daemons?

Posted by: Stephanie Z | August 5, 2008 1:17 PM

9

It depends on the Linux distro.

Posted by: Greg Laden | August 5, 2008 1:47 PM


© 2006-2011 ScienceBlogs LLC. ScienceBlogs is a registered trademark of ScienceBlogs LLC. All rights reserved.