Know Your Files (a Linux one-liner)

Don’t type this one in until you are sure you want to.

find . -print|xargs file|awk '{$1="";x[$0]++;}END{for(y in x)printf("%d\t%s\n",x[y],y);}'|sort -nr

(yes, that is all one line)

Let’s break it down.

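find . -print prints out (to standard output) the path of every freakin' file and directory under the directory you run it in. Start it at the top of your drive (or some other large unit of storage space) and that really is every file on it.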
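To see what find actually hands to the rest of the pipeline without crawling your real files, you can point it at a throwaway directory. This is a small sketch; the directory and file names are made up for the demo. Note that find lists directories (including "." itself) as well as regular files.

```shell
#!/bin/sh
# Build a small, throwaway tree (hypothetical names, for the demo only).
demo=$(mktemp -d)
mkdir "$demo/docs"
touch "$demo/docs/notes.txt" "$demo/photo.jpg"

# find prints one path per line: the starting point ".", every
# directory, and every file beneath it, recursively.
# The output is sorted only to make it easier to read.
listing=$(cd "$demo" && find . -print | sort)
printf '%s\n' "$listing"

# Clean up the throwaway tree.
rm -r "$demo"
```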

xargs is a command that builds commands. In this case, it takes the list of file names coming out of find and hands them, as arguments, to the file command, which reports what kind of file each one is (text, image, directory, and so on).
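If xargs is new to you, the easiest way to see it "build a command" is to give it echo instead of file. This sketch uses made-up file names; several input lines turn into the arguments of a single command.

```shell
#!/bin/sh
# xargs reads words from standard input and appends them as arguments
# to the command it is given, so three input lines become one
# "echo notes.txt photo.jpg todo.txt" invocation.
result=$(printf '%s\n' notes.txt photo.jpg todo.txt | xargs echo)
echo "$result"   # prints: notes.txt photo.jpg todo.txt
```

In the real pipeline, xargs file does the same thing with file in place of echo. With a very long list, xargs splits it up and runs the command several times.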

This is then piped to awk, which reads the output of file line by line, blanks out the first field (the file name), and tallies how many times each remaining file-type description shows up. That tally gets piped to sort -nr, a reverse numerical sort, so the most common file types come out on top.
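You can try the awk-and-sort half of the one-liner on its own by feeding it a few lines that look like file output. The lines below are hand-written with made-up names, and real file output may differ in detail. Blanking $1 makes awk rebuild the line from the remaining fields, which leaves a single leading space before the description.

```shell
#!/bin/sh
# Hand-written stand-in for `file` output (hypothetical file names),
# so the tally can be tried without scanning a real directory tree.
sample='./notes.txt: ASCII text
./todo.txt: ASCII text
./photo.jpg: JPEG image data'

# awk blanks field 1 (the "name:" part), counts each remaining line,
# and prints "count<TAB>description"; sort -nr puts the biggest count first.
tally=$(printf '%s\n' "$sample" |
  awk '{ $1 = ""; x[$0]++ } END { for (y in x) printf("%d\t%s\n", x[y], y) }' |
  sort -nr)
printf '%s\n' "$tally"
```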

So, in the end, you get a list of all the different kinds of files in the directory tree, with a count for each. If you run this from your home directory, it covers all the files in your home directory.

This is from here, where you will find a script version of the same thing. You will also find at this site an explanation for why you would want to do this.

Oh, wait, no, you won’t find any such explanation. Still working on this.

You may get an error message or two, and this will take a while to run. But it will work. Probably.

Comments

  1. #1 Anthony
    August 10, 2008

    Actually, the above code won’t work as desired, since the output of ‘file’ includes the actual file name, and thus all lines are different. Suggested code:
    find . -print | xargs file | cut -d':' -f2- | sort | uniq -c | sort -nr

    Not that this explains just why you’d want to do this.

  2. #2 Richard
    August 10, 2008

    Anthony,

    Unless Greg changed it since your comment, I think his code was going to work as well as yours, because in his awk line, he set the first field to “”, which would be the filename.

    It and yours both still fail, at least for me, because I have filenames with spaces and colons in them, and xargs splits its input on whitespace, so each space-separated piece of a name gets passed to file as a separate argument. So, I think it would be better to start with:
    find . -print0 | xargs -0 file

    Greg’s will still fail after the xargs because his awk statement will only omit the part of a `file` output line before the first space, so “friends and family” becomes “and family: directory”.

    Likewise, yours has the disadvantage of only cutting everything before the first :, and some filenames have : in them.

    The version of file I am using (on both Ubuntu 8.04 and Fedora 9) lines its file descriptions up in a left-justified column, rather than putting them one or two spaces after the colon. How far over that column sits is determined by the widest filename, so cut can leave different amounts of leading space on the same description.

    However, we don’t need cut or any fun awkness, as we can use the -b option to file to omit the filename.

    So, I recommend either of these for robustness:

    $ find . -print0 | xargs -0 file -b | awk ' { table[$0]++; } END { for (i in table) { printf("%7d %s\n", table[i], i); } } ' | sort -n

    $ find . -print0 | xargs -0 file -b | sort | uniq -c | sort -n
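The problems raised in the comments are easy to reproduce without running file at all. This sketch uses made-up names (./friends and family, ./a:b.txt) and hand-written lines in the format file prints, so it only exercises xargs, awk and cut.

```shell
#!/bin/sh
# Pitfall 1: plain xargs splits its input on whitespace, so one name
# with spaces becomes three arguments; NUL-separated input with
# xargs -0 (what find -print0 produces) keeps it whole.
split=$(printf '%s\n' './friends and family' | xargs printf '[%s]\n')
whole=$(printf '%s\0' './friends and family' | xargs -0 printf '[%s]\n')
printf '%s\n' "$split"   # [./friends] [and] [family], one per line
printf '%s\n' "$whole"   # [./friends and family]

# Pitfall 2: blanking only $1 in awk leaves the rest of a spaced name behind.
awk_out=$(printf '%s\n' './friends and family: directory' | awk '{ $1 = ""; print }')
printf '%s\n' "$awk_out" # " and family: directory"

# Pitfall 3: cut -d: -f2- drops everything up to the FIRST colon,
# which may sit inside the file name itself.
cut_out=$(printf '%s\n' './a:b.txt: ASCII text' | cut -d: -f2-)
printf '%s\n' "$cut_out" # "b.txt: ASCII text"
```

Using file -b sidesteps pitfalls 2 and 3, because the name never appears in the output to begin with.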
