A Case for Limits on File Names

Ray Ingles pointed out this position paper, which I think is worth looking at …

Traditionally, Unix/Linux/POSIX pathnames and filenames can be almost any sequence of bytes. A pathname lets you select a particular file, and may include one or more “/” characters. Each pathname component (separated by “/”) is a filename; filenames cannot contain “/”. Neither filenames nor pathnames can contain the ASCII NUL character (\0), because that is the terminator.

This lack of limitations is flexible, but it also creates a legion of unnecessary problems. In particular, this lack of limitations makes it unnecessarily difficult to write correct programs (enabling many security flaws). It also makes it impossible to consistently and accurately display filenames, causes portability problems, and confuses users.

This article will try to convince you that adding some tiny limitations on legal Unix/Linux/POSIX filenames would be an improvement. Many programs already presume these limitations, the POSIX standard already permits such limitations, and many Unix/Linux filesystems already embed such limitations — so it’d be better to make these (reasonable) assumptions true in the first place.


One thing I’m reminded of is this: a while ago I posted a bit of code (I can’t remember what it did) and got several suggested rewrites from commenters. One subset of the rewrites chastised the code for using too many cycles, too many lines of code, and so on. Another subset added piles of code to deal with the possibility that someone would put a newline (or a carriage return) in a filename.
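The original snippet isn't identified, so here is a hypothetical illustration of the kind of code those rewrites add. Counting files with `find | wc -l` miscounts any name that contains a newline, because `wc` counts lines, not names. The NUL-delimited loop below does not, since NUL is the one byte a filename cannot contain. This sketch assumes bash and a `find` that supports `-print0` (GNU and BSD `find` both do); `count_files` is a made-up name.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: count regular files below a directory without being
# fooled by newlines (or any other byte except NUL) inside filenames.
count_files() {
  local dir=$1 n=0 f
  # find -print0 terminates each name with NUL; read -d '' splits on NUL,
  # and IFS= plus -r keep leading spaces and backslashes intact.
  while IFS= read -r -d '' f; do
    n=$((n + 1))
  done < <(find "$dir" -type f -print0)
  printf '%s\n' "$n"
}

demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/$(printf 'two\nlines')"
echo "naive:   $(find "$demo" -type f | wc -l)"   # counts 3: the newline splits one name in two
echo "correct: $(count_files "$demo")"            # counts 2
rm -rf "$demo"
```

All of that extra machinery is needed only because newlines are legal in filenames, which is exactly the overhead the first group of commenters objected to.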

The position paper reminds me of something else. Don’t start a filename with a hyphen!!!!!!

Imagine that you don’t know Unix/Linux/POSIX (I presume you really do), and that you’re trying to do some simple things with its command line. For example, let’s try to print out the contents of all files in the current directory, putting the contents into a file in the parent directory:

cat * > ../collection # WRONG

The list doesn’t include “hidden” files (filenames beginning with “.”), but often that’s what you want anyway, so that’s not unreasonable. The problem with this approach is that although this usually works, filenames could begin with “-” (e.g., “-n”). So if there’s a file named “-n”, and you’re using GNU cat, all of a sudden your output will be numbered! Oops; that means on every command we have to disable option processing.

… and so on.
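Two common ways to keep a glob-expanded name from being parsed as an option are to prefix the glob with `./`, so every expanded name starts with `./`, or to pass `--`, which utilities following the POSIX utility syntax guidelines treat as the end of options. A minimal bash sketch that reproduces the `-n` surprise and both fixes; the three `collect_*` function names are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helpers: concatenate every non-hidden file in the current
# directory into the file named by $1.
collect_wrong() { cat *    > "$1"; }   # WRONG: a file named "-n" is read as an option
collect_dot()   { cat ./*  > "$1"; }   # every expanded name now begins with "./"
collect_dash()  { cat -- * > "$1"; }   # "--" ends option processing

work=$(mktemp -d)   # scratch directory holding the input files
out=$(mktemp)       # output lives outside $work so the glob never picks it up
(
  cd "$work"
  printf 'hello\n' > ./-n      # a file literally named "-n"
  printf 'world\n' > ./plain

  collect_wrong "$out"; cat "$out"   # only "world", with a line number
  collect_dot   "$out"; cat "$out"   # both "hello" and "world"
  collect_dash  "$out"; cat "$out"   # both "hello" and "world"
)
rm -rf "$work" "$out"
```

The `./` prefix has the advantage of working even with programs that don't honor `--`.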

Comments

  1. #1 MadScientist
    December 24, 2009

    One old joke was to create a file named ‘-rf *’ in the / directory. It was such a bad joke that some tools have been modified to take it into account.

    I see no need to impose some of those artificial limits; crippling filesystems that way is a waste of time. Even the list of acceptable characters is not so straightforward – remember that many filesystems can also use UTF-8 filenames. You’ll have an awful lot of languages, symbols, and rules to put into your list, and this can become a performance problem on servers. So, in the grand old tradition of UNIX, I say let the end user do whatever they damned well please, and if they hang themselves, that’s their problem. However, operating systems can make such character filters non-mandatory, and GUI applications can use those filters without any apparent problems. That would allow batch software to get its job done, producing whatever bizarre filenames it wishes, while users are (mostly) restricted to names which don’t annoy other people.

  2. #2 Alex Besogonov
    December 25, 2009

    MadScientist:

    Initially, I thought that way too. However, limiting names to UTF-8 is a very good idea.

    Right now it’s possible to create a filename that you WON’T BE ABLE to type on your keyboard, or in some extreme cases even see.

    And this is compounded by stupid Unix shell scripts. The guy who invented them should be hit with a bat. Repeatedly.
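The non-mandatory filter MadScientist suggests, which applications could apply while the kernel keeps accepting any name, might look something like the bash sketch below. The function name and the exact rules (no empty names, no leading `-`, no control characters, valid UTF-8 as Alex suggests) are assumptions chosen to match the problems raised in this post, not a rule from the position paper. The UTF-8 test assumes `iconv` exits non-zero on invalid input, as GNU `iconv` does.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical opt-in filter: succeed (status 0) only for names that avoid the
# troublemakers discussed above. The kernel itself still forbids only "/" and
# NUL; this check is advisory, e.g. for a GUI "Save as" dialog.
is_friendly_name() {
  local name=$1
  [[ -n $name ]] || return 1                  # empty string
  [[ $name != */* ]] || return 1              # "/" separates path components
  [[ $name != -* ]] || return 1               # leading "-" looks like an option
  [[ $name != *[[:cntrl:]]* ]] || return 1    # newline, carriage return, tab, ESC, ...
  # Assumed: iconv exits non-zero when the input is not valid UTF-8.
  printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 > /dev/null 2>&1 || return 1
}

for candidate in 'notes.txt' '-n' $'two\nlines' $'bad\xff.txt'; do
  if is_friendly_name "$candidate"; then
    printf 'ok:       %q\n' "$candidate"
  else
    printf 'rejected: %q\n' "$candidate"
  fi
done
```

Because the check runs in user space, batch software can still create any name the filesystem allows, which is the split MadScientist describes.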
