Greg Laden's Blog

Evolution, Life Sciences, Science Education, Human Evolution, and Stuff

The contents of Greg Laden's Blog are copyrighted by Greg Laden.

The Skeptical Search Engine

This search engine will only give you results from carefully selected skeptical and scientific sites.


Recent Posts

Blogroll

If you don't see yourself on my blogroll, just drop me a line and let me know. I'll add you.*
*Assuming that I'm on your blogroll, of course!

Archives

« Linux Links for your Edumication and Enjoyment | Main | Mysteries of the Congo »

A Case for Limits on File Names

Category: Linux
Posted on: December 24, 2009 2:18 PM, by Greg Laden

Ray Ingles pointed out this position paper which I think is worth looking at ...

Traditionally, Unix/Linux/POSIX pathnames and filenames can be almost any sequence of bytes. A pathname lets you select a particular file, and may include one or more "/" characters. Each pathname component (separated by "/") is a filename; filenames cannot contain "/". Neither filenames nor pathnames can contain the ASCII NUL character (\0), because that is the terminator.

This lack of limitations is flexible, but it also creates a legion of unnecessary problems. In particular, this lack of limitations makes it unnecessarily difficult to write correct programs (enabling many security flaws). It also makes it impossible to consistently and accurately display filenames, causes portability problems, and confuses users.

This article will try to convince you that adding some tiny limitations on legal Unix/Linux/POSIX filenames would be an improvement. Many programs already presume these limitations, the POSIX standard already permits such limitations, and many Unix/Linux filesystems already embed such limitations -- so it'd be better to make these (reasonable) assumptions true in the first place.

One thing I'm reminded of is this: I posted a bit of code a while ago (can't remember what it did) and I got several suggested rewrites from commenters. One subset of the rewrites chastised the code for using too many cycles or too many lines of code, etc. Another subset added piles of lines of code to deal with the possibility that someone would put a newline (a line-break character) in a filename.
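
The newline case those commenters worried about can be sketched in a few lines of shell. This is only an illustration, not the code from that earlier post; the scratch directory and the file names are made up for the demo.

```shell
#!/bin/sh
# Demo setup (hypothetical): a scratch directory holding two files,
# one of which has a newline embedded in its name.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
: > 'plain.txt'
: > "$(printf 'two\nlines.txt')"

# Fragile: counting lines of ls output treats the newline inside the
# second name as a separator between names.
naive_count=$(ls | wc -l | tr -d ' ')

# Sturdier: let the shell's glob hand over each name as one word.
safe_count=0
for f in ./*; do
    [ -e "$f" ] || continue   # nothing matched; the pattern stayed literal
    safe_count=$((safe_count + 1))
done

echo "ls | wc -l sees $naive_count names; the glob loop sees $safe_count"

# Clean up the scratch directory.
cd / && rm -rf -- "$workdir"
```

The glob loop never turns the names into text that has to be split apart again, which is why it does not care what bytes a name contains.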

The position paper reminds me of something else. Don't start a filename with a hyphen!!!!!!


Imagine that you don't know Unix/Linux/POSIX (I presume you really do), and that you're trying to do some simple things with its command line. For example, let's try to print out the contents of all files in the current directory, putting the contents into a file in the parent directory:

cat * > ../collection # WRONG

The list doesn't include "hidden" files (filenames beginning with "."), but often that's what you want anyway, so that's not unreasonable. The problem with this approach is that although this usually works, filenames could begin with "-" (e.g., "-n"). So if there's a file named "-n", and you're using GNU cat, all of a sudden your output will be numbered! Oops; that means on every command we have to disable option processing.


... and so on.
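
For what it's worth, two common ways to defuse the leading-hyphen problem in that `cat` example look roughly like this. It is a sketch in a throwaway directory; the file names and contents are invented for the demo.

```shell
#!/bin/sh
# Demo setup (hypothetical): a scratch directory with a file named "-n".
workdir=$(mktemp -d) || exit 1
mkdir "$workdir/demo" || exit 1
cd "$workdir/demo" || exit 1
printf 'hello\n' > ./-n
printf 'world\n' > ./z.txt

# Option 1: prefix the glob with ./ so no expanded name starts with "-".
cat ./* > ../collection

# Option 2: "--" tells cat that everything after it is a file name.
cat -- * > ../collection2

# Keep the results around, then show one of them.
dot_slash_result=$(cat ../collection)
double_dash_result=$(cat ../collection2)
printf '%s\n' "$dot_slash_result"

# Clean up the scratch directory.
cd / && rm -rf -- "$workdir"
```

The `./` form is probably the safer habit, since it does not depend on every program honoring `--`.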


Comments

1

One old joke used to be creating a file named '-rf *' in the / directory. It was such a bad joke that some tools have been modified to take that into account.

I see no need to impose some of those artificial limits; that's a waste of time crippling filesystems. Even the list of acceptable characters is not so straightforward - remember that many filesystems can also use UTF-8 filenames. You'll have an awful lot of languages, symbols, and rules to put into your list and this can become a performance problem on servers. So, in the grand old tradition of UNIX - I say let the end user do whatever they damned well please - and if they hang themselves, that's their problem. However, operating systems can make such character filters non-mandatory and GUI applications can make use of those filters without any apparent problems. That would allow batch software to get their job done producing whatever bizarre filenames they wish while users are (mostly) restricted to names which don't annoy other people.

Posted by: MadScientist | December 24, 2009 5:42 PM

2

MadScientist:

Initially, I thought that way too. However, limiting names to UTF-8 is a very good idea.

Right now it's possible to create a file name which you WON'T BE ABLE to type on your keyboard. Or, in some extreme cases, even see.

And this is compounded by stupid Unix shell scripts. The guy who invented them should be hit with a bat. Repeatedly.

Posted by: Alex Besogonov | December 25, 2009 8:44 AM
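
As a rough illustration of the UTF-8 idea raised in the comments, a script can at least test whether a given name is valid UTF-8. The `is_utf8` helper below is hypothetical (it is not from the position paper or from either commenter) and assumes an `iconv` that exits with an error on invalid input.

```shell
#!/bin/sh
# is_utf8 NAME: succeed if NAME is a valid UTF-8 byte sequence.
# (Hypothetical helper for illustration; relies on iconv rejecting bad input.)
is_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

good_name=$(printf 'caf\303\251.txt')   # "café.txt" encoded as UTF-8
bad_name=$(printf 'caf\377.txt')        # byte 0xFF never occurs in UTF-8

for name in "$good_name" "$bad_name"; do
    if is_utf8 "$name"; then
        echo "valid UTF-8"
    else
        echo "not valid UTF-8"
    fi
done
```

A check like this only reports the problem; whether the filesystem should refuse such names in the first place is the question the paper is arguing about.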

© 2006-2011 ScienceBlogs LLC. ScienceBlogs is a registered trademark of ScienceBlogs LLC. All rights reserved.