Monday, October 26, 2015

FilePaths are subtle, symlinks are hard

Summary: When thinking about the filepath .., remember symlinks, or you will trip yourself up.

As the maintainer of the Haskell filepath package, one common path-related mistake I see is the assumption that filepaths have the invariant:

/bob/home/../cookies == /bob/cookies

I can see the conceptual appeal - go down one directory, go up one directory, end up where you started. Unfortunately, it's not true. Consider the case where home is a symlink to /tony/home. Resolving the symlink turns the claimed invariant into:

/tony/home/../cookies == /bob/cookies

And, assuming /tony/home is itself not a symlink, that reduces to:

/tony/cookies == /bob/cookies

This is clearly incorrect (assuming no further symlinks are involved), so the original invariant was also incorrect, and cannot be relied upon in general. The subtle bit is that descending into a directory might move you somewhere else entirely, so it's not an operation that can be undone with a following .. component. Each step of the path is interpreted based on where it ends up, not based on the path it took to reach the current point.
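To make the step-by-step interpretation concrete, here is a minimal sketch that models the layout as an in-memory table, with no real filesystem involved. The names Entry, exampleFS, resolve, collapse and render are hypothetical, invented for this illustration; none of them come from the filepath package. resolve tracks where the walk has actually ended up, so .. means "parent of the resolved location", while collapse cancels home/.. purely lexically. The sketch does not guard against symlink cycles.

```haskell
import Data.List (intercalate)

-- A toy filesystem: absolute paths (as component lists) mapped to entries.
-- A Link stores the absolute path of its target.
data Entry = Dir | File | Link [String]
  deriving (Eq, Show)

type FS = [([String], Entry)]

-- The layout from the post: /bob/home is a symlink to /tony/home.
exampleFS :: FS
exampleFS =
  [ (["bob"], Dir)
  , (["tony"], Dir)
  , (["bob", "home"], Link ["tony", "home"])
  , (["bob", "cookies"], File)
  , (["tony", "home"], Dir)
  , (["tony", "home", "cookies"], File)
  ]

-- Interpret the path one component at a time. The accumulator is the
-- resolved location so far, so ".." is the parent of where we really are.
resolve :: FS -> [String] -> [String]
resolve fs = foldl step []
  where
    step cur ".." = if null cur then [] else init cur
    step cur "."  = cur
    step cur name =
      case lookup (cur ++ [name]) fs of
        Just (Link target) -> resolve fs target  -- jump to the link target
        _                  -> cur ++ [name]

-- Cancel "name/.." pairs lexically, without consulting the filesystem.
collapse :: [String] -> [String]
collapse = reverse . foldl step []
  where
    step (_ : rest) ".." = rest
    step []         ".." = []
    step acc        "."  = acc
    step acc        name = name : acc

render :: [String] -> String
render = ('/' :) . intercalate "/"

main :: IO ()
main = do
  let path = ["bob", "home", "..", "cookies"]
  putStrLn (render (resolve exampleFS path))  -- prints /tony/cookies
  putStrLn (render (collapse path))           -- prints /bob/cookies
```

The two results differ only because /bob/home is a link; if that entry is changed to Dir, both functions give /bob/cookies.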

While this property isn't true in general, there are many special cases where it is reasonable to assume it holds. For example, the shake package contains a normaliseEx function that normalises under this assumption, but nothing in the standard filepath package assumes it.
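As a rough illustration of the difference, the sketch below compares normalise from filepath, which leaves .. untouched, with a hand-written lexical collapse. collapseDotDot is a hypothetical helper written for this example; it is not shake's normaliseEx and makes no claim to match its behaviour beyond cancelling dir/.. pairs. It uses System.FilePath.Posix so the result is the same on every platform.

```haskell
import System.FilePath.Posix (joinPath, normalise, splitDirectories)

-- Hypothetical helper (not shake's normaliseEx): cancel each "dir/.." pair
-- lexically. This is only valid when "dir" is known not to be a symlink.
collapseDotDot :: FilePath -> FilePath
collapseDotDot = joinPath . reverse . foldl step [] . splitDirectories
  where
    -- Drop the previous component, unless it is the root or another "..".
    step (prev : rest) ".." | prev `notElem` ["/", ".."] = rest
    step acc "."  = acc
    step acc part = part : acc

main :: IO ()
main = do
  -- filepath's normalise keeps "..", because removing it may be wrong.
  putStrLn (normalise "/bob/home/../cookies")       -- /bob/home/../cookies
  -- The lexical collapse assumes the invariant from the top of the post.
  putStrLn (collapseDotDot "/bob/home/../cookies")  -- /bob/cookies
```

A function like collapseDotDot is reasonable when you control the directory tree and know it contains no symlinks; otherwise the answer has to come from the filesystem.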

The full example
/
   [DIR]  bob
   [DIR]  tony
/bob
   [LINK] home -> /tony/home
   [FILE] cookies 
/tony
   [DIR]  home
/tony/home
   [FILE] cookies
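
The layout above can be reproduced on disk with the directory package. This is a sketch under several assumptions: a POSIX system where creating symlinks is permitted, a directory version recent enough to provide createDirectoryLink and removePathForcibly, and a scratch directory name ("symlink-demo") and function name (buildAndProbe) that are made up for the example. The tree is built under the temporary directory rather than at /.

```haskell
import Control.Monad (forM)
import System.Directory
  ( createDirectoryIfMissing, createDirectoryLink, doesFileExist
  , getTemporaryDirectory, removePathForcibly )
import System.FilePath ((</>))

-- Build the example tree in a scratch directory, then ask the operating
-- system whether each candidate path names an existing file.
buildAndProbe :: IO [(FilePath, Bool)]
buildAndProbe = do
  tmp <- getTemporaryDirectory
  let root = tmp </> "symlink-demo"  -- hypothetical scratch directory
  removePathForcibly root            -- start clean; no-op if absent
  createDirectoryIfMissing True (root </> "bob")
  createDirectoryIfMissing True (root </> "tony" </> "home")
  -- bob/home -> tony/home (arguments are: target, then link name)
  createDirectoryLink (root </> "tony" </> "home") (root </> "bob" </> "home")
  writeFile (root </> "bob" </> "cookies") "bob's cookies"
  writeFile (root </> "tony" </> "home" </> "cookies") "tony's cookies"
  let paths = ["bob/cookies", "bob/home/cookies", "bob/home/../cookies"]
  results <- forM paths $ \p -> do
    exists <- doesFileExist (root </> p)
    pure (p, exists)
  removePathForcibly root            -- tidy up afterwards
  pure results

main :: IO ()
main = buildAndProbe >>= mapM_ (\(p, e) -> putStrLn (p ++ ": " ++ show e))
```

Following the argument in the post, bob/home/../cookies should be looked up as tony/cookies, which does not exist in this layout, even though bob/cookies does.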

6 comments:

Guanpeng Xu said...

Excuse me, but I have read this many times but still fail to understand the subtle case. Could you please show the directory structures as in the format of `ls -lR`? Thanks.

Sincerely yours,
Guanpeng Xu

Unknown said...

For whatever it's worth, this is one of the major painpoints of UNIX: ".." is actually a dentry that points back up the tree (meaning that not only are filesystems not the DAGs we think they are, almost all inter-directory-node links are loopy!). People get confused easily because most *shells* deliberately provide the "a/b/../c == a/c" view, by keeping track of the path of chdirs rather than using the ".." dentry.

This, like so many things, was fixed in Plan 9: See Rob Pike's "Lexical File Names in Plan 9 or Getting Dot-Dot Right".

Neil Mitchell said...

Guanpeng: I've updated the post with the full example. Does that make it clearer?

Unknown: That's a very interesting paper, thanks for the link and the information. I still find it weird that you can actually make .. point somewhere else entirely, and that by default getDirectoryContents (in Haskell at least) returns .. which is rarely what you want.

Guanpeng Xu said...

Yes but your example seemed to mean

/bob/home/../cookies == /tony/cookies

instead of

/tony/home/../cookies == /bob/cookies

Sincerely yours,
Guanpeng Xu

Neil Mitchell said...

Guanpeng: My use of equality was to imply something people thought was correct, and then show by following symlinks etc. that it wasn't correct, so all those equalities are intended to be "what people thought to be true, but is in fact false". I've noted that after the final one, saying they are untrue, which I forgot to do the first time round. Does that make it clear?

Guanpeng Xu said...

I see. I looked at it from another angle, which caused the confusion. I think I should also try this mathematical way of thinking later on. Thank you.

Sincerely yours,
Guanpeng Xu