
Get directory path by fd

I’ve run into the need to refer to a directory by path, given its file descriptor, in Linux. The path doesn’t have to be canonical; it just has to be functional so that I can pass it to other functions. So, taking the same parameters as are passed to a function like fstatat(), I need to be able to call a function like getxattr(), which doesn’t have an f-XYZ-at() variant.

So far I’ve come up with these solutions, though none is particularly elegant.

The simplest solution is to avoid the problem by calling openat() and then using a function like fgetxattr(). This works, but not in every situation. So another method is needed to fill the gaps.
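As a rough illustration of this first approach, here is a minimal sketch. The helper name getxattr_at() is hypothetical (it is not a real libc function); it simply combines openat() and fgetxattr() so that no path string for the directory is ever needed.

```c
#define _POSIX_C_SOURCE 200809L   /* for openat() and O_CLOEXEC */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Hypothetical helper: read extended attribute `name` of `path`, where
 * `path` is resolved relative to `dirfd` (same convention as fstatat()). */
ssize_t getxattr_at(int dirfd, const char *path, const char *name,
                    void *value, size_t size)
{
    int fd = openat(dirfd, path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;                 /* errno set by openat() */

    ssize_t n = fgetxattr(fd, name, value, size);

    int saved = errno;             /* keep fgetxattr()'s errno across close() */
    close(fd);
    errno = saved;
    return n;
}
```

One reason this cannot cover every situation: the target has to be openable for reading, which is not guaranteed (no read permission, for instance), and openat() follows symbolic links here, so the attributes of a link itself are out of reach.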

The next solution involves looking up the information in /proc:

[Code listing not preserved in this copy.]

This, of course, totally breaks on systems without /proc, including some chroot environments.

The last option, a more portable but potentially-race-condition-prone solution, looks like this:

[Code listing not preserved in this copy.]
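A minimal sketch of the fchdir()/getcwd() approach, under the assumption that the process is single-threaded. The helper name fd_to_path_chdir() is hypothetical. It remembers the current directory by opening ".", switches to the target, asks getcwd() for the path, and switches back.

```c
#define _POSIX_C_SOURCE 200809L   /* for fchdir(), O_DIRECTORY, O_CLOEXEC */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: resolve directory `dirfd` to a path via the working
 * directory. NOT thread-safe: the working directory is process-wide. */
int fd_to_path_chdir(int dirfd, char *buf, size_t size)
{
    /* Opening "." can fail (e.g. no read permission), so check it. */
    int saved = open(".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (saved < 0)
        return -1;

    int rc = -1;
    int err = 0;
    if (fchdir(dirfd) != 0) {
        err = errno;
    } else {
        if (getcwd(buf, size) != NULL)
            rc = 0;
        else
            err = errno;
        /* Always try to go back, even if getcwd() failed. */
        if (fchdir(saved) != 0) {
            rc = -1;
            err = errno;
        }
    }
    close(saved);
    if (rc != 0)
        errno = err;
    return rc;
}
```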

The obvious problem here is that in a multithreaded app, changing the working directory around could have side effects.

However, the fact that it works is compelling: if I can get the path of a directory by calling fchdir() followed by getcwd(), why shouldn’t I be able to just get the information directly: fgetcwd() or something. Clearly the kernel is tracking the necessary information.

So how do I get to it?


Answer

The way Linux implements getcwd in the kernel is this: it starts at the directory entry in question, prepends that entry’s name to the path string, moves up to the parent entry, and repeats the process until it reaches the root. The same mechanism can, in theory, be implemented in user space.

Thanks to Jonathan Leffler for pointing this algorithm out. Here is a link to the kernel implementation of this function: https://github.com/torvalds/linux/blob/v3.4/fs/dcache.c#L2577


Answer

The kernel thinks of directories differently from the way you do – it thinks in terms of inode numbers. It keeps a record of the inode number (and device number) for the directory, and that is all it needs as the current directory. The fact that you sometimes specify a name to it means it goes and tracks down the inode number corresponding to that name, but it preserves only the inode number because that’s all it needs.

So, you will have to code a suitable function. You can open a directory directly with open() precisely to get a file descriptor that can be used by fchdir(); you can’t do anything else with it on many modern systems. You can also fail to open the current directory; you should be testing that result. The circumstances where this happens are rare, but not non-existent. (A SUID program might chdir() to a directory that the SUID privileges permit, but then drop the SUID privileges leaving the process unable to read the directory; the getcwd() call will fail in such circumstances too – so you must error check that, too!) Also, if a directory is removed while your (possibly long-running) process has it open, then a subsequent getcwd() will fail.

Always check results from system calls; there are usually circumstances where they can fail, even though it is dreadfully inconvenient of them to do so. There are exceptions – getpid() is the canonical example – but they are few and far between. (OK: not all that far between – getppid() is another example, and it is pretty darn close to getpid() in the manual; and getuid() and relatives are also not far off in the manual.)

Multi-threaded applications are a problem; using chdir() is not a good idea in those. You might have to fork() and have the child evaluate the directory name, and then somehow communicate that back to the parent.


bignose asks:

This is interesting, but seems to go against the querent’s reported experience: that getcwd knows how to get the path from the fd. That indicates that the system knows how to go from fd to path in at least some situations; can you edit your answer to address this?

For this, it helps to understand how – or at least one mechanism by which – the getcwd() function can be written. Ignoring the issue of ‘no permission’, the basic mechanism by which it works is:

  • Use stat on the root directory ‘/’ (so you know when to stop going upwards).
  • Use stat on the current directory ‘.’ (so you know where you are); this gives you a current inode.
  • Until you reach the root directory:
      • Scan the parent directory ‘..’ until you find the entry with the same inode as the current inode; this gives you the next component name of the directory path.
      • Then change the current inode to the inode of ‘.’ in the parent directory.
  • When you reach root, you can build the path.

Here is an implementation of that algorithm. It is old code (originally 1986; the last non-cosmetic changes were in 1998) and doesn’t make use of fchdir() as it should. It also works horribly if you have NFS automounted file systems to traverse, which is why I don’t use it any more. However, this is roughly equivalent to the basic scheme used by getcwd(). (Ooh; I see an 18-character string (“../123456789.abcd”). Back when it was written, the machines I worked on only had the very old 14-character-only filenames, not the modern flex names. Like I said, it is old code! I haven’t seen one of those file systems in what, 15 years or so, maybe longer. There is also some code to mess with longer names. Be cautious using this.)


[Code listing not preserved in this copy.]
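For illustration only, and not the 1986 code described above, here is a modern sketch of the same walk-up scheme. The name fd_to_path_walk() is hypothetical. It assumes openat(), fstatat() and fdopendir() are available, never changes the working directory, and compares device and inode numbers together so that mount points are handled.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Close `fd` if it is open, set errno to `err`, and report failure. */
static int fail(int fd, int err)
{
    if (fd >= 0)
        close(fd);
    errno = err;
    return -1;
}

/* Hypothetical helper: build an absolute path for `dirfd` in `buf` by
 * following ".." upwards and looking up each component's name. */
int fd_to_path_walk(int dirfd, char *buf, size_t size)
{
    struct stat root, cur, ent;
    size_t pos = size;                  /* the path is assembled right to left */

    if (size < 2)
        return fail(-1, ERANGE);
    if (stat("/", &root) != 0 || fstat(dirfd, &cur) != 0)
        return -1;
    /* Work on a private descriptor so the caller's dirfd is left alone. */
    int fd = openat(dirfd, ".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    buf[--pos] = '\0';

    while (cur.st_dev != root.st_dev || cur.st_ino != root.st_ino) {
        int up = openat(fd, "..", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        if (up < 0)
            return fail(fd, errno);
        close(fd);
        fd = up;
        /* fdopendir() takes ownership of its descriptor, so give it a copy. */
        int scanfd = fcntl(fd, F_DUPFD_CLOEXEC, 0);
        if (scanfd < 0)
            return fail(fd, errno);
        DIR *dir = fdopendir(scanfd);
        if (dir == NULL) {
            int err = errno;
            close(scanfd);
            return fail(fd, err);
        }
        int found = 0;
        struct dirent *de;
        while (!found && (de = readdir(dir)) != NULL) {
            if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                continue;
            /* stat every entry: d_ino alone is unreliable at mount points */
            if (fstatat(fd, de->d_name, &ent, AT_SYMLINK_NOFOLLOW) != 0)
                continue;
            if (ent.st_dev != cur.st_dev || ent.st_ino != cur.st_ino)
                continue;
            size_t len = strlen(de->d_name);
            if (len + 1 > pos) {        /* no room for "/name" */
                closedir(dir);
                return fail(fd, ERANGE);
            }
            pos -= len;
            memcpy(buf + pos, de->d_name, len);
            buf[--pos] = '/';
            found = 1;
        }
        closedir(dir);
        if (!found)
            return fail(fd, ENOENT);    /* e.g. the directory was removed */
        if (fstat(fd, &cur) != 0)       /* the parent becomes the current dir */
            return fail(fd, errno);
    }
    close(fd);
    if (buf[pos] == '\0')               /* dirfd was the root itself */
        buf[--pos] = '/';
    memmove(buf, buf + pos, size - pos);
    return 0;
}
```

Calling fstatat() on every sibling is slow in large directories and can still touch automounted file systems, so the caveat above about NFS applies to this sketch too.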