
the best way to get “find” style output from “ls -fR”

My goal is to find the fastest way to list all files in a directory (call it the master directory). The master directory contains about 5 million files organized in subdirectories, but it’s unclear how the subdirectories are arranged. After some research I concluded that the fastest way to do so is ls -fR (-f disables sorting).

The default output from ls -fR is something like this:

$ ls -fR dir1
dir1:
.  subdir1  ..

dir1/subdir1:
.  file1  ..

My desired output is the one produced by find (find takes twice as long though):

$ find dir1/ -type f
dir1/subdir1/file1

Although I could parse the ls -fR output myself, I was wondering if there is a simple way to make ls -fR print its output in “find” style. I hope there is a very easy toggle and I’m just being blind to it.


Answer

find takes twice as long

Interesting. Are you really sure though?

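  • ls -fR and find -type f don’t print the same things. -f implies -a, so ls lists every entry, including hidden ones and the . and .. entries (as your sample output shows), and it lists directories too, while find -type f prints only regular files. Try ls -fRA too, which at least drops the . and .. entries.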
  • If you run find; ls -fR, the latter will have a huge advantage due to caching. Try swapping the order or clearing the cache (sync; echo 3 | sudo tee /proc/sys/vm/drop_caches) before each command.
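
To make the ordering point concrete, here is a minimal sketch that runs both commands twice in alternating order, so neither one always runs second on a warm cache. The function name time_listers is made up for illustration, and timing with date +%s only has one-second resolution, which is coarse but should be enough for a tree with millions of files.

```shell
# Hypothetical helper: run find and ls in the order find, ls, ls, find
# so that each tool runs once "first" and once "second".
time_listers() {
    dir=$1
    for tool in find ls ls find; do
        start=$(date +%s)
        if [ "$tool" = find ]; then
            find "$dir" -type f > /dev/null
        else
            ls -fRA "$dir" > /dev/null
        fi
        end=$(date +%s)
        echo "$tool: $((end - start))s"
    done
}

# For cold-cache runs instead (Linux only, needs root), clear the cache
# before every single command:
#   sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
```

Call it as time_listers dir1 and compare the two find lines with the two ls lines.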

I hope there is a very easy toggle and I’m just being blind to it

Not that I know of. POSIX ls certainly has no such option. As far as I can tell from man ls, GNU ls 8.32 has none either.

You could adapt the output of ls to match that of find using

ls -fRpA | awk '/:$/ {sub(/:$/,"/"); p=$0; next} length() && !/\/$/ {print p $0}'

However, this breaks on file or directory names that contain line breaks and on file names ending with a :. It also slows the listing down a bit, presumably more so the longer the paths are. That may partially explain why find is slower than ls: find prints a lot more text because it has to repeat the names of the parent directories on every line.
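
If you want to check the conversion on a small tree before trusting it, the following sketch wraps the pipeline in a function and compares it with find. It assumes GNU ls (which prints a "dir:" header for every directory, including the first one) and assumes the filter is meant to skip the lines that -p marks with a trailing /. The name ls_as_find is hypothetical.

```shell
# Hypothetical wrapper around the ls | awk pipeline discussed above.
# Pass the directory without a trailing slash to match find's paths.
ls_as_find() {
    ls -fRpA "$1" | awk '
        # A "dir:" header becomes the prefix "dir/" for the lines below it.
        /:$/ { sub(/:$/, "/"); prefix = $0; next }
        # Skip blank separator lines and directories (trailing / from -p).
        length() && !/\/$/ { print prefix $0 }
    '
}

# Compare against find on a scratch tree; no diff output means they agree:
#   ls_as_find dir1 | sort > /tmp/ls.txt
#   find dir1 -type f | sort > /tmp/find.txt
#   diff /tmp/ls.txt /tmp/find.txt
```

Note that find -type f prints only regular files, while this filter prints everything that is not a directory, so trees containing symlinks or other special files will differ.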

I strongly advise against using the script above. It is fragile and unreadable, and likely just premature optimization: you almost certainly want to do something with the printed list, and that something will probably take more time than generating the list. Also, with different implementations running on different systems, find may be faster than ls; you never know.

Also, don’t parse the output of ls or find; use find -exec to do the actual task instead. If you really must parse a list, find -print0 is the safe option (if it is not available on your system, find -exec printf '%s\0' {} + can replace it).
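
As a rough illustration of both approaches, the sketch below uses a stand-in task (printing each file's size in bytes with wc -c), since the real task is not known. The function names are made up, and xargs -0 is assumed to be available (it is in GNU and BSD xargs).

```shell
# Variant 1: let find run the task itself. File names are passed as
# arguments, so names with spaces or line breaks are handled correctly.
process_with_exec() {
    find "$1" -type f -exec sh -c '
        for f do
            wc -c < "$f"    # stand-in for the real per-file task
        done
    ' sh {} +
}

# Variant 2: if a list is really needed, separate names with NUL bytes
# and let xargs -0 split them again.
process_with_print0() {
    find "$1" -type f -print0 | xargs -0 sh -c '
        for f do
            wc -c < "$f"
        done
    ' sh
}
```

Both variants batch many file names into each sh invocation, so the overhead of starting processes stays small even for millions of files.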

Depending on the task, locate might be a fast alternative to find. If not, try parallelizing find using something like printf '%s\0' ./* | xargs -0 -I_ -P0 find _ -type f, or a tool like fd that has built-in parallelization.
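
The parallel variant could be wrapped like this. It is only a sketch: parallel_find is a made-up name, the worker count of 4 is an arbitrary choice, and -0 and -P are extensions found in GNU and BSD xargs rather than guaranteed everywhere.

```shell
# Hypothetical helper: start one find per top-level entry of the given
# directory, running up to 4 of them at the same time.
# Limitations: ./* skips hidden top-level entries, and the directory
# must not be empty (otherwise the literal ./* is passed to find).
parallel_find() {
    (
        cd "$1" || exit 1
        printf '%s\0' ./* | xargs -0 -P 4 -I{} find {} -type f
    )
}
```

Paths are printed relative to the directory (./subdir1/file1), and their order is not deterministic because the workers run concurrently. How much this helps depends on how evenly the files are spread over the top-level subdirectories.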

User contributions licensed under: CC BY-SA