
Comm command – unnecessary restriction?

This is a question about the comm command on the Linux command line.

Why does it work only on sorted files? Why can’t it sort the files for us and then do its thing? To illustrate: if we have file1 and file2 and compare them using comm, we get an unexpected result if either file is NOT sorted. Why did the creators require the arguments to be SORTED files?

This seems inconvenient, and I feel like creating a shell script to replace this default comm.

My question is: why shouldn’t I replace it? Is this restriction on comm a good thing, in the way that const in a C++ program is not strictly necessary but is considered good practice?

Thanks.


Answer

This design keeps the comm program simple: all it has to do is compare the files. If it had to sort the files as well, it would need much of the complexity of the sort command, including temporary files for inputs that don’t fit into memory. That work would be wasted whenever the files were already sorted. The basic Unix philosophy is that each command should do one thing, or perhaps a few variations of that thing selected by options, and that you combine commands for more complex needs.

In a shell that supports process substitution (bash, zsh, ksh), you can sort the files on the fly before comparing them:

comm <(sort file1) <(sort file2)
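
If you still want the wrapper the question mentions, here is a minimal sketch of one. The name sorted_comm is made up for illustration; it is not a standard utility. It uses temporary files instead of process substitution so that it also runs in a plain POSIX sh, and it assumes mktemp is available. LC_ALL=C is set on both sort and comm so that the two agree on the ordering.

```shell
# sorted_comm FILE1 FILE2 [COMM_OPTIONS...]
# Hypothetical helper: sorts both inputs, then passes them to comm.
# The body runs in a subshell, so its variables do not leak to the caller.
sorted_comm() (
    if [ "$#" -lt 2 ]; then
        echo "usage: sorted_comm FILE1 FILE2 [COMM_OPTIONS...]" >&2
        exit 2
    fi
    file1=$1
    file2=$2
    shift 2    # whatever remains is handed to comm as options

    # Scratch directory for the sorted copies (assumes mktemp exists).
    tmpdir=$(mktemp -d) || exit 1

    # Sort and compare under the same collation order.
    LC_ALL=C sort -- "$file1" > "$tmpdir/1" &&
    LC_ALL=C sort -- "$file2" > "$tmpdir/2" &&
    LC_ALL=C comm "$@" "$tmpdir/1" "$tmpdir/2"
    status=$?

    rm -rf "$tmpdir"
    exit "$status"
)
```

For example, sorted_comm file1 file2 -12 prints only the lines common to both files. Note that the wrapper re-sorts its inputs on every call, even when they are already sorted, which is exactly the cost that comm itself avoids.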
User contributions licensed under: CC BY-SA