
How to sort files by word count in Linux?

I have to sort the files in the current directory by word count, and I am required to use the wc command and a pipe (|). What command should I use?

I tried sort | wc -w, but it did not work.


Answer

I think this can help.

ls -1 | xargs wc -w | sort

ls -1 lists the files in the current directory, one per line. xargs reads that output and passes the names as arguments to wc -w, which prints the word count of each file (plus a total line when there is more than one file). Finally, we pipe the result to sort to order the lines by the number of words each file contains. See man xargs to learn more about xargs.

The output:

[amirreza@localhost test]$ ls -1
four_words
three_words
two_words
[amirreza@localhost test]$ ls -1 | xargs wc -w
 4 four_words
 3 three_words
 2 two_words
 9 total
[amirreza@localhost test]$ ls -1 | xargs wc -w | sort
 2 two_words
 3 three_words
 4 four_words
 9 total
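One caveat: parsing ls output breaks on file names that contain spaces, because xargs splits its input on blanks. A minimal sketch, assuming only regular files in the directory and that mktemp -d is available (it is on common Linux systems), lets the shell glob pass the names to wc directly. The scratch directory and the file names are hypothetical, made up for the demo. It uses sort -n for the reason given in the edit below.

```shell
#!/bin/sh
# Scratch directory with hypothetical sample files (names made up for this demo)
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'one two three four\n' > four_words
printf 'one two three\n' > three_words
printf 'one two\n' > 'two words'      # a name with a space would break ls | xargs

# The shell expands * to the file names itself, so spaces survive;
# -- stops wc from treating a name that starts with "-" as an option.
result=$(wc -w -- * | sort -n)
printf '%s\n' "$result"

# Clean up the scratch directory
cd / && rm -rf -- "$dir"
```

With ls -1 | xargs wc -w, the file two words would be split into two arguments, and wc would report that two and words do not exist.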

Edit

I just realized that my answer was not correct. By default, sort compares lines as text, character by character, so sorting 2, 10 and 3 gives:

10, 2, 3

This happens because the first character of 10 is 1, which sorts before 2 and 3. To fix it, we should sort numerically with the -n flag. Here is how it works:

[amirreza@localhost test]$ ls -1 | xargs wc -w | sort
10 ten_words
19 total
 2 two_words
 3 three_words
 4 four_words
[amirreza@localhost test]$ ls -1 | xargs wc -w | sort -n
 2 two_words
 3 three_words
 4 four_words
10 ten_words
19 total
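The 2, 10, 3 example can be reproduced without creating any files. A minimal sketch that feeds the numbers to sort directly with printf and compares the default order with sort -n:

```shell
#!/bin/sh
# Plain sort compares the lines as text: "10" < "2" because '1' < '2'
text_order=$(printf '2\n10\n3\n' | sort | tr '\n' ' ')
# sort -n compares the leading numbers by value
numeric_order=$(printf '2\n10\n3\n' | sort -n | tr '\n' ' ')
printf 'text:    %s\nnumeric: %s\n' "$text_order" "$numeric_order"
```

This prints 10 2 3 for the plain sort and 2 3 10 for sort -n.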

To make the output cleaner, we can drop the total line and show only the file names:

[amirreza@localhost test]$ ls -1 | xargs wc -w | sort -n | awk '{print $2}' | head -n -1
zero_word
two_words
three_words
four_words
ten_words
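A few caveats with the last pipeline: head -n -1 (print all but the last line) is a GNU coreutils extension; when the directory holds only one file, wc prints no total line, so head -n -1 drops that file instead; and awk '{print $2}' cuts off names that contain spaces. A minimal sketch that avoids all three, still using wc and a pipe, runs wc -w on each file through standard input, which makes wc print only the number. The scratch directory and file names are hypothetical, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Scratch directory with hypothetical sample files (names made up for this demo)
dir=$(mktemp -d)
cd "$dir" || exit 1
: > zero_word                           # empty file, 0 words
printf 'one two\n' > 'two words'
printf 'a b c d e f g h i j\n' > ten_words

# For each regular file print "<count> <name>", sort by count, keep the names.
# wc reads the file on stdin, so it prints no name and no total line.
names=$(
    for f in *; do
        [ -f "$f" ] || continue         # skip directories and other non-files
        n=$(wc -w < "$f")
        printf '%s %s\n' $n "$f"        # $n unquoted on purpose: drops wc's padding
    done | sort -n | cut -d ' ' -f 2-
)
printf '%s\n' "$names"

# Clean up the scratch directory
cd / && rm -rf -- "$dir"
```

If you would rather keep the original pipeline, sed '$d' is a portable replacement for head -n -1, although it has the same single-file problem.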