
Merge multiple files preserving the original sequence in unix

I have multiple (more than 100) text files in the directory such as

files_1_100.txt
files_101_200.txt

Each file contains variable names. For example, files_1_100.txt contains some variable names between 1 and 100:

"var.1"
"var.5"
"var.15"

Similarly, files_201_300.txt contains some variables between 201 and 300:

"var.203"
"var.227"
"var.285"

and files_1001_1100.txt as

"var.1010"
"var.1006"
"var.1025"

I can merge them using the command

cat files_*00.txt > ../all_files.txt

However, the contents of all_files.txt do not follow the numeric order of the source files. For example, all_files.txt shows

"var.1010"
"var.1006"
"var.1025"
"var.1"
"var.5"
"var.15"
"var.203"
"var.227"
"var.285"

So how can I ensure that the contents of files_1_100.txt come first, followed by files_201_300.txt and then files_1001_1100.txt, so that all_files.txt contains

"var.1"
"var.5"
"var.15"
"var.203"
"var.227"
"var.285"
"var.1010"
"var.1006"
"var.1025"


Answer

Let me try it out, but I think that this will work:

ls file*.txt | sort -n -t _ -k2 -k3 | xargs cat

The idea is to list your files, sort the names numerically, and then pass the sorted names to the cat command.

The sort uses several options:

  • -n – use a numeric sort rather than alphabetic
  • -t _ – divide the input (the filename) into fields using the underscore character
  • -k2 -k3 – sort first by the 2nd field and then by the 3rd field (the 2 numbers)

You have said that your files are named files_1_100.txt, files_101_200.txt, etc. If that means (as it seems to indicate) that the first numeric “chunk” is always unique then you can leave off the -k3 flag. That flag is needed only if you will end up, for instance, with files_100_2.txt and files_100_10.txt, where you have to look at the 2nd numeric “chunk” to determine the preferred order.
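To see what the sort does on its own, here is a small sketch that feeds a few made-up file names straight into sort, so no real files are needed. The names files_100_2.txt and files_100_10.txt are hypothetical and are included only to show the -k3 tie-break; the variable name sorted is also just for illustration.

```shell
# Hypothetical names piped straight into sort, so no real files are needed.
# The second key (-k3) only matters for the two names that share "100".
sorted=$(printf '%s\n' \
    files_1001_1100.txt files_100_10.txt files_1_100.txt files_100_2.txt |
  sort -n -t _ -k2 -k3)
printf '%s\n' "$sorted"
```

This should list files_1_100.txt first, then files_100_2.txt, files_100_10.txt, and finally files_1001_1100.txt.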

Depending on the number of files you are working with, you may find that expanding the glob (file*.txt) on the ls command line exceeds the system's argument-length limit and fails with an "Argument list too long" error. If that’s the case you could do it like this:

ls | grep '^file.*\.txt$' | sort -n -t _ -k2 -k3 | xargs cat
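
As an end-to-end check, the sketch below builds three sample files in a scratch directory and merges them in numeric order. It is only an illustration under a few assumptions: the shell is bash (where printf is a builtin, so the expanded glob is not handed to an external command), the sample contents are modeled on the expected output in the question, and the names workdir and merged.out are made up for this example.

```shell
# Scratch directory with three sample files (contents modeled on the question).
workdir=$(mktemp -d)
printf '%s\n' '"var.1"' '"var.5"' '"var.15"' > "$workdir/files_1_100.txt"
printf '%s\n' '"var.203"' '"var.227"' '"var.285"' > "$workdir/files_201_300.txt"
printf '%s\n' '"var.1010"' '"var.1006"' '"var.1025"' > "$workdir/files_1001_1100.txt"

# Run inside the directory so the names contain no path components with
# stray underscores; printf lists the glob matches one per line for sort.
(
  cd "$workdir" || exit 1
  printf '%s\n' files_*.txt | sort -n -t _ -k2 -k3 | xargs cat
) > "$workdir/merged.out"

cat "$workdir/merged.out"
```

Note that xargs splits its input on whitespace, so this form assumes file names without spaces or newlines, which holds for the names in the question.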
User contributions licensed under: CC BY-SA