extracting unique values between 2 sets/files

Working in a Linux shell environment, how can I accomplish the following?

text file 1 contains:

1
2
3
4
5

text file 2 contains:

6
7
1
2
3
4

I need to extract the entries in file 2 which are not in file 1. So ‘6’ and ‘7’ in this example.

How do I do this from the command line?

Many thanks!

Answer

$ awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1 file2
6
7

Explanation of how the code works:

  • If we’re working on file1, track each line of text we see.
  • If we’re working on file2, and have not seen the line text, then print it.

Explanation of details:

  • FNR is the current file’s record number
  • NR is the current overall record number from all input files
  • FNR==NR is true only while we are reading file1, the first file named on the command line (this assumes file1 is not empty)
  • $0 is the current line of text
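  • a is an associative array (hash); a[$0] is the entry whose key is the current line of text
  • a[$0]++ tracks that we’ve seen the current line of text
  • next skips the remaining rule and moves on to the next line, so lines of file1 are never printed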
  • !($0 in a) is true only when we have not seen the line text
  • When the above pattern is true, the line of text is printed; printing is the default awk behavior when no explicit action is given
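
For anyone who wants to try this end to end, here is a minimal sketch that recreates the two sample files from the question and runs the same logic with the rules spelled out on separate lines. The names workdir, result and seen are my own choices for illustration, and seen[$0] = 1 stands in for a[$0]++ (either one just records that the line occurred in file1).

```shell
# Recreate the two sample files from the question in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 1 2 3 4 5   > "$workdir/file1"
printf '%s\n' 6 7 1 2 3 4 > "$workdir/file2"

# Same logic as the one-liner, with one rule per line.
result=$(awk '
    # First file only: FNR equals NR, so remember the line and move on.
    FNR == NR { seen[$0] = 1; next }
    # Second file: print lines that were never stored in seen.
    !($0 in seen)
' "$workdir/file1" "$workdir/file2")

# Show the lines that appear in file2 but not in file1.
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -r "$workdir"
```

With the sample data this should print 6 and 7, each on its own line. The comparison is on whole lines, so trailing spaces or different line endings in one of the files would make otherwise equal entries look different.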
User contributions licensed under: CC BY-SA