Working in a Linux shell environment, how can I accomplish the following?
text file 1 contains:
1
2
3
4
5
text file 2 contains:
6
7
1
2
3
4
I need to extract the entries in file 2 that are not in file 1, i.e. ‘6’ and ‘7’ in this example.
How do I do this from the command line?
Many thanks!
Answer
$ awk 'FNR==NR {a[$0]++; next} !($0 in a)' file1 file2
6
7
Explanation of how the code works:
- If we’re working on file1, record each line of text we see.
- If we’re working on file2 and the current line was not seen in file1, print it.
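The same one-liner can be spread out with comments so each step above is visible. This is a sketch that assumes each entry sits on its own line; the sample files and the function name `not_in_first` are hypothetical, used only for illustration, and the program is the same awk logic with the array renamed from `a` to `seen` and the default print made explicit.

```shell
#!/bin/sh
# Sample data, one entry per line (assumed layout of file1 and file2).
printf '%s\n' 1 2 3 4 5   > file1
printf '%s\n' 6 7 1 2 3 4 > file2

# not_in_first is a hypothetical helper name used only for this sketch.
# It prints the lines of the second file that do not appear in the first.
not_in_first() {
    awk '
        # FNR==NR holds only while reading the first file argument.
        FNR == NR {
            seen[$0]++   # remember this line of the first file
            next         # stop here; never print lines of the first file
        }
        # Only lines of the second file reach this rule.
        !($0 in seen) { print }
    ' "$1" "$2"
}

not_in_first file1 file2
```

Swapping the arguments answers the reverse question: `not_in_first file2 file1` prints the entries of file1 that are missing from file2 (here, `5`).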
Explanation of details:
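- FNR is the current file’s record (line) number.
- NR is the overall record number across all input files.
- FNR==NR is true only while we are reading file1, because NR pulls ahead of FNR only once awk starts reading file2 (assuming file1 is not empty).
- $0 is the current line of text.
- a[$0] is an element of the associative array (hash) a, keyed by the current line of text.
- a[$0]++ records that we’ve seen the current line of text.
- next skips the rest of the program for the current line, so lines of file1 are never printed.
- !($0 in a) is true only when the current line was not seen in file1.
- When that pattern is true, the line is printed: printing the current line is awk’s default behavior when a pattern has no explicit action.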
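To see why FNR==NR singles out the first file, it can help to print both counters for every line. This is a small sketch with made-up sample files; it only prints the counters and does not do the filtering itself.

```shell
#!/bin/sh
# Two small sample files (hypothetical contents for illustration).
printf '%s\n' 1 2 3 > file1
printf '%s\n' 6 7   > file2

# For every input line, print the file name, FNR, NR, and whether FNR==NR.
awk '{ print FILENAME, FNR, NR, (FNR == NR ? "yes" : "no") }' file1 file2
# Expected output:
# file1 1 1 yes
# file1 2 2 yes
# file1 3 3 yes
# file2 1 4 no
# file2 2 5 no
```

This also shows the idiom’s one blind spot: if file1 is empty, NR never gets ahead of FNR, so FNR==NR stays true for every line of file2 and the one-liner prints nothing.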