I have two files, file1 and file2. I want to compare columns $1, $2, $3 and $4 of file1 with columns $1, $2, $3 and $4 of file2, and print those rows of file2 that do not match any row in file1.
E.g.
file1
aaa bbb ccc 1 2 3
aaa ccc eee 4 5 6
fff sss sss 7 8 9
file2
aaa bbb ccc 1 f a
mmm nnn ooo 1 d e
aaa ccc eee 4 a b
ppp qqq rrr 4 e a
sss ttt uuu 7 m n
fff sss sss 7 5 6
I want to have as output:
mmm nnn ooo 1 d e
ppp qqq rrr 4 e a
sss ttt uuu 7 m n
I have seen questions here about finding the rows that do match and printing them, but not the reverse: those that DO NOT match.
Thank you!
Answer
Use the following script:
awk '{k=$1 FS $2 FS $3 FS $4} NR==FNR{a[k]; next} !(k in a)' file1 file2
k is the concatenation of columns 1, 2, 3 and 4, delimited by FS (see comments), and is used later as a key into the lookup array a. NR==FNR is true only while reading file1, so during that pass I create the array a indexed by k.
For the remaining lines of input, I check with !(k in a) whether the index does not exist in a. If that evaluates to true, awk prints the line.
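To try this against the sample data from the question, something like the following sketch should work. The scratch directory and the variable names tmpdir and result are just for illustration, and it assumes mktemp -d is available on your system. The awk program is the same one-liner, only split across lines with comments.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory.
tmpdir=$(mktemp -d)

cat > "$tmpdir/file1" <<'EOF'
aaa bbb ccc 1 2 3
aaa ccc eee 4 5 6
fff sss sss 7 8 9
EOF

cat > "$tmpdir/file2" <<'EOF'
aaa bbb ccc 1 f a
mmm nnn ooo 1 d e
aaa ccc eee 4 a b
ppp qqq rrr 4 e a
sss ttt uuu 7 m n
fff sss sss 7 5 6
EOF

# Same logic as the one-liner, one rule per line.
result=$(awk '
    { k = $1 FS $2 FS $3 FS $4 }   # key built from columns 1-4
    NR == FNR { a[k]; next }        # first file: record the key, print nothing
    !(k in a)                       # second file: print rows whose key is unseen
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With the sample files this prints the three rows asked for in the question. One caveat: if file1 is empty, NR==FNR stays true while reading file2, so nothing is printed.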