Compare columns from different files and print those that DO NOT match

I have two files, file1 and file2. I want to compare columns $1, $2, $3 and $4 of file1 with columns $1, $2, $3 and $4 of file2, and print the rows of file2 that do not match any row in file1.

For example:

file1

aaa bbb ccc 1 2 3
aaa ccc eee 4 5 6
fff sss sss 7 8 9

file2

aaa bbb ccc 1 f a
mmm nnn ooo 1 d e
aaa ccc eee 4 a b
ppp qqq rrr 4 e a
sss ttt uuu 7 m n
fff sss sss 7 5 6

I want to have as output:

mmm nnn ooo 1 d e
ppp qqq rrr 4 e a
sss ttt uuu 7 m n

I have seen questions here about finding the rows that do match and printing them, but not the reverse: printing the rows that DO NOT match.

Thank you!

Answer

Use the following script:

awk '{k=$1 FS $2 FS $3 FS $4} NR==FNR{a[k]; next} !(k in a)' file1 file2

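k is the concatenation of columns 1, 2, 3 and 4, separated by FS, and is used as the key for the lookup array a. The separator matters: without it, different column values could collapse into the same key (for example, "ab" followed by "c" and "a" followed by "bc" would both become "abc"). NR==FNR is true only while awk reads the first file, file1, because NR counts lines across all input files while FNR restarts at 1 for each file. While reading file1, I create an element of a indexed by k and skip to the next line with next.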
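
To see the NR==FNR test and the key k in action, the sketch below prints both for every input line. It assumes a POSIX shell with awk and mktemp available; the temporary directory and the two-line sample files are demo setup added here, not part of the original answer.

```shell
#!/bin/sh
# Demo setup (an assumption, not from the answer): write the first two
# lines of each sample file into a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'aaa bbb ccc 1 2 3\naaa ccc eee 4 5 6\n' > "$tmp/file1"
printf 'aaa bbb ccc 1 f a\nmmm nnn ooo 1 d e\n' > "$tmp/file2"

# For every input line, print the file it came from, NR, FNR,
# whether NR==FNR holds (1 = true, 0 = false) and the key k.
result=$(cd "$tmp" && awk '{
    k = $1 FS $2 FS $3 FS $4
    print FILENAME " NR=" NR " FNR=" FNR " NR==FNR:" (NR == FNR) " k=[" k "]"
}' file1 file2)

printf '%s\n' "$result"
```

With the default FS (a single space), this prints:

file1 NR=1 FNR=1 NR==FNR:1 k=[aaa bbb ccc 1]
file1 NR=2 FNR=2 NR==FNR:1 k=[aaa ccc eee 4]
file2 NR=3 FNR=1 NR==FNR:0 k=[aaa bbb ccc 1]
file2 NR=4 FNR=2 NR==FNR:0 k=[mmm nnn ooo 1]

The key aaa bbb ccc 1 appears in both files, but only the file1 occurrence has NR==FNR true, so that is the one stored in a.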

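For the remaining lines of input, i.e. the lines of file2, !(k in a) checks whether k does not exist as an index in a. When that evaluates to true, awk prints the line, because a pattern without an action defaults to printing the current record.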
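
Here is the same one-liner spread over several lines with comments, run against the full sample data from the question. It is a minimal sketch assuming a POSIX shell with awk and mktemp; recreating the files in a temporary directory is demo setup added here, and the awk program itself is unchanged apart from whitespace and comments.

```shell
#!/bin/sh
# Demo setup (an assumption, not from the answer): recreate the
# sample files from the question in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/file1" <<'EOF'
aaa bbb ccc 1 2 3
aaa ccc eee 4 5 6
fff sss sss 7 8 9
EOF

cat > "$tmp/file2" <<'EOF'
aaa bbb ccc 1 f a
mmm nnn ooo 1 d e
aaa ccc eee 4 a b
ppp qqq rrr 4 e a
sss ttt uuu 7 m n
fff sss sss 7 5 6
EOF

result=$(awk '
    # Build the lookup key from columns 1-4 for every line of both files.
    { k = $1 FS $2 FS $3 FS $4 }
    # First file only: referencing a[k] creates the element; next skips the rest.
    NR == FNR { a[k]; next }
    # Second file: a pattern with no action prints the line when it is true.
    !(k in a)
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
```

This prints exactly the three expected rows:

mmm nnn ooo 1 d e
ppp qqq rrr 4 e a
sss ttt uuu 7 m n

One known caveat of the NR==FNR idiom: if file1 is empty, NR==FNR is also true for every line of file2, so those lines are stored in a instead of being tested, and nothing is printed.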

User contributions licensed under: CC BY-SA