AWK: Comparing substrings from two files and writing the result to a third file

I’m trying to compare two files, say “file1” and “file2”, as follows. If the 5-character substring at positions 8 to 12 of a row in file1 matches the same substring of any row in file2, that row should be removed from file1. The remaining rows of file1, those with no match in file2, should be written to file3. In other words: Output (file3) = File1 – File2. The ** markers in the samples below highlight the compared substring and are not part of the data.

File1
-----
aqcdfdf**45555**78782121
axcdfdf**45555**75782321
aecdfdf**75555**78782221
aqcdfdf**95555**78782121

File2
-----
aqcdfdf**45555**78782121
axcdfdf**25555**75782321

File3
-----
aecdfdf**75555**78782221
aqcdfdf**95555**78782121

I tried awk, but I need something that compares a substring of each line, since my files have no delimiters:
$ awk 'FNR==NR {a[$1]; next} $1 in a' f1 f2 > file3

Answer

Could you please try the following, written and tested with the shown samples in GNU awk. Once you are happy with the results on the terminal, append > file3 to the command to redirect its output to file3.

awk '{str=substr($0,8,5)} FNR==NR{a[str];next} !(str in a)' file2 file1
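
To try the command end to end, the sample data can be recreated first. This is a minimal sketch that assumes the ** markers in the question are only highlighting and are not in the real files; it writes file1, file2 and file3 in the current directory, so it is best run in an empty scratch directory.

```shell
# Recreate the sample files from the question (without the ** highlighting).
printf '%s\n' \
  'aqcdfdf4555578782121' \
  'axcdfdf4555575782321' \
  'aecdfdf7555578782221' \
  'aqcdfdf9555578782121' > file1
printf '%s\n' \
  'aqcdfdf4555578782121' \
  'axcdfdf2555575782321' > file2

# Same command as above, with the output redirected to file3.
awk '{str=substr($0,8,5)} FNR==NR{a[str];next} !(str in a)' file2 file1 > file3

# Show the rows of file1 whose characters 8 to 12 do not occur in file2.
cat file3
```

With these inputs, file3 should contain the two rows shown under File3 in the question.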

Explanation: the same command with a detailed comment on each line.

awk '                  ##Start of the awk program.
{
  str=substr($0,8,5)   ##Set str to the 5 characters of the current line starting at position 8 (characters 8 to 12).
}
FNR==NR{               ##FNR==NR is true while the first file named (file2) is being read.
  a[str]               ##Create an entry in array a with index str.
  next                 ##Skip the remaining rules and move on to the next line.
}
!(str in a)            ##For lines of file1: if str is NOT an index in a, print the line.
' file2 file1          ##Input files: file2 is read first, then file1.
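
One edge case may be worth checking: if file2 is empty, FNR==NR is also true for every line of file1, so nothing is printed. The sketch below is a possible variant, not part of the original answer: it tests FILENAME==ARGV[1] instead, and passes the substring position and length in through -v. The names start, len, key and seen are chosen here for illustration. It writes file1, file2 and file3 in the current directory.

```shell
# Two sample rows and an empty exclusion file.
printf '%s\n' 'aqcdfdf4555578782121' 'aecdfdf7555578782221' > file1
: > file2

# start and len are illustrative variable names, not from the original answer.
awk -v start=8 -v len=5 '
{ key = substr($0, start, len) }          # substring to compare
FILENAME == ARGV[1] { seen[key]; next }   # first file operand (file2): collect keys
!(key in seen)                            # second file (file1): print unmatched lines
' file2 file1 > file3

# With an empty file2, every row of file1 is kept.
cat file3
```

The trade-off is that FILENAME==ARGV[1] assumes the first operand is a plain file name, whereas FNR==NR works for any first input as long as it is not empty.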
User contributions licensed under: CC BY-SA