I’m trying to compare two files, say “file1” and “file2”, in the following way. If the 5-character substring at positions 8 to 12 of a row in file1 also appears at the same positions in any row of file2, that row should be removed from file1. The remaining rows of file1, i.e. the ones that do not match file2, should be written to file3. In other words: Output (file3) = File1 – File2
File1
-----
aqcdfdf**45555**78782121
axcdfdf**45555**75782321
aecdfdf**75555**78782221
aqcdfdf**95555**78782121

File2
-----
aqcdfdf**45555**78782121
axcdfdf**25555**75782321

File3
-----
aecdfdf**75555**78782221
aqcdfdf**95555**78782121
I tried awk, but I need something that compares a substring of each line, since there are no delimiters in my files:

$ awk 'FNR==NR {a[$1]; next} $1 in a' f1 f2 > file3
Answer
Could you please try the following, written and tested with the shown samples in GNU awk. Once you are happy with the results on the terminal, redirect the output to file3 by appending > file3 to the command.
awk '{str=substr($0,8,5)} FNR==NR{a[str];next} !(str in a)' file2 file1
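To check the command against the question's samples, a minimal sketch like the one below can be used. It assumes the bold markers in the samples are formatting only and are not part of the data, and it creates file1, file2 and file3 in the current directory, so run it somewhere those names are free.

```shell
# Recreate the sample rows from the question (raw rows, no bold markers).
printf '%s\n' \
  'aqcdfdf4555578782121' \
  'axcdfdf4555575782321' \
  'aecdfdf7555578782221' \
  'aqcdfdf9555578782121' > file1

printf '%s\n' \
  'aqcdfdf4555578782121' \
  'axcdfdf2555575782321' > file2

# file2 is read first so its keys (characters 8-12) are collected;
# file1 is read second and only rows with an unseen key are printed.
awk '{str=substr($0,8,5)} FNR==NR{a[str];next} !(str in a)' file2 file1 > file3

cat file3
```

With these samples, file3 should contain only the rows whose key is 75555 or 95555, which matches the File3 listing in the question.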
Explanation: a detailed, commented version of the command above.

awk '                      ##Start the awk program.
{
  str=substr($0,8,5)       ##Store characters 8 through 12 of the current line in str.
}
FNR==NR{                   ##True only while the first file (file2) is being read.
  a[str]                   ##Create an entry in array a indexed by str.
  next                     ##Skip the remaining statements for this line.
}
!(str in a)                ##For file1: print the line only if str is NOT an index of a.
' file2 file1              ##Input files: file2 first, then file1.
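If the key position may change, the same logic can be sketched with the offset and width passed in as awk variables. The names start, len, key and seen are illustrative choices, not part of the original answer, and the sample files are recreated here only so the snippet runs on its own.

```shell
# Sample data from the question (raw rows, no bold markers).
printf '%s\n' 'aqcdfdf4555578782121' 'axcdfdf4555575782321' \
              'aecdfdf7555578782221' 'aqcdfdf9555578782121' > file1
printf '%s\n' 'aqcdfdf4555578782121' 'axcdfdf2555575782321' > file2

# start and len are hypothetical variable names set with -v;
# 8 and 5 reproduce the "characters 8 to 12" rule from the question.
awk -v start=8 -v len=5 '
  { key = substr($0, start, len) }   # key of the current line
  FNR == NR { seen[key]; next }      # first file (file2): remember its keys
  !(key in seen)                     # second file (file1): print unseen keys
' file2 file1 > file3

cat file3
```

One caveat with the FNR==NR idiom: if file2 is empty, the condition is also true while file1 is being read, so nothing is printed. It is worth checking that file2 has content before relying on the result.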