awk: merge two files with 2 columns based on string character comparison

Question

I am a beginner, and my work is starting to become difficult for me. Here is my problem. I have two tables, File1 and File2 (the reference table). They have no identical column, but there is some similarity between $2 of File1 and $2 of File2. File1 comes from users and we want to standardize everything, so I have a lot o…

Accepted Answer

Here is one shot at it in awk:

    $ awk 'BEGIN { FS=", *"; OFS="," }
    NR==FNR {                     # first file (file2): index rows by lowercased name
        a[tolower($2)]=$0
        next
    }
    {
        for(i in a)               # for every name (key) taken from file2
            if(tolower($2)~i) {   # does the file1 name contain it? (i is used as a regex)
                print $0,a[i]     # print the file1 record joined with the file2 row
                next
            }
    }
    1                             # no match: print the file1 record unchanged
    ' file2 file1
    num, Name
    1, 1_1_busteni,R_001,  BUSTENI
    13, 23_Doicesti
    40, 2_AR_Moreni,R_003,  MORENI
    47, 2_AR_Moreni_SUD,R_003,  MORENI
    55, Petrolul_Romanesc,R_005,  ROMANESC
    62, castor,R_006,  CASTOR

Anything better than that would require rules for handling the underscores and dashes in the names, or approximate string matching with an appropriate algorithm (see, for example, Levenshtein distance).
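The answer only names Levenshtein distance, so here is a minimal sketch of it in POSIX awk, wrapped in a shell function. The function name `levenshtein` and the case folding are assumptions for illustration (the folding mirrors the `tolower` calls in the answer). It computes the classic edit distance (insertions, deletions, substitutions) with two rows of the dynamic-programming table.

```shell
# levenshtein A B: print the edit distance between A and B (case-insensitive).
# Hypothetical helper for illustration; note that awk -v expands backslash
# escapes, so names containing backslashes would be altered.
levenshtein() {
    awk -v s="$1" -v t="$2" '
    function min3(x, y, z) {
        if (y < x) x = y
        if (z < x) x = z
        return x
    }
    BEGIN {
        s = tolower(s); t = tolower(t)   # fold case, like the tolower() calls in the answer
        m = length(s); n = length(t)
        # prev[j] = distance between the first i-1 chars of s and the first j chars of t
        for (j = 0; j <= n; j++) prev[j] = j
        for (i = 1; i <= m; i++) {
            cur[0] = i                     # turn i chars of s into "" by deleting them
            for (j = 1; j <= n; j++) {
                cost = (substr(s, i, 1) == substr(t, j, 1)) ? 0 : 1
                # cheapest of deletion, insertion, substitution (or free match)
                cur[j] = min3(prev[j] + 1, cur[j-1] + 1, prev[j-1] + cost)
            }
            for (j = 0; j <= n; j++) prev[j] = cur[j]
        }
        print prev[n]
    }'
}

# Example: only the "1_1_" prefix differs, and case is ignored.
levenshtein 1_1_busteni BUSTENI    # prints 4
```

To map a File1 name, one would compute this distance against every File2 name and keep the smallest, ideally after stripping prefixes such as `1_1_` or `2_AR_`, and with some cut-off so that a name without a counterpart in File2 is not forced onto the nearest one. Calling awk once per pair is slow for large files; putting the function into a single awk program that reads both files avoids that.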


Answer