awk: merge two files with 2 columns based on string comparison

I am a beginner and my work is becoming difficult for me. Let me explain my problem. I have two tables, File1 and File2 (the reference table).

JavaScript

I don’t have any identical column, but there is some similarity between $2 of File1 and $2 of File2. File1 comes from users and we want to standardize everything, so I have a lot of different cases. I don’t know how to start. My idea was to remove all the “_” in my first file and all the “-” in my second, and then compare them. I managed to do it with

JavaScript

separately, but I don’t know how to combine and compare my two files.

I also know that I have to handle lowercase. Someone kindly gave me this code. It works for CASTOR, but how can I combine it with my gsub?

JavaScript

Maybe there is a better way; I am open to suggestions.

Answer

Here is one shot at it in awk:

JavaScript

Anything better than that would require rules for processing the underscores and dashes in the names, or approximate pattern matching with appropriate algorithms (see, for example, Levenshtein distance).
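For the simpler rule-based route, the idea from the question (strip "_" and "-", lowercase, then compare column 2) could look roughly like the sketch below. It is not the code originally posted in this answer, and it rests on assumptions: both files are whitespace-separated with the name in $2, File2 is the reference, and the sample file names, sample rows, the `merge_norm` wrapper and the `NO_MATCH` marker are all hypothetical.

```shell
# merge_norm REFERENCE USERFILE  (hypothetical helper name)
# Joins USERFILE to REFERENCE on column 2 after normalizing both sides.
merge_norm() {
  awk '
  # Build a comparison key: drop "_" and "-", then lowercase.
  # s is passed by value, so the original field is left untouched.
  function norm(s) { gsub(/[_-]/, "", s); return tolower(s) }

  # First file (the reference): index each line by its normalized $2.
  NR == FNR { ref[norm($2)] = $0; next }

  # Second file (user data): print the line plus the reference line, if any.
  { k = norm($2); print $0, ((k in ref) ? ref[k] : "NO_MATCH") }
  ' "$1" "$2"
}

# Made-up sample data; the real File1/File2 layouts are not shown in the post.
cat > sample_File1.txt <<'EOF'
1 Castor_Fiber
2 LYNX_lynx
3 Unknown_sp
EOF
cat > sample_File2.txt <<'EOF'
A castor-fiber
B Lynx-lynx
EOF

merge_norm sample_File2.txt sample_File1.txt
```

With the sample data above this prints each File1 line followed by the matching reference line, or `NO_MATCH` for `Unknown_sp`. Putting the gsub and tolower calls in one function means both files go through exactly the same rules, which is what makes the keys comparable.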

User contributions licensed under: CC BY-SA