Compare two text files line by line, finding differences but ignoring differences in numerical values

I’m working on a bash script that compares two similar text files line by line and reports any differences. For each difference, it should show the differing lines and the line number where the difference occurs, but numerical values should be ignored in the comparison.

Example:

Process is running; process found : 12603 process is listening on port 1200
Process is running; process found : 43023 process is listening on port 1200

In the example above, the script shouldn’t find any difference, since only the process ID differs and it changes all the time.

But otherwise I want it to notify me of the differences between the lines.

Example:

Process is running; process found : 12603 process is listening on port 1200
Process is not running; process found : 43023 process is not listening on port 1200

I already have a working script to find the differences, and I’ve used the following function to find them while ignoring the numerical values, but it’s not working perfectly. Any suggestions?

COMPARE_FILES()
{
    awk 'NR==FNR{a[FNR]=$0;next}$0!~a[FNR]{print $0}' $1 $2
}

Where $1 and $2 are the two files to compare.

Answer

Would you please try the following:

COMPARE_FILES() {
    awk '
    NR==FNR {a[FNR]=$0; next}
    {
        b=$0; gsub(/[0-9]+/,"",b)
        c=a[FNR]; gsub(/[0-9]+/,"",c)
        if (b != c) {printf "< %s\n> %s\n", $0, a[FNR]}
    }' "$1" "$2"
}
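
The question also asks for the line number of each difference. A minimal sketch of a variant that prints it might look like the following. The name COMPARE_FILES_NUMBERED and the output layout are assumptions made for illustration, not part of the original answer. This sketch follows the diff convention of printing the first file's line after "<" and the second file's line after ">", whereas the function above prints the second file's line first.

```shell
#!/usr/bin/env bash
# Hypothetical variant of COMPARE_FILES that also reports the line number.
COMPARE_FILES_NUMBERED() {
    awk '
    # First file: remember every line, indexed by its line number.
    NR == FNR { a[FNR] = $0; next }
    # Second file: compare digit-stripped copies of both lines.
    {
        b = $0;     gsub(/[0-9]+/, "", b)
        c = a[FNR]; gsub(/[0-9]+/, "", c)
        if (b != c) printf "line %d:\n< %s\n> %s\n", FNR, a[FNR], $0
    }' "$1" "$2"
}
```

Called as COMPARE_FILES_NUMBERED old.log new.log (both file names are placeholders), it prints nothing when the two files differ only in their digits. Like the function above, it compares lines by position, so it assumes both files have the same number of lines: extra lines in the first file are silently ignored, and extra lines in the second file are compared against an empty string.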
User contributions licensed under: CC BY-SA