Looking to remove a set of characters from a .txt file

I am looking for a way to remove the content from one .txt file based on the other.

For example, I have file.txt with 2000 characters that are random and unsorted. I have another file, importantfile.txt, with 2016 characters: the same characters as file.txt plus 16 other characters placed at random positions.

Is there any way to remove the characters in file.txt from importantfile.txt to find the 16-character string?

One problem I ran into with the diff command is that it prints the whole string, because the string is treated as a single word. diff file.txt importantfile.txt returns w881lYoi8042aKGfwj7EjenViinsmbmnWIHJMZ2T9L40KiLr4x485TM3gKmc1Ig8n6VVW82iqjxypCp19sXIMisX4HIkp54lVohqKSuLjjuns91GiEwtTsvN0zhn6c9GZC2GqUKLsy9v1SvSKvdSPBmIJtNoSwr65BBGqLQ1LdHg93kfZoCq5NPxkaYjIyppzYaczGlwZBrsKyjbTEI5B1aWuw6g9xBZ1viussKRP5C5Pq5yO14P8xBDHGugo93mwf7rsjNehNuxDSAt (shortened for obvious reasons); both strings start with w881l.....

I also tried JavaScript, using importantfile.replace("file",""); but it returns the whole string as well. Anything helps, thanks.

Answer

If I’m understanding correctly, how about:

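awk '
# First file (file.txt): remember its line as str1.
NR==FNR {str1 = $0; next}
# Second file (importantfile.txt): remember its line as str2.
{str2 = $0}
END {
    # i walks str1, j walks str2.
    for (i = j = 1; j <= length(str2); ) {
        if (substr(str1, i, 1) == substr(str2, j, 1)) {
            incr = 1    # characters match: advance in both strings
        } else {
            incr = 0    # extra character: print it and stay put in str1
            printf "%s", substr(str2, j, 1)
        }
        i+=incr; j++
    }
    print ""
}' file.txt importantfile.txt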

Output:

d5
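
For a quick sanity check on small inputs, the same idea can be wrapped in a shell function. This is only a sketch: the function name extract_inserted and the sample strings are made up for illustration (they are not the asker's data), and it assumes each file holds a single line and that the extra characters were inserted without reordering the original ones.

```shell
#!/bin/sh
# Hypothetical helper: print the characters of the line in $2 that are
# left over after matching the line in $1 against it in order.
extract_inserted() {
    awk '
    NR == FNR { base = $0; next }   # first file: the shorter string
    { full = $0 }                   # second file: the string with extras
    END {
        i = 1
        for (j = 1; j <= length(full); j++) {
            c = substr(full, j, 1)
            if (c == substr(base, i, 1)) i++   # match: advance in base
            else printf "%s", c                # no match: an extra character
        }
        print ""
    }' "$1" "$2"
}

# Made-up sample data: "X" and "Y" are the inserted characters.
tmpdir=$(mktemp -d)
printf '%s\n' 'w881lYoi'   > "$tmpdir/file.txt"
printf '%s\n' 'w8X81lYYoi' > "$tmpdir/importantfile.txt"

result=$(extract_inserted "$tmpdir/file.txt" "$tmpdir/importantfile.txt")
echo "$result"   # prints XY
rm -rf "$tmpdir"
```

Note that when an inserted character happens to equal the next expected character of file.txt, the matching is ambiguous, so the script may report the second of the two identical characters as the extra one. The recovered characters are the same either way.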
User contributions licensed under: CC BY-SA