Finding duplicate rows based on a column in a Unix file

Question

I have a file of about 1 million records. I need to extract the records that have different FName and LName values for the same id.

Input file:

The result that I want to see:

Can any awk or sed command or script help? Thanks.

Accepted Answer

Using GNU awk for arrays of arrays:

```
$ awk -F, '
    # key each record by id ($5), then by FName,LName ($6 and $8)
    { vals[$5][$6 FS $8] = $0 }
    END {
        for ( id in vals ) {
            # print only ids that have more than one distinct name
            if ( length(vals[id]) > 1 ) {
                for (name in vals[id]) {
                    print vals[id][name]
                }
            }
        }
    }
' file
AP,abc@gmail.com,xyz1,abc1,123,Ram,,Kumar,phn1,fax1,url1
AP,abc2@gmail.com,xyz2,abc2,123,Shyam,,Kumar,phn2,fax2,url1
```

or, if your input file is sorted by "id" as shown in your sample input, then with any awk and without storing the input file in memory:

```
$ cat tst.awk
BEGIN { FS=OFS="," }
NR > 1 {                        # skip the header line
    id   = $5
    name = $6 FS $8             # FName and LName
    if ( id == prevId ) {
        if ( name != prevName ) {
            # name changed within the same id: print the first
            # record of the group once, then the current record
            if ( firstRec != "" ) {
                print firstRec
                firstRec = ""
            }
            print
        }
    }
    else {
        firstRec = $0           # first record of a new id
    }
    prevId   = id
    prevName = name
}
$ awk -f tst.awk file
AP,abc@gmail.com,xyz1,abc1,123,Ram,,Kumar,phn1,fax1,url1
AP,abc2@gmail.com,xyz2,abc2,123,Shyam,,Kumar,phn2,fax2,url1
```
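If the input is not sorted by id and GNU awk is not available, one option is to read the file twice with any POSIX awk. The sketch below assumes the same layout the answer uses (comma-separated, a header on line 1, id in field 5, FName in field 6, LName in field 8). The function name `find_name_conflicts` and the sample data are made up for illustration. The first pass only counts distinct FName,LName pairs per id, so it keeps ids and name keys in memory rather than whole records. The second pass then prints every record of an id that has more than one distinct name, in the original input order.

```shell
#!/bin/sh
# Sketch: two-pass duplicate-name finder for any POSIX awk, unsorted input.
# Assumed layout: comma-separated, header on line 1,
# id in field 5, FName in field 6, LName in field 8.
# find_name_conflicts is a hypothetical helper name, not a standard tool.

find_name_conflicts() {
    awk -F, '
        # Pass 1 (NR == FNR only while reading the first copy of the file):
        # count the distinct FName,LName pairs seen for each id.
        NR == FNR {
            if (FNR > 1 && !seen[$5 FS $6 FS $8]++) {
                names[$5]++
            }
            next
        }
        # Pass 2: print every data record whose id has 2 or more
        # distinct names, keeping the original input order.
        FNR > 1 && names[$5] > 1
    ' "$1" "$1"
}

# Usage with a small, made-up sample file (header names are placeholders).
sample=$(mktemp)
cat > "$sample" <<'EOF'
h1,h2,h3,h4,id,fname,mname,lname,h9,h10,h11
AP,abc@gmail.com,xyz1,abc1,123,Ram,,Kumar,phn1,fax1,url1
AP,def@gmail.com,xyz3,abc3,456,Sita,,Devi,phn3,fax3,url3
AP,abc2@gmail.com,xyz2,abc2,123,Shyam,,Kumar,phn2,fax2,url1
AP,def2@gmail.com,xyz4,abc4,456,Sita,,Devi,phn4,fax4,url4
EOF
find_name_conflicts "$sample"
rm -f "$sample"
```

On this sample it prints the two id 123 records (Ram Kumar and Shyam Kumar); id 456 is skipped because both of its records carry the same name. Unlike the arrays-of-arrays version, which keeps one record per name, this prints every record of a conflicting id, at the cost of reading the file twice.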


Answer