Using a csv file, i will like to compare multiple columns to check if all values are the same or not.
First row are the headers First column is the label The constant values should be from column 2 to the end ( can be 100 columns ) for the example i put only 8 columns.
The purpose is to check that all values are the same. and when it is not, report
input file
Number,V2 1563,V03-1555,V4 - 294,V-05 1580,V6-1561,V7-1562,V05-1601,V9-1587 Code,4.1.06,4.1.03,4.1.06,4.1.06,4.1.06,4.1.06,4.1.06,4.1.06 Host Id,b90c27,b90c13,3.30E+65,b90c46,b90c21,b90c1f,b88a63,b90c49 SR,SR_2_MS,SR_2_MS,SR_4_MS,SR_2_MS,SR_2_MS,SR_2_MS,SR_2_MS,SR_2_MS
output desired
Bad code in V03-1555 Bad SR in V4 - 294
Appreciate your support
Advertisement
Answer
awk
to the rescue!
I improvised little bit. How do we know which values are correct, which are not? Popular vote, counts the occurrences and assumes majority is right. As a side benefit, if all values are different as in your “Host Id” row, nothing is reported
$ awk -F, 'NR==1 {split($0,h); next} {delete r; for(i=2;i<=NF;i++) {r[$i]++; idx[$i]=i} max=0; for(k in r) if(max<r[k]) max=r[k]; if(length(r)>1) for(k in r) if(r[k]!=max) print "Bad " $1 " in " h[idx[k]] " -> " k}' file
returns
Bad Code in V03-1555 -> 4.1.03 Bad SR in V4 - 294 -> SR_4_MS
you can remove the values printed, which I put for verification.