Using a csv file, i will like to compare multiple columns to check if all values are the same or not.
First row are the headers First column is the label The constant values should be from column 2 to the end ( can be 100 columns ) for the example i put only 8 columns.
The purpose is to check that all values are the same. and when it is not, report
input file
JavaScript
x
Number,V2 1563,V03-1555,V4 - 294,V-05 1580,V6-1561,V7-1562,V05-1601,V9-1587
Code,4.1.06,4.1.03,4.1.06,4.1.06,4.1.06,4.1.06,4.1.06,4.1.06
Host Id,b90c27,b90c13,3.30E+65,b90c46,b90c21,b90c1f,b88a63,b90c49
SR,SR_2_MS,SR_2_MS,SR_4_MS,SR_2_MS,SR_2_MS,SR_2_MS,SR_2_MS,SR_2_MS
output desired
JavaScript
Bad code in V03-1555
Bad SR in V4 - 294
Appreciate your support
Advertisement
Answer
awk
to the rescue!
I improvised little bit. How do we know which values are correct, which are not? Popular vote, counts the occurrences and assumes majority is right. As a side benefit, if all values are different as in your “Host Id” row, nothing is reported
JavaScript
$ awk -F, 'NR==1 {split($0,h); next}
{delete r;
for(i=2;i<=NF;i++) {r[$i]++; idx[$i]=i}
max=0;
for(k in r) if(max<r[k]) max=r[k];
if(length(r)>1)
for(k in r)
if(r[k]!=max)
print "Bad " $1 " in " h[idx[k]] " -> " k}' file
returns
JavaScript
Bad Code in V03-1555 -> 4.1.03
Bad SR in V4 - 294 -> SR_4_MS
you can remove the values printed, which I put for verification.