Skip to content
Advertisement

Compare multiple columns for each row

Using a csv file, i will like to compare multiple columns to check if all values are the same or not.

First row are the headers First column is the label The constant values should be from column 2 to the end ( can be 100 columns ) for the example i put only 8 columns.

The purpose is to check that all values are the same. and when it is not, report

input file

Number,V2 1563,V03-1555,V4 - 294,V-05 1580,V6-1561,V7-1562,V05-1601,V9-1587
Code,4.1.06,4.1.03,4.1.06,4.1.06,4.1.06,4.1.06,4.1.06,4.1.06
Host Id,b90c27,b90c13,3.30E+65,b90c46,b90c21,b90c1f,b88a63,b90c49
SR,SR_2_MS,SR_2_MS,SR_4_MS,SR_2_MS,SR_2_MS,SR_2_MS,SR_2_MS,SR_2_MS

output desired

Bad code in V03-1555
Bad SR in V4 - 294

Appreciate your support

Advertisement

Answer

awk to the rescue!

I improvised little bit. How do we know which values are correct, which are not? Popular vote, counts the occurrences and assumes majority is right. As a side benefit, if all values are different as in your “Host Id” row, nothing is reported

$ awk -F, 'NR==1 {split($0,h); next} 
                 {delete r; 
                  for(i=2;i<=NF;i++) {r[$i]++; idx[$i]=i}
                  max=0;
                  for(k in r) if(max<r[k]) max=r[k];
                  if(length(r)>1) 
                    for(k in r) 
                      if(r[k]!=max) 
                        print "Bad " $1 " in " h[idx[k]] " -> " k}' file

returns

Bad Code in V03-1555 -> 4.1.03
Bad SR in V4 - 294 -> SR_4_MS

you can remove the values printed, which I put for verification.

User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement