
AWK script: Finding number of matches that each element in Col2 has in Col1

I want to compare two columns in a file, as shown below, using AWK. Could someone help, please?

e.g.

Col1   Col2
----   ----
 2      A
 2      D
 3      D
 3      D
 3      A
 7      N
 7      M
 1      D
 1      R

Now I want to use AWK to implement the following algorithm to find matches between those columns:

list1[] <=== Col1
list2[] <=== Col2
counts = {}
for i in list2:
   d = 0
   for j in range(len(list2)):
      if i == list2[j]:
         d += 1
   counts[i] = d

Expected result:

A ==> 2  // means A has two matches in Col1
D ==> 4  // means D has four matches in Col1
....

So I want to write the above code as an AWK script, but I find it too complicated because I haven't used AWK before.

Thank you very much for your help

Answer

It's not all that complicated: keep the count in an array indexed by the value in the second column, and print the array out at the end.

awk '{cnt[$2]++} END {for(c in cnt) print c, cnt[c]}' test.txt

# A 2
# D 4
# M 1
# N 1
# R 1
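
Note that the order of the lines above is not guaranteed, since awk's `for (c in cnt)` visits the keys in an unspecified order. As a minimal sketch, assuming the input has no header lines and that the `A ==> 2` layout from the question is wanted, the output can be formatted with `printf` and piped through `sort`. The file name `counts_input.txt` and the variable `formatted` are hypothetical names used only for this illustration.

```shell
# Sample rows from the question, written without the header lines (assumption).
cat > counts_input.txt <<'EOF'
 2      A
 2      D
 3      D
 3      D
 3      A
 7      N
 7      M
 1      D
 1      R
EOF

# Count each value of column 2, print it as "key ==> count",
# then sort so the order no longer depends on the awk implementation.
formatted=$(awk '{cnt[$2]++} END {for (c in cnt) printf "%s ==> %d\n", c, cnt[c]}' counts_input.txt | sort)
printf '%s\n' "$formatted"
```

Sorting outside awk keeps the one-liner portable across awk implementations.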

{cnt[$2]++}  # For each row, take the second column and increment the
             # array element stored under that key (i.e. cnt["A"]++)

END {for(c in cnt) print c, cnt[c]}
             # After all rows are read (END), loop over the keys of the
             # array and print each key with its value, cnt[key]
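
If the file also contains the two header lines shown in the question (`Col1   Col2` and the dashes), the one-liner would count `Col2` and `----` as keys too. A possible sketch, assuming exactly two header lines, adds an `NR > 2` condition so that counting starts at the third line. The file name `with_header.txt` and the variable `result` are hypothetical.

```shell
# Sample data including the two header lines from the question.
cat > with_header.txt <<'EOF'
Col1   Col2
----   ----
 2      A
 2      D
 3      D
 3      D
 3      A
 7      N
 7      M
 1      D
 1      R
EOF

# NR is the current line number; the counting action runs only for NR > 2,
# so the two header lines are never added to the cnt array.
result=$(awk 'NR > 2 {cnt[$2]++} END {for (c in cnt) print c, cnt[c]}' with_header.txt)
printf '%s\n' "$result"
```

The condition sits in front of the existing action, so the rest of the script is unchanged.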