Skip to content
Advertisement

Filtering on a condition using the column names and not numbers

I am trying to filter a text file with columns based on two conditions. Due to the size of the file, I cannot use the column numbers (as there are thousands and are unnumbered) but need to use the column names. I have searched and tried to come up with multiple ways to do this but nothing is returned to the command line.

Here are a few things I have tried:

awk '($colname1==2 && $colname2==1) { count++ } END { print count }' file.txt to filter out the columns based on their conditions

and head -1 file.txt | tr 't' | cat -n | grep "COLNAME to try and return the possible column number related to the column.

An example file would be:

ID  ad   bd

1   a   fire
2   b   air
3   c   water
4   c   water
5   d   water
6   c   earth

Output would be: 2 (count of ad=c and bd=water)

Advertisement

Answer

with your input file and the implied conditions this should work

$ awk -v c1='ad' -v c2='bd' 'NR==1{n=split($0,h); for(i=1;i<=n;i++) col[h[i]]=i} 
                             $col[c1]=="c" && $col[c2]=="water"{count++} END{print count+0}' file

2

or you can replace c1 and c2 with the values in the script as well.

to find the column indices you can run

$ awk -v cols='ad bd' 'BEGIN{n=split(cols,c); for(i=1;i<=n;i++) colmap[c[i]]} 
                       NR==1{for(i=1;i<=NF;i++) if($i in colmap) print $i,i; exit}' file

ad 2
bd 3

or perhaps with this chain

$ sed 1q file | tr -s ' ' \n | nl | grep -E 'ad|bd'

     2  ad
     3  bd

although may have false positives due to regex match…

You can rewrite the awk to be more succinct

$ awk -v cols='ad bd' '{while(++i<=NF) if(FS cols FS ~ FS $i FS) print $i,i; 
                        exit}' file

ad 2
bd 3 
User contributions licensed under: CC BY-SA
5 People found this is helpful
Advertisement