I am trying to filter a text file with columns based on two conditions. Due to the size of the file, I cannot use the column numbers (as there are thousands and are unnumbered) but need to use the column names. I have searched and tried to come up with multiple ways to do this but nothing is returned to the command line.
Here are a few things I have tried:
awk '($colname1==2 && $colname2==1) { count++ } END { print count }' file.txt
to filter out the columns based on their conditions
and
head -1 file.txt | tr 't' | cat -n | grep "COLNAME
to try and return the possible column number related to the column.
An example file would be:
ID ad bd 1 a fire 2 b air 3 c water 4 c water 5 d water 6 c earth
Output would be: 2 (count of ad=c and bd=water)
Advertisement
Answer
with your input file and the implied conditions this should work
$ awk -v c1='ad' -v c2='bd' 'NR==1{n=split($0,h); for(i=1;i<=n;i++) col[h[i]]=i} $col[c1]=="c" && $col[c2]=="water"{count++} END{print count+0}' file 2
or you can replace c1 and c2 with the values in the script as well.
to find the column indices you can run
$ awk -v cols='ad bd' 'BEGIN{n=split(cols,c); for(i=1;i<=n;i++) colmap[c[i]]} NR==1{for(i=1;i<=NF;i++) if($i in colmap) print $i,i; exit}' file ad 2 bd 3
or perhaps with this chain
$ sed 1q file | tr -s ' ' \n | nl | grep -E 'ad|bd' 2 ad 3 bd
although may have false positives due to regex match…
You can rewrite the awk
to be more succinct
$ awk -v cols='ad bd' '{while(++i<=NF) if(FS cols FS ~ FS $i FS) print $i,i; exit}' file ad 2 bd 3