I am trying to take a file containing a list and count how many times the items in that list occur in a target file. Something like:
list.txt:
blonde
red
black

target.txt:
bob blonde male
sam blonde female

desired_output.txt:
blonde 2
red 0
black 0
I have co-opted the following code to count the values that are present in target.txt:
awk '{count[$2]++} END {for (word in count) print word, count[word]}' target.txt
But the output does not include the items that are in list.txt but not in target.txt:

current_output.txt:
blonde 2
I have tried a few things to get this working including:
awk '{word[$1]++;next;count[$2]++} END {for (word in count) print word, count[word]}' list.txt target.txt
However, I have had no success.
Could anyone help me make this awk statement also read the list.txt file? Any explanation of the code would also be much appreciated. Thanks!
Answer
awk '
NR==FNR{a[$0]; next}
{
  for(i=1; i<=NF; i++){
    if ($i in a){ a[$i]++ }
  }
}
END{
  for(key in a){
    printf "%s %d\n", key, a[key]
  }
}
' list.txt target.txt
NR==FNR{a[$0]; next}
The condition NR==FNR is only true for the first file, so the keys of array a are the lines of list.txt.

for(i=1; i<=NF; i++)
For the second file, this loops over all of its fields.

if ($i in a){ a[$i]++ }
This checks whether the field $i is present as a key in array a. If it is, the value associated with that key (initially unset, which awk treats as zero) is incremented.

At the END, we just print each key followed by its number of occurrences, a[key], and a newline (\n).
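To see why NR==FNR singles out the first file, it can help to print both counters while awk reads the two inputs. The snippet below is a small sketch that assumes the same sample contents as in the question; it recreates list.txt and target.txt in a temporary directory so it can run on its own.

```shell
set -eu
# Recreate the sample files from the question in a throwaway directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'blonde\nred\nblack\n' > "$dir/list.txt"
printf 'bob blonde male\nsam blonde female\n' > "$dir/target.txt"
cd "$dir"

# NR keeps counting across files; FNR restarts at 1 for each new file,
# so NR==FNR holds only while the first file (list.txt) is being read.
result=$(awk '{ print FILENAME, NR, FNR, (NR==FNR ? "first" : "second") }' list.txt target.txt)
printf '%s\n' "$result"
```

For these inputs it prints three "first" lines for list.txt (NR and FNR both 1 to 3), then target.txt with NR 4 and 5 but FNR 1 and 2, marked "second".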
Output:
blonde 2
red 0
black 0
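One caveat: awk does not specify the order in which for (key in a) visits array elements, so the lines above may come out in a different order. If the order of list.txt matters, a minimal sketch is to remember each key's position while reading the first file and loop over those positions in END. The extra arrays order and count and the counter n are names chosen here for illustration; the sample files are recreated as in the question.

```shell
set -eu
# Recreate the sample files from the question in a throwaway directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'blonde\nred\nblack\n' > "$dir/list.txt"
printf 'bob blonde male\nsam blonde female\n' > "$dir/target.txt"
cd "$dir"

result=$(awk '
  # First file: store each new key with an explicit 0 and note its position.
  NR == FNR {
    if (!($0 in count)) { order[++n] = $0; count[$0] = 0 }
    next
  }
  # Second file: count every field that is one of the keys.
  { for (i = 1; i <= NF; i++) if ($i in count) count[$i]++ }
  # Walk the remembered positions so output follows list.txt order.
  END { for (j = 1; j <= n; j++) print order[j], count[order[j]] }
' list.txt target.txt)
printf '%s\n' "$result"
```

Because the counts start at an explicit 0, a plain print is enough here and no %d or +0 conversion is needed.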
Notes:
- Because of %d, the printf statement forces the conversion of a[key] to an integer in case it is still unset. The whole statement could be replaced by a simpler print key, a[key]+0. I missed that when writing the answer, but now you know two ways of doing the same thing. 😉
- In your attempt you were, for some reason, only addressing field 2 ($2), ignoring the other columns. Also, the next in your action runs for every line of both files, so count[$2]++ is never reached and count stays empty.
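If you really only want to look at column 2, as in your attempt, one possible minimal fix is to attach next to the first file only and to use +0 when printing. This is a sketch under that assumption, not a replacement for the answer above: it deliberately ignores matches in other columns. The output is piped through sort only to make the unspecified for-in order predictable.

```shell
set -eu
# Recreate the sample files from the question in a throwaway directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'blonde\nred\nblack\n' > "$dir/list.txt"
printf 'bob blonde male\nsam blonde female\n' > "$dir/target.txt"
cd "$dir"

result=$(awk '
  # list.txt: create an (unset) entry per key; next now applies only here,
  # so the counting rule below still runs for target.txt.
  NR == FNR { count[$1]; next }
  # target.txt: count column 2 only, as in the original attempt.
  $2 in count { count[$2]++ }
  # +0 prints never-incremented entries as 0 instead of an empty string.
  END { for (w in count) print w, count[w] + 0 }
' list.txt target.txt | sort)
printf '%s\n' "$result"
```

With the sample data this yields black 0, blonde 2 and red 0, one per line.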