So i have a file1.dat with indexes i need to check (one by line) and a file2.csv file where i wanna count the occurrences of each of those indexes by column and then print out the result for each index. Exemple:
file1.dat:
74568 75842 76822 77458
file2.csv:
"1470009615","62353","1","y","52669","74568","php" "1470009617","31173","1","y","54725","31173","php" "1470009617","16293","1","n","58892","75842","php" "1470009636","50795","1","y","47004","50795","php"
So what i want, in this case, is to count how many times each of the indexes in file1.dat appear in column 6 and generate an output (file3.dat) like this:
74568 1 75842 1 76822 0 77458 0
I know the code to count by column with awk is something like this:
awk -F, '$6==(index)'
But how can i make it take the index from the list and print out the result properly? Python scripts would help too.
Advertisement
Answer
awk -F, 'FNR==NR{gsub(/"/,"",$6);A[$6]++;next}{print $1,A[$1]+0}' file2.csv file1.dat
Input
$ cat file1.dat 74568 75842 76822 77458 $ cat file2.csv "1470009615","62353","1","y","52669","74568","php" "1470009617","31173","1","y","54725","31173","php" "1470009617","16293","1","n","58892","75842","php" "1470009636","50795","1","y","47004","50795","php"
Output
$ awk -F, 'FNR==NR{gsub(/"/,"",$6);A[$6]++;next}{print $1,A[$1]+0}' file2.csv file1.dat 74568 1 75842 1 76822 0 77458 0