Skip to content
Advertisement

How to check for occurrences of indexes in a file onto another by column and print out the result [closed]

So i have a file1.dat with indexes i need to check (one by line) and a file2.csv file where i wanna count the occurrences of each of those indexes by column and then print out the result for each index. Exemple:

file1.dat:

74568
75842
76822
77458

file2.csv:

"1470009615","62353","1","y","52669","74568","php"
"1470009617","31173","1","y","54725","31173","php"
"1470009617","16293","1","n","58892","75842","php"
"1470009636","50795","1","y","47004","50795","php"

So what i want, in this case, is to count how many times each of the indexes in file1.dat appear in column 6 and generate an output (file3.dat) like this:

74568    1
75842    1
76822    0
77458    0

I know the code to count by column with awk is something like this:

awk -F, '$6==(index)'

But how can i make it take the index from the list and print out the result properly? Python scripts would help too.

Advertisement

Answer

awk -F, 'FNR==NR{gsub(/"/,"",$6);A[$6]++;next}{print $1,A[$1]+0}' file2.csv file1.dat

Input

$ cat file1.dat
74568
75842
76822
77458

$ cat file2.csv
"1470009615","62353","1","y","52669","74568","php"
"1470009617","31173","1","y","54725","31173","php"
"1470009617","16293","1","n","58892","75842","php"
"1470009636","50795","1","y","47004","50795","php"

Output

$ awk -F, 'FNR==NR{gsub(/"/,"",$6);A[$6]++;next}{print $1,A[$1]+0}' file2.csv file1.dat
74568 1
75842 1
76822 0
77458 0
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement