file1:
chr1 14361 14829 NR_024540_0_r_DDX11L1,WASH7P_468
chr1 14969 15038 NR_024540_1_r_WASH7P_69
chr1 15795 15947 NR_024540_2_r_WASH7P_152
chr1 16606 16765 NR_024540_3_r_WASH7P_15
chr1 16857 17055 NR_024540_4_r_WASH7P_198
and file2:
NR_024540 11
I need to find the lines in file1 that match the key in the first column of file2, and print the whole file1 line plus the second column of file2.
So the output is:
chr1 14361 14829 NR_024540_0_r_DDX11L1,WASH7P_468 11
chr1 14969 15038 NR_024540_1_r_WASH7P_69 11
chr1 15795 15947 NR_024540_2_r_WASH7P_152 11
chr1 16606 16765 NR_024540_3_r_WASH7P_15 11
chr1 16857 17055 NR_024540_4_r_WASH7P_198 11
My bash solution is very slow:
#!/bin/bash
while read line; do
c=$(echo $line | awk '{print $1}')
d=$(echo $line | awk '{print $2}')
grep $c file1 | awk -v line="$d" -v OFS="\t" '{print $1,$2,$3,$4"_"line}' >> output
done < file2
I would prefer any FASTER bash or awk solution. The output format can be modified, but it needs to keep all the information (the order of the columns can be different).
EDIT:
Right now the fastest solution looks like this one, following @chepner's suggestion:
#!/bin/bash
while read -r c d; do
grep $c file1 | awk -v line="$d" -v OFS="\t" '{print $1,$2,$3,$4"_"line}'
done < file2 > output
Answer
Another solution uses join and sed, under the assumption that file1 and file2 are sorted:
join <(sed -r 's/[^ _]+_[^_]+/& &/' file1) file2 -1 4 -2 1 -o "1.1 1.2 1.3 1.5 2.2" > output
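To see why the join key is field 4 and the original fourth column ends up as field 5, it can help to look at what the sed step does to a single line. This is only a sketch on one line taken from file1 in the question; the variable names `line` and `expanded` are made up for the illustration, and `-E` is used as the more portable spelling of GNU sed's `-r`.

```shell
# One line from file1 in the question
line='chr1 14361 14829 NR_024540_0_r_DDX11L1,WASH7P_468'

# The first "xxx_yyy" run on the line is the NR_024540 prefix of column 4;
# "& &" writes it twice, so the prefix becomes a separate field of its own.
expanded=$(printf '%s\n' "$line" | sed -E 's/[^ _]+_[^_]+/& &/')

printf '%s\n' "$expanded"
# chr1 14361 14829 NR_024540 NR_024540_0_r_DDX11L1,WASH7P_468
```

After this step join can match field 4 of the expanded file1 against field 1 of file2, and `-o` picks fields 1.1, 1.2, 1.3, 1.5 and 2.2 to restore the original layout with the value appended.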
Alternatively, use awk, which does not need sorted input and keeps the line order of file1:
awk 'FNR==NR{d[$1]=$2; next}
{split($4,v,"_"); key=v[1]"_"v[2]; if(key in d) print $0, d[key]}
' file2 file1
You get:
chr1 14361 14829 NR_024540_0_r_DDX11L1,WASH7P_468 11
chr1 14969 15038 NR_024540_1_r_WASH7P_69 11
chr1 15795 15947 NR_024540_2_r_WASH7P_152 11
chr1 16606 16765 NR_024540_3_r_WASH7P_15 11
chr1 16857 17055 NR_024540_4_r_WASH7P_198 11
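The script in the question wrote tab-separated columns with the value glued onto column 4 by an underscore. If that layout is preferred, the same lookup could be adapted roughly as follows. This is a sketch, not a tested drop-in: it recreates two lines of file1 and the one line of file2 from the question in a scratch directory, and the names `tmp` and `result` are only for the illustration.

```shell
# Recreate a small part of the question's input in a scratch directory
tmp=$(mktemp -d)
printf '%s\n' \
  'chr1 14361 14829 NR_024540_0_r_DDX11L1,WASH7P_468' \
  'chr1 14969 15038 NR_024540_1_r_WASH7P_69' > "$tmp/file1"
printf '%s\n' 'NR_024540 11' > "$tmp/file2"

# Pass 1 (file2): store the value under its key.
# Pass 2 (file1): rebuild the key from the first two "_"-separated pieces of
# column 4; on a hit, append "_value" to column 4. Assigning to $4 makes awk
# rebuild the line with OFS, so the output is tab-separated.
result=$(awk -v OFS='\t' 'FNR==NR{d[$1]=$2; next}
{split($4,v,"_"); key=v[1]"_"v[2]; if(key in d){$4=$4"_"d[key]; print}}
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Lines of file1 whose key is not present in file2 are dropped, as in the awk command above.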