I have two files (recode.txt and reads.fastq) that were created and saved with the nano command, and I want to compare what is in recode.txt against reads.fastq and extract the matching lines from reads.fastq. I have been trying to build a while loop with that logic in mind, but without success so far: the output data does not match the pattern given to grep inside the while loop. The script is supposed to read each line of recode.txt, compare it to reads.fastq, extract each matching line plus one line before and two after it, and save the output to a separate file per line of recode.txt (all matching records for that line combined). Here are the files and the code:
File recode.txt:

GTGTCTTA+ATCACGAC
GTGTCTTA+ACAGTGGT
GTGTCTTA+CAGATCCA
GTGTCTTA+ACAAACGG
GTGTCTTA+ACCCAGCA
GTGTCTTA+AACCCCTC
GTGTCTTA+CCCAACCT
ATCACGAC+AAGGTTCA
GTGTCTTA+GAAACCCA
File reads.fastq:

###################################
@NB500931:113:HW53WBGX2:1:11101:11338:1049 1:N:0:ATCACGAC+AAGGTTCA
GTAGTNCCAGCTGCAGAGCTGGAAGGATCGCTTGAGCGCAGAGGTAGAGGCTACAGTGAGCCGTGATCATGCCAT
+
AAAAA#EAAEEEEE6EAEAEEEEEEEEEEEEEEEAEEEEEE/EEEEEEEEEE/EEEEEEEEEEEEEEEAEEEEEA
@NB500931:113:HW53WBGX2:1:11101:6116:1049 1:N:0:ACAAACGG+AAGGTTCA
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
+
###################################
@NB500931:113:HW53WBGX2:1:11101:6885:1049 1:N:0:ACCCAGCA+ACTTAGCA
GAGGGNGCTGTCCCAGTAATTGGGTTCAGATGACATTTGCTTGATTTTAGGGATGTACGAGATTTTCGTGGATC
+
AAA/A#EAEEEEEAEAEEA///EEAEEEEE///AEEAEE/AA//EAA<EEE/E//AEEEAAA//E/A<6//EEA
@NB500931:113:HW53WBGX2:1:11101:8246:1049 1:N:0:ATCACGAC+AAGGTTCA
CTTGTNAGACACGATGCAGAGAATTAGCTGTTTGATGCCTATCTTCCCAACTCAGAGGCAAGCTGCCCAAAGGC
+
Script:
#!/bin/bash
#PBS -l nodes=1:ppn=8,walltime=96:00:00
while read line
do
  echo "working on $line"
  grep -A3 "$line" reads.fastq | grep -v "^--$" >> "$line"_sorted.fastq
done < recode.txt
So, both files are in UNIX format, and the following command (without a loop) works smoothly.
According to that non-loop version:
grep -A3 "ATCACGAC+AAGGTTCA" reads.fastq | grep -v "^--$" > sorted_file.fastq
my output should be:
@NB500931:113:HW53WBGX2:1:11101:11338:1049 1:N:0:ATCACGAC+AAGGTTCA
GTAGTNCCAGCTGCAGAGCTGGAAGGATCGCTTGAGCGCAGAGGTAGAGGCTACAGTGAGCCGTGATCATGCCAT
+
@NB500931:113:HW53WBGX2:1:11101:8246:1049 1:N:0:ATCACGAC+AAGGTTCA
CTTGTNAGACACGATGCAGAGAATTAGCTGTTTGATGCCTATCTTCCCAACTCAGAGGCAAGCTGCCCAAAGGC
+
However, my output using the while loop gives me an empty file with the correct name. Can you please help me?
UPDATE: I have tried dos2unix to convert my files and it didn't work.
UPDATE: I edited the question to include my expected output.
Answer
Without seeing the expected output it’s a guess but it sounds like this is what you’re trying to do:
$ awk -F: 'NR==FNR{a[$0];next} $NF in a{c=3} c&&c--' recode.txt reads.fastq
@NB500931:113:HW53WBGX2:1:11101:11338:1049 1:N:0:ATCACGAC+AAGGTTCA
GTAGTNCCAGCTGCAGAGCTGGAAGGATCGCTTGAGCGCAGAGGTAGAGGCTACAGTGAGCCGTGATCATGCCAT
+
@NB500931:113:HW53WBGX2:1:11101:8246:1049 1:N:0:ATCACGAC+AAGGTTCA
CTTGTNAGACACGATGCAGAGAATTAGCTGTTTGATGCCTATCTTCCCAACTCAGAGGCAAGCTGCCCAAAGGC
+
No shell loop required (see why-is-using-a-shell-loop-to-process-text-considered-bad-practice for SOME of the reasons why that matters). It just saves the values from recode.txt as array indices; then, while reading reads.fastq, if the last :-separated field of a line is an index of the array (i.e. it existed in recode.txt), it sets a counter to 3 and prints every line while the counter is greater than zero, decrementing the counter each time (see printing-with-sed-or-awk-a-line-following-a-matching-pattern for other examples of printing text starting from a match).
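The lookup-plus-counter idiom can be sketched on throwaway input; the file names and contents below are invented for illustration, not taken from the question:

```shell
# Demo of the NR==FNR lookup table plus the c&&c-- counter idiom.
# /tmp/keys_demo.txt and /tmp/data_demo.txt are made-up illustration files.
printf 'foo\n' > /tmp/keys_demo.txt
printf 'x:foo\nline2\nline3\nx:bar\nline5\nline6\n' > /tmp/data_demo.txt

# First pass (NR==FNR): store every line of the keys file as an array index.
# Second pass: when a line's last :-separated field is a stored key, set c=3,
# then print each line while c is non-zero, decrementing c each time.
awk -F: 'NR==FNR{a[$0];next} $NF in a{c=3} c&&c--' /tmp/keys_demo.txt /tmp/data_demo.txt
# prints:
# x:foo
# line2
# line3
```

Note that "x:bar" and the lines after it are not printed because "bar" was never stored in the array.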
To save each found record in a file based on the string name in that final field as it looks like you might be trying to do in your shell loop would be:
awk -F: '
NR==FNR  { a[$0]; next }
$NF in a { c=3; close(out); out=$NF"_sorted.fastq" }
c&&c--   { print >> out }
' recode.txt reads.fastq
Note that this reads reads.fastq once in total, not once per line of recode.txt as your shell loop was doing, so you can expect a vast performance improvement from that aspect alone.
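On tiny synthetic input (the demo file names and contents are invented), the splitting version behaves like this:

```shell
# Made-up miniature inputs to illustrate the file-splitting awk.
cd /tmp
rm -f foo_sorted.fastq
printf 'foo\n' > recode_demo.txt
printf 'x:foo\nseq\n+\nx:bar\nseq2\n+\n' > reads_demo.txt

# close(out) releases the previous output file's handle; print >> appends,
# so all matching records sharing a key accumulate in one file per key.
awk -F: '
NR==FNR  { a[$0]; next }
$NF in a { c=3; close(out); out=$NF"_sorted.fastq" }
c&&c--   { print >> out }
' recode_demo.txt reads_demo.txt

cat foo_sorted.fastq
# x:foo
# seq
# +
```

Only foo_sorted.fastq is created; the "x:bar" record has no key in recode_demo.txt, so no bar_sorted.fastq appears.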
Finally, if recode.txt is just a list of ALL of the final fields that exist in reads.fastq, then you simply don't need it. This is all you need to split reads.fastq into separate files of 3 lines per record, named based on the value after the last : on each line that starts with @:

awk -F: '
/^@/   { c=3; close(out); out=$NF"_sorted.fastq" }
c&&c-- { print >> out }
' reads.fastq
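As a quick sanity check, here is that header-driven splitter run on a made-up four-line record (the AAGG barcode and file names are invented for the demo):

```shell
cd /tmp
rm -f AAGG_sorted.fastq
printf '@r1 1:N:0:AAGG\nACGT\n+\nEEEE\n' > mini_demo.fastq

# Each @ header starts a new output file named after its last :-field;
# c=3 copies the header, the sequence, and the + separator (not the
# quality line, matching the 3-line records in the question's output).
awk -F: '/^@/{c=3; close(out); out=$NF"_sorted.fastq"} c&&c--{print >> out}' mini_demo.fastq

cat AAGG_sorted.fastq
# @r1 1:N:0:AAGG
# ACGT
# +
```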