substitute consecutive tabs for “tNAt”

Question

Have a badly formatted tsv file with empty fields all over the place. I wish to fill these empty spaces with "NA" on linux. I tried awk '{gsub("tt","tNAt"); print$0)' but that only substitutes one empty space to NA instance. Chaining the command awk '{gsub("tt","tNAt"); print$0)|awk '{gsub("tt","tNAt"); print$0) does two substitutions per line - but not particularly helpful if I have

Accepted Answer

Its a bit complex since you have to handle newlines empty fields, end of line empty fields and potentially successive empty fields. I could not achieve something with sed, it&#8217;s probably insane. But with awk this seems to work:$ cat test.txta       c   d   e    g   h   i   jk   l   m   n   p           s   t        w   x   $ awk -F$'t' '{for(i=1;i<=NF;++i){if($i==""){printf "NA"}else{printf $i} if(i<NF)printf "t"} printf "n"}' test.txt a   NA  c   d   eNA  g   h   i   jk   l   m   n   NAp   NA  NA  s   tNA  NA  w   x   NABeware copy paste, the tabs will probably be transformed to spaces&#8230; By the way I searched a solution for the CSV files, and adapted it from this thread 😉 where you can see that the most readable option is the awk one.

Advertisement

Answer