Skip to content
Advertisement

substitute consecutive tabs for “tNAt”

Have a badly formatted tsv file with empty fields all over the place. I wish to fill these empty spaces with “NA” on linux.

I tried awk '{gsub("tt","tNAt"); print$0)' but that only substitutes one empty space to NA instance. Chaining the command awk '{gsub("tt","tNAt"); print$0)|awk '{gsub("tt","tNAt"); print$0) does two substitutions per line – but not particularly helpful if I have many columns to deal with.

Is there a faster (neater) way to do this?

Advertisement

Answer

Its a bit complex since you have to handle newlines empty fields, end of line empty fields and potentially successive empty fields. I could not achieve something with sed, it’s probably insane. But with awk this seems to work:

JavaScript

Beware copy paste, the tabs will probably be transformed to spaces… By the way I searched a solution for the CSV files, and adapted it from this thread 😉 where you can see that the most readable option is the awk one.

User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement