I have a tab-separated text file. Columns 1 and 2 contain family and individual IDs that start with letters followed by a number, as follows:
HG1005	HG1005
HG1006	HG1006
HG1007	HG1007
NA1008	NA1008
NA1009	NA1009
I would like to replace NA with HG in both columns. I am very new to Linux and tried the following code, among others:
awk '{sub("NA","HG",$2)';print}' input file > output file
Any help is highly appreciated.
Answer
The $2 in your call to sub limits the replacement to the second field, and sub only replaces the first occurrence of NA there, so the first column is never touched.
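To see the effect on a small sample, here is a sketch of the attempted command with the quoting tidied up. The file name sample.tsv is hypothetical, and setting OFS to a tab is an addition made here: assigning to a field makes awk rebuild the line with OFS, which would otherwise turn the tab into a space.

```shell
# Two rows of sample data, tab-separated (sample.tsv is a hypothetical name).
printf 'HG1005\tHG1005\nNA1008\tNA1008\n' > sample.tsv

# sub with $2 as target: only the second field is edited.
# FS/OFS are set to a tab so the rebuilt record stays tab-separated.
only_col2=$(awk 'BEGIN{FS=OFS="\t"} {sub("NA","HG",$2); print}' sample.tsv)

printf '%s\n' "$only_col2"
```

The second row comes out with NA still in column 1 and HG in column 2, which is the behaviour described above.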
Note that while sed is more typical for such scenarios:
sed 's/NA/HG/g' inputfile > outputfile
you can still use awk:
awk '{gsub("NA","HG")}1' inputfile > outputfile
Since no target variable is passed to gsub (which replaces every occurrence, not just the first), the default $0 is used, i.e. the whole record (the current line), so the code above is equivalent to awk '{gsub("NA","HG",$0)}1' inputfile > outputfile.
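Because the whole record is searched, any NA elsewhere on the line is replaced as well. If that matters, one possible variant is to edit only the first two fields and anchor the pattern to the start of each field. This is a sketch under assumptions: the third column and the file name sample_extra.tsv are invented for illustration and are not part of the original data.

```shell
# Sample with an extra, made-up third column that also contains "NA".
printf 'NA1008\tNA1008\tNA\nHG1005\tHG1005\tRNA7\n' > sample_extra.tsv

# ^NA matches only at the start of a field; $1 and $2 are edited separately.
# OFS is a tab so the rebuilt record keeps its tab separators.
restricted=$(awk 'BEGIN{FS=OFS="\t"} {sub(/^NA/,"HG",$1); sub(/^NA/,"HG",$2)} 1' sample_extra.tsv)

printf '%s\n' "$restricted"
```

Here sub is enough, since an anchored pattern can match at most once per field. The third column is left as it was.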
The 1 at the end is a pattern that is always true; since it has no action attached, awk runs the default action and prints the current record, so it is a shorter variant of print.
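As a quick check of that equivalence, the sketch below runs the short form and the explicit print form on the same one-line sample (sample_one.tsv is a hypothetical file name) and compares the results.

```shell
# One-line sample, tab-separated.
printf 'NA1008\tNA1008\n' > sample_one.tsv

# Short form: the always-true pattern 1 triggers the default print action.
with_one=$(awk '{gsub("NA","HG")}1' sample_one.tsv)

# Explicit form: print is called inside the action block.
with_print=$(awk '{gsub("NA","HG"); print}' sample_one.tsv)

if [ "$with_one" = "$with_print" ]; then
    echo "same output"
fi
```

Which form to use is a matter of taste; the explicit print is easier to read for someone new to awk.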