Replacing a string at the beginning of some rows in two columns with another string in Linux

I have a tab-separated text file. Columns 1 and 2 hold family and individual IDs that start with letters followed by a number, as follows:

HG1005 HG1005
HG1006 HG1006
HG1007 HG1007
NA1008 NA1008
NA1009 NA1009

I would like to replace NA with HG in both columns. I am very new to Linux and tried the following code, among others:

awk '{sub("NA","HG",$2)';print}' input file > output file

Any help is highly appreciated.

Answer

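Passing $2 as the third argument restricts sub to the second field, and sub replaces only the first match there, so column 1 is never touched. The stray ' before ;print also closes the awk program too early, so awk never receives a complete script.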
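
As a rough illustration, the sketch below runs the attempt with the quoting fixed on one line taken from the sample data. The function name second_field_only is made up for this example, and the input is assumed to be tab separated as described in the question.

```shell
# Hypothetical helper: the original attempt with the quoting repaired.
# Only field 2 is handed to sub, so field 1 keeps its NA prefix.
second_field_only() {
    awk '{ sub("NA", "HG", $2); print }'
}

printf 'NA1008\tNA1008\n' | second_field_only
# prints: NA1008 HG1008
# (assigning to a field makes awk rebuild the record with OFS, a single
#  space by default, so the tab is lost as well)
```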

Note that while sed is more typical for such scenarios:

sed 's/NA/HG/g' inputfile > outputfile

you can still use awk:

awk '{gsub("NA","HG")}1' inputfile > outputfile

Because gsub (which replaces every match, whereas sub replaces only the first) is called without a third argument, it operates on the default target $0, i.e. the whole current record (the current line). The command above is therefore equivalent to awk '{gsub("NA","HG",$0)}1' inputfile > outputfile.
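
Both commands above replace NA anywhere on the line. If the real file has further columns, or IDs that contain NA in the middle, a stricter variant may be safer. The following is only a sketch under those assumptions: replace_prefix is a hypothetical name, and the regex /^NA/ limits the change to an NA prefix in fields 1 and 2.

```shell
# Hypothetical helper: replace a leading "NA" with "HG" in columns 1 and 2 only.
# FS and OFS are both set to a tab so the rebuilt record stays tab separated.
replace_prefix() {
    awk 'BEGIN { FS = OFS = "\t" }
         { sub(/^NA/, "HG", $1); sub(/^NA/, "HG", $2) }
         1' "$1"
}
```

It would be called as replace_prefix inputfile > outputfile. Any NA in a third column, or inside an ID such as XNA12, is left as it is.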

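The 1 at the end is a pattern that is always true and has no action attached, so awk falls back to its default action and prints the current record. It is a shorter way of writing {print}.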
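
To check that the two spellings behave the same, a small comparison can be run on one sample line. The function names here are illustrative only.

```shell
# Both helpers read standard input; the names are made up for this example.
with_pattern_1() { awk '{ gsub("NA", "HG") } 1'; }
with_print()     { awk '{ gsub("NA", "HG"); print }'; }

# Each call prints the line with both NA prefixes replaced by HG.
# The tab survives because gsub edits $0 directly and no field is reassigned.
printf 'NA1008\tNA1008\n' | with_pattern_1
printf 'NA1008\tNA1008\n' | with_print
```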