
How to remove double quotes in a specific column by using sub() in AWK

My sample data is:

cat > myfile
"a12","b112122","c12,d12"
a13,887988,c13,d13
a14,b14121,c79,d13

When I try to remove the double quotes from column 2 with:

awk -F, 'BEGIN { OFS = FS } $2 ~ /"/ { sub(/"/, "", $2) }1' myfile 
"a12",b112122","c12,d12"
a13,887988,c13,d13
a14,b14121,c79,d13

It only removes one double quote: instead of b112122 I am getting b112122".

How can I remove all " characters in the 2nd column?


Answer

From the GNU awk documentation for sub():

Search target, which is treated as a string, for the leftmost, longest substring matched by the regular expression regexp.[…] Return the number of substitutions made (zero or one).

So sub() makes at most one replacement per call; it does not replace all occurrences.

Instead, use gsub(), which the same manual describes as follows:

Search target for all of the longest, leftmost, nonoverlapping matching substrings it can find and replace them with replacement. The ‘g’ in gsub() stands for “global,” which means replace everywhere.
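The difference shows up directly in the return values. The following is a minimal sketch, assuming a POSIX shell and any POSIX-compatible awk: it applies sub() and gsub() to copies of the same quoted field and prints how many replacements each made, followed by the resulting string.

```shell
#!/bin/sh
# Apply sub() and gsub() to copies of the same quoted field and print
# how many replacements each made, followed by the resulting string.
echo '"b112122"' | awk '{
    s = $0; n = sub(/"/, "", s)    # sub(): replaces the leftmost match only
    g = $0; m = gsub(/"/, "", g)   # gsub(): replaces every match
    print n, s
    print m, g
}'
```

sub() reports 1 and leaves the trailing quote in place (1 b112122"), while gsub() reports 2 and strips both quotes (2 b112122).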

So adding a ‘g’ to your command (turning sub into gsub) removes every double quote from the second field:

awk -F, 'BEGIN { OFS = FS } $2 ~ /"/ { gsub(/"/, "", $2) }1' myfile 
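To check the fix end to end, the sketch below recreates the sample file from the question with a here-document and runs the corrected command. It assumes a POSIX shell and awk, and it overwrites any existing myfile in the current directory. Note that -F, also splits the quoted "c12,d12" into two fields; that is harmless here because only $2 is modified and OFS = FS puts the commas back when awk rebuilds the record.

```shell
#!/bin/sh
# Recreate the sample data from the question (overwrites ./myfile).
cat > myfile <<'EOF'
"a12","b112122","c12,d12"
a13,887988,c13,d13
a14,b14121,c79,d13
EOF

# Strip every double quote from field 2 only; OFS = FS keeps the commas
# when awk rebuilds a modified record, and the trailing 1 prints each line.
awk -F, 'BEGIN { OFS = FS } $2 ~ /"/ { gsub(/"/, "", $2) } 1' myfile
```

Only the first line changes; it comes out as "a12",b112122,"c12,d12", with the quotes in the other fields left alone.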
User contributions licensed under: CC BY-SA