
AWK script automatically removing leading 0s from String

I have a file BLACK.FUL.eg2:

10>BLACK.FUL>272/GSMA/000000>151006>01
15>004401074905590>004401074905590>B>I>0011>Insert>240/PLMN/000100>>5000-K525122-15
15>004402145955010>004402145955010>B>I>0011>Insert>240/PLMN/000100>>1200-K108534-14
15>004402146016260>004402146016360>B>I>0011>Insert>240/PLMN/000100>>1200-K-94878-14
15>004402452698630>004402452698630>B>I>0011>Insert>240/PLMN/000100>>5000-K538947-14
90>BLACK.FUL>272/GSMA/000000>151006>01>4

I’ve written this AWK script:

awk 'NR > 2 { print p } { p = $0 }' BLACK.FUL.eg2 | awk -F">" 
'{if (length($2) == 15) print substr($2,1,length($2)-1)","substr($3,1,length($3)-1)","$6","$8; 
else print $2","$3","$6","$8;}' | awk -F"," '{if ($2 == $1) print $1","$3","$4; 
else {if (length($1) > 14) {v = substr($1,9,6); t = substr($2,9,6); 
while(v <= t) print substr($2,1,8)v++substr($2,15,2)","$3","$4;} 
else {d = $1;while(d <= $2) print d++","$3","$4;}}}'

which gives me an output of:

00440107490559,0011,240/PLMN/000100
00440214595501,0011,240/PLMN/000100
440214601626,0011,240/PLMN/000100
440214601627,0011,240/PLMN/000100
440214601628,0011,240/PLMN/000100
440214601629,0011,240/PLMN/000100
440214601630,0011,240/PLMN/000100
440214601631,0011,240/PLMN/000100
440214601632,0011,240/PLMN/000100
440214601633,0011,240/PLMN/000100
440214601634,0011,240/PLMN/000100
440214601635,0011,240/PLMN/000100
440214601636,0011,240/PLMN/000100
00440245269863,0011,240/PLMN/000100

There is one problem: the leading 0s of the strings in field 1 are removed automatically because of the numeric operation on them. My expected output is:

00440107490559,0011,240/PLMN/000100
00440214595501,0011,240/PLMN/000100
00440214601626,0011,240/PLMN/000100
00440214601627,0011,240/PLMN/000100
00440214601628,0011,240/PLMN/000100
00440214601629,0011,240/PLMN/000100
00440214601630,0011,240/PLMN/000100
00440214601631,0011,240/PLMN/000100
00440214601632,0011,240/PLMN/000100
00440214601633,0011,240/PLMN/000100
00440214601634,0011,240/PLMN/000100
00440214601635,0011,240/PLMN/000100
00440214601636,0011,240/PLMN/000100
00440245269863,0011,240/PLMN/000100

To get that, I tried the updated AWK script below:

awk 'NR > 2 { print p } { p = $0 }' BLACK.FUL.eg2 | awk -F">" 
'{if (length($2) == 15) print substr($2,1,length($2)-1)","substr($3,1,length($3)-1)","$6","$8; 
else print $2","$3","$6","$8;}' | awk -F"," '{if ($2 == $1) print $1","$3","$4; 
else {if (length($1) > 14) {v = substr($1,9,6); t = substr($2,9,6); 
while(v <= t) print substr($2,1,8)v++substr($2,15,2)","$3","$4;} 
else {d = $1; for ( i=1;i<length($1);i++ ) if (substr($1,i++,1) == "0") 
{m=m"0"; else exit 1;}; while(d <= $2) print md++","$3","$4;}}}'

But I'm getting an error:

awk: cmd. line:4: {m=m"0"; else exit 1;}; while(d <= $2) print   md++","$3","$4;}}}
awk: cmd. line:4:          ^ syntax error

Can you please point out what I'm doing wrong, so that I get the expected output? A modification of my existing AWK script would help most. Thanks.

NOTE: There can be any number of leading 0s, not only two as in the example output above.


Answer

Since your field sizes are fixed, for the given example just change the last print statement to

$ awk ... printf "%014d,%s,%s\n",d++,$3,$4}}}'

00440107490559,0011,240/PLMN/000100
00440214595501,0011,240/PLMN/000100
00440214601626,0011,240/PLMN/000100
00440214601627,0011,240/PLMN/000100
00440214601628,0011,240/PLMN/000100
00440214601629,0011,240/PLMN/000100
00440214601630,0011,240/PLMN/000100
00440214601631,0011,240/PLMN/000100
00440214601632,0011,240/PLMN/000100
00440214601633,0011,240/PLMN/000100
00440214601634,0011,240/PLMN/000100
00440214601635,0011,240/PLMN/000100
00440214601636,0011,240/PLMN/000100
00440245269863,0011,240/PLMN/000100

UPDATE

If your field size is not fixed, you can capture the length (or the desired length) and use the same pattern. Since your code is too complicated, I'm going to write a proof of concept that you can embed into your script.

This is essentially your problem: incrementing a zero-padded number drops the leading zeros.

$ echo 0001 | awk '{$1++; print $1}'
2

This is the proposed solution: zero padding with a parametric length.

$ echo 0001 | awk '{n=length($1); $1++; printf "%0"n"d\n", $1}'
0002
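
The same idea can be carried over to the range-expansion step. The sketch below is only an illustration under assumptions: the function name `expand_padded` is hypothetical, the input is assumed to be a `start,end,field3,field4` row, and the target width is assumed to be the length of the start field.

```shell
# Hypothetical helper (not part of the original script): pads each generated
# value to the width of the start field, however many leading zeros it has.
expand_padded() {
  awk -F"," '{
    n = length($1)                  # width of the original string, zeros included
    fmt = "%0" n "d,%s,%s\n"        # becomes "%04d,%s,%s\n" when n is 4
    for (d = $1 + 0; d <= $2 + 0; d++)   # "+ 0" forces numeric comparison and counting
      printf fmt, d, $3, $4
  }'
}

echo "0098,0101,A,B" | expand_padded
```

Padding to a width, rather than gluing a fixed prefix of zeros onto the counter, also keeps the width stable when the count rolls over from 0099 to 0100.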
User contributions licensed under: CC BY-SA