I have a file BLACK.FUL.eg2:
10>BLACK.FUL>272/GSMA/000000>151006>01
15>004401074905590>004401074905590>B>I>0011>Insert>240/PLMN/000100>>5000-K525122-15
15>004402145955010>004402145955010>B>I>0011>Insert>240/PLMN/000100>>1200-K108534-14
15>004402146016260>004402146016360>B>I>0011>Insert>240/PLMN/000100>>1200-K-94878-14
15>004402452698630>004402452698630>B>I>0011>Insert>240/PLMN/000100>>5000-K538947-14
90>BLACK.FUL>272/GSMA/000000>151006>01>4
I’ve written this AWK script:
awk 'NR > 2 { print p } { p = $0 }' BLACK.FUL.eg2 | awk -F">" '{if (length($2) == 15) print substr($2,1,length($2)-1)","substr($3,1,length($3)-1)","$6","$8; else print $2","$3","$6","$8;}' | awk -F"," '{if ($2 == $1) print $1","$3","$4; else {if (length($1) > 14) {v = substr($1,9,6); t = substr($2,9,6); while(v <= t) print substr($2,1,8)v++substr($2,15,2)","$3","$4;} else {d = $1;while(d <= $2) print d++","$3","$4;}}}'
which gives me an output of:
00440107490559,0011,240/PLMN/000100
00440214595501,0011,240/PLMN/000100
440214601626,0011,240/PLMN/000100
440214601627,0011,240/PLMN/000100
440214601628,0011,240/PLMN/000100
440214601629,0011,240/PLMN/000100
440214601630,0011,240/PLMN/000100
440214601631,0011,240/PLMN/000100
440214601632,0011,240/PLMN/000100
440214601633,0011,240/PLMN/000100
440214601634,0011,240/PLMN/000100
440214601635,0011,240/PLMN/000100
440214601636,0011,240/PLMN/000100
00440245269863,0011,240/PLMN/000100
There is one problem: the leading 0s of the values in field 1 are being removed because of the numeric operation (d++) on them. So my actual expected output is:
00440107490559,0011,240/PLMN/000100
00440214595501,0011,240/PLMN/000100
00440214601626,0011,240/PLMN/000100
00440214601627,0011,240/PLMN/000100
00440214601628,0011,240/PLMN/000100
00440214601629,0011,240/PLMN/000100
00440214601630,0011,240/PLMN/000100
00440214601631,0011,240/PLMN/000100
00440214601632,0011,240/PLMN/000100
00440214601633,0011,240/PLMN/000100
00440214601634,0011,240/PLMN/000100
00440214601635,0011,240/PLMN/000100
00440214601636,0011,240/PLMN/000100
00440245269863,0011,240/PLMN/000100
To fix that, I tried the updated AWK script below:
awk 'NR > 2 { print p } { p = $0 }' BLACK.FUL.eg2 | awk -F">" '{if (length($2) == 15) print substr($2,1,length($2)-1)","substr($3,1,length($3)-1)","$6","$8; else print $2","$3","$6","$8;}' | awk -F"," '{if ($2 == $1) print $1","$3","$4; else {if (length($1) > 14) {v = substr($1,9,6); t = substr($2,9,6); while(v <= t) print substr($2,1,8)v++substr($2,15,2)","$3","$4;} else {d = $1; for ( i=1;i<length($1);i++ ) if (substr($1,i++,1) == "0") {m=m"0"; else exit 1;}; while(d <= $2) print md++","$3","$4;}}}'
But I get an error:
awk: cmd. line:4: {m=m"0"; else exit 1;}; while(d <= $2) print md++","$3","$4;}}}
awk: cmd. line:4:          ^ syntax error
Can you please point out what I'm doing wrong, and how to get the expected output? A modification of my existing AWK script would be most helpful. Thanks
NOTE: The number of leading 0s can vary; it is not always two, as it happens to be in the example outputs above.
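For reference, the syntax error above comes from the else inside the braces: in if (...) {m=m"0"; else exit 1;} the else follows m=m"0"; within the block, so there is no if for it to pair with. The loop has a few further problems: substr($1,i++,1) increments i a second time, so every other character is skipped; exit 1 would end the whole awk run rather than just leave the loop; md++ is parsed as one variable named md, not m followed by d++; and m is never reset between records. A minimal sketch of the same "count the zeros, then prefix them" idea with those points fixed, assuming input in the comma-separated start,end,field3,field4 form produced by the second awk of the pipeline (expand_with_prefix is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper: expands "start,end,field3,field4" records and keeps the
# leading zeros of "start" by counting them and printing them as a prefix.
# Assumes the incremented values never gain a digit (e.g. 0099 -> 0100 would
# come out as 00100 with this approach).
expand_with_prefix() {
  awk -F"," '{
    m = ""                                  # reset the prefix for every record
    # Stop at the first non-zero character; keep at least one digit for d.
    for (i = 1; i < length($1) && substr($1, i, 1) == "0"; i++)
      m = m "0"
    d = $1 + 0                              # numeric copy: leading zeros dropped
    while (d <= $2 + 0)
      printf "%s%.0f,%s,%s\n", m, d++, $3, $4   # %.0f prints the whole number
  }'
}

printf '%s\n' '00440214601626,00440214601628,0011,240/PLMN/000100' | expand_with_prefix
```

This prints 00440214601626, 00440214601627 and 00440214601628, each followed by ,0011,240/PLMN/000100. As the comment notes, a prefix counted once per record cannot account for a carry that adds a digit, which is one reason a width-based printf is the more robust fix.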
Answer
Since your field sizes are fixed, for the given example you can just change the last print statement to:
$ awk ... printf "%014d,%s,%s\n",d++,$3,$4}}}'
00440107490559,0011,240/PLMN/000100
00440214595501,0011,240/PLMN/000100
00440214601626,0011,240/PLMN/000100
00440214601627,0011,240/PLMN/000100
00440214601628,0011,240/PLMN/000100
00440214601629,0011,240/PLMN/000100
00440214601630,0011,240/PLMN/000100
00440214601631,0011,240/PLMN/000100
00440214601632,0011,240/PLMN/000100
00440214601633,0011,240/PLMN/000100
00440214601634,0011,240/PLMN/000100
00440214601635,0011,240/PLMN/000100
00440214601636,0011,240/PLMN/000100
00440245269863,0011,240/PLMN/000100
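A self-contained sketch of the fixed-width idea applied to the whole third stage, assuming the comma-separated records produced by the second awk and a width of 14 digits as in the sample (expand_fixed is a hypothetical name). It uses %014.0f instead of %014d: the output is the same here, but formatting through %.0f avoids depending on how a particular awk handles large values with %d, which some older implementations have been known to mishandle beyond the 32-bit range. Because the loop also runs once when start and end are equal, the separate $2 == $1 branch is not needed in this sketch.

```shell
#!/bin/sh
# Hypothetical helper: expands "start,end,field3,field4" records into one line
# per number, zero-padded to a fixed width of 14 digits.
expand_fixed() {
  awk -F"," '{
    d = $1 + 0                               # numeric start; leading zeros lost here
    while (d <= $2 + 0)                      # runs once when start == end
      printf "%014.0f,%s,%s\n", d++, $3, $4  # pad back to 14 digits
  }'
}

# The four records that the second awk of the pipeline produces for the sample file.
printf '%s\n' \
  '00440107490559,00440107490559,0011,240/PLMN/000100' \
  '00440214595501,00440214595501,0011,240/PLMN/000100' \
  '00440214601626,00440214601636,0011,240/PLMN/000100' \
  '00440245269863,00440245269863,0011,240/PLMN/000100' | expand_fixed
```

For the sample records this yields the 14 expected lines, from 00440107490559,0011,240/PLMN/000100 to 00440245269863,0011,240/PLMN/000100.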
UPDATE
If your field size is not fixed, you can capture the length (or the desired length) and use the same pattern. Since your code is quite complex, here is a proof of concept that you can embed in your script.
This is essentially your problem: incrementing a zero-padded number drops the leading zeros.
$ echo 0001 | awk '{$1++; print $1}'
2
This is the proposed solution, taking the zero-padding width from the length of the input:
$ echo 0001 | awk '{n=length($1); $1++; printf "%0"n"s\n", $1}'
0002
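Embedded in the range expansion, the same pattern takes the width from length($1) instead of hard-coding 14. A minimal sketch, assuming the comma-separated start,end,field3,field4 records from the second awk (expand_padded is a hypothetical name). It swaps the %s of the proof of concept for %.0f, because in C-style printf, which POSIX awk follows, the 0 flag is only defined for numeric conversions.

```shell
#!/bin/sh
# Hypothetical helper: the width is taken from each record's start value,
# so any number of leading zeros is kept.
expand_padded() {
  awk -F"," '{
    n = length($1)                             # width to preserve, e.g. 4 for "0099"
    d = $1 + 0
    while (d <= $2 + 0)
      printf "%0" n ".0f,%s,%s\n", d++, $3, $4 # builds e.g. "%04.0f,%s,%s\n"
  }'
}

printf '%s\n' \
  '0099,0101,0011,240/PLMN/000100' \
  '00440214601626,00440214601627,0011,240/PLMN/000100' | expand_padded
```

With these two records it prints 0099, 0100 and 0101 for the first one (the carry from 0099 to 0100 keeps the four-digit width) and 00440214601626, 00440214601627 for the second, each followed by ,0011,240/PLMN/000100.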