I have several strings (or filenames in a directory) that I need to group by their second most common pattern; I will then iterate over each group and process its members. In the example below I need two groups from ACCEPT and two from BASIC_REGIS: basically everything from the beginning of the string to one character after the first hyphen (-), where that character can be anything, not just a digit. The first most common patterns are ACCEPT and BASIC_REGIS. I am looking for a way to get the second most common pattern using grep -Po (Perl regex and only-matching). I already have an AWK solution that works.
INPUT
ACCEPT-zABC-0123
ACCEPT-zBAC-0231
ACCEPT-1ABC-0120
ACCEPT-1CBA-0321
BASIC_REGIS-2ABC-9043
BASIC_REGIS-2CBA-8132
BASIC_REGIS-PCCA-6532
BASIC_REGIS-PBBC-3023
OUTPUT
ACCEPT-z
ACCEPT-1
BASIC_REGIS-2
BASIC_REGIS-P
echo "ACCEPT-0ABC-0123"|grep -Po "\K^A.*-"
Result : ACCEPT-0ABC-
but I need : ACCEPT-0
However, this awk solution works:
echo "ACCEPT-1ABC-0120"|awk '$0 ~ /^A/{print substr($0,1,index($0,"-")+1)}'
ACCEPT-1
Answer
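You don’t need -P (PCRE) for that, just a plain old BRE. Your pattern over-matches because .* is greedy and runs to the last hyphen, whereas [^-]* stops at the first one: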
$ grep -o '^[^-]*-.' file | sort -u
ACCEPT-0
ACCEPT-1
BASIC_REGIS-2
BASIC_REGIS-9
Or using GNU awk alone:
$ awk 'match($0,/^[^-]*-./,a) && !seen[a[0]]++{print a[0]}' file
ACCEPT-0
ACCEPT-1
BASIC_REGIS-2
BASIC_REGIS-9
or any awk:
$ awk '!match($0,/^[^-]*-./){next} {$0=substr($0,1,RLENGTH)} !seen[$0]++' file
ACCEPT-0
ACCEPT-1
BASIC_REGIS-2
BASIC_REGIS-9
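Since the question also wants to iterate over each group afterwards, here is a minimal sketch of how the extracted prefixes could drive a loop. It reuses the sample names from the question. The helper names list_prefixes and members_of are made up for illustration, and the sketch assumes the names contain no whitespace or backslashes:

```shell
#!/bin/sh
# Sample names from the question, one per line (stand-in for a file or `ls` output).
names='ACCEPT-zABC-0123
ACCEPT-zBAC-0231
ACCEPT-1ABC-0120
ACCEPT-1CBA-0321
BASIC_REGIS-2ABC-9043
BASIC_REGIS-2CBA-8132
BASIC_REGIS-PCCA-6532
BASIC_REGIS-PBBC-3023'

# list_prefixes (hypothetical helper): unique keys running from the start of
# each line to one character after the first hyphen.
list_prefixes() {
    grep -o '^[^-]*-.' | LC_ALL=C sort -u
}

# members_of (hypothetical helper): lines on stdin that start with the
# literal prefix given as $1 (index() avoids treating it as a regex).
members_of() {
    awk -v p="$1" 'index($0, p) == 1'
}

# Outer loop: one pass per group; inner loop: one pass per member.
# Word splitting of $(...) is safe only because names have no whitespace.
for prefix in $(printf '%s\n' "$names" | list_prefixes); do
    printf 'group %s\n' "$prefix"
    printf '%s\n' "$names" | members_of "$prefix" | while IFS= read -r name; do
        printf '  processing %s\n' "$name"   # replace with the real per-item work
    done
done
```

For real filenames, the names variable could be replaced by reading from a file or from the output of ls in the target directory.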