Linux: extract the portion of a string that can be the second most common pattern

Question

I have several strings (or filenames in a directory) and I need to group them by the second most common pattern, then iterate over each group and process its members. In the example below I need 2 from ACCEPT and 2 from BASIC_REGIS: basically, everything from the beginning of the string to one character after the hyphen (-), and that character could be anything.

Accepted Answer

You don't need -P (PCRE) for that, just a plain, old BRE:

    $ grep -o '^[^-]*-.' file | sort -u
    ACCEPT-0
    ACCEPT-1
    BASIC_REGIS-2
    BASIC_REGIS-9

Or using GNU awk alone:

    $ awk 'match($0,/^[^-]*-./,a) && !seen[a[0]]++{print a[0]}' file
    ACCEPT-0
    ACCEPT-1
    BASIC_REGIS-2
    BASIC_REGIS-9

or any awk:

    $ awk '!match($0,/^[^-]*-./){next} {$0=substr($0,1,RLENGTH)} !seen[$0]++' file
    ACCEPT-0
    ACCEPT-1
    BASIC_REGIS-2
    BASIC_REGIS-9
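The question also asks about iterating over the members of each group, which the commands above stop short of. One possible way to do that is to tag every name with its prefix and loop over the sorted result. This is only a minimal sketch: the function name `group_by_prefix` and the sample names are made up for illustration, and it assumes the names contain no tabs or newlines.

```shell
#!/bin/sh
# group_by_prefix (hypothetical helper): read names on stdin and print each
# one as "prefix<TAB>name", where prefix runs from the start of the line to
# one character past the first hyphen. Lines without a hyphen followed by
# at least one character are skipped.
group_by_prefix() {
    awk 'match($0, /^[^-]*-./) { print substr($0, 1, RLENGTH) "\t" $0 }' |
    LC_ALL=C sort
}

# Sample names are invented for this sketch; in practice the list could
# come from something like `ls` or `printf '%s\n' *`.
printf '%s\n' ACCEPT-0a ACCEPT-1b BASIC_REGIS-2c ACCEPT-0d |
group_by_prefix |
while IFS="$(printf '\t')" read -r prefix name; do
    # Replace this echo with the real per-file processing.
    echo "group=$prefix file=$name"
done
```

Because the output is sorted, all members of a group arrive consecutively, so the loop can detect the start of a new group by comparing `prefix` with the previous value.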
