egrep two identical numbers and two identical letters n times

Question

Hi, I've been having trouble with the egrep command. Here is my question: let's say I am running a for loop over these words: I only want to print a word if it has two identical numbers and two identical letters, and the word repeats itself. For example, in this case the only words that should print are:

Accepted Answer

Try this:

    $ cat ip.txt
    99aa88bb99aa88bb 9a9a 11bb11bb 11bb11dd 12aa12aa
    33aa33bb33aa33bb  aa99aa99 00aa00bb00aa00bb 44aa44aac 2222aaaa2222aaaa 11cc11cc11cc11cc

    $ grep -owE '(([0-9])\2([a-z])\3([0-9])\4([a-z])\5)\1+|(([0-9])\7([a-z])\8)\6' ip.txt
    99aa88bb99aa88bb
    11bb11bb
    33aa33bb33aa33bb
    00aa00bb00aa00bb
    11cc11cc11cc11cc

This has two cases:

1) (([0-9])\2([a-z])\3([0-9])\4([a-z])\5)\1+ matches an 8-character construct that repeats at least once. The + is the key: using \1* would falsely match 11bb11dd.
2) (([0-9])\7([a-z])\8)\6 matches a 4-character construct that repeats exactly once.

If you have the words on separate lines, this would do:

    grep -xE '(([0-9])\2([a-z])\3([0-9])\4([a-z])\5)\1+|(([0-9])\7([a-z])\8)\6'

If 11bb11bb11bb has to be matched as well, use \6+.

Or, use this very clever suggestion by Nahuel Fouilleul:

    $ grep -owE '((([0-9])\3([a-z])\4)+)\1+' ip.txt
    99aa88bb99aa88bb
    11bb11bb
    33aa33bb33aa33bb
    00aa00bb00aa00bb
    11cc11cc11cc11cc

(([0-9])\3([a-z])\4)+ forms the base: 4/8/12/16/etc. characters consisting of a repeated digit followed by a repeated letter. That base is then captured in the outer group and repeated at least once.

Note that if the input is large and your grep has the PCRE -P option, use that instead of -E, as backreferences would be much faster, at least in the case of GNU grep.
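Since the question mentions a for loop, here is a minimal sketch of how the whole-line (-x) variant might be used to test one word at a time. The function name is_repeated_word is hypothetical and not part of the original answer, and the sketch assumes a grep that accepts backreferences with -E (GNU grep does). Backreferences are written with a backslash: \1, \2, and so on.

```shell
#!/bin/sh
# Hypothetical helper (not from the original answer): succeeds only when
# the whole word matches one of the two cases described above.
# Assumes a grep that supports backreferences with -E, such as GNU grep.
is_repeated_word() {
    printf '%s\n' "$1" |
        grep -qxE '(([0-9])\2([a-z])\3([0-9])\4([a-z])\5)\1+|(([0-9])\7([a-z])\8)\6'
}

# A few sample words taken from ip.txt.
for word in 99aa88bb99aa88bb 9a9a 11bb11bb 11bb11dd 12aa12aa 44aa44aac; do
    if is_repeated_word "$word"; then
        printf '%s\n' "$word"
    fi
done
```

With these sample words the loop should print only 99aa88bb99aa88bb and 11bb11bb. The -q option suppresses grep's output so that only its exit status is used, and -x forces the pattern to cover the entire word, which is why 44aa44aac is rejected.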

Answer