egrep two identical numbers and two identical letters n times

Hi, I've been having trouble with the egrep command. Here is my question: let's say I am running a for loop over these words:

JavaScript

I only want to print a word if it has two identical digits and two identical letters, and the word repeats itself. For example, in this case the only words that should print are:

JavaScript

because each word has one or more sets of two identical digits and two identical letters, and then it repeats itself.

Here is another example, I am going over these words in a loop:

JavaScript

The only words that should print are

JavaScript

because of what's mentioned above. I am really struggling with how to do this. My current command, which isn't working, is:

JavaScript

It isn't working because it also prints words like:

JavaScript

which are not allowed.

Any help would be highly appreciated.

Answer

Try this:

JavaScript

This has two cases:

1) (([0-9])\2([a-z])\3([0-9])\4([a-z])\5)\1+ matches an 8-character construct that repeats at least once. That is the key: using \1* would falsely match 11bb11dd, because it allows zero repetitions

2) (([0-9])\7([a-z])\8)\6 matches a 4-character construct that repeats exactly once
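
To make the two cases concrete, here is a minimal sketch of the for loop from the question. It assumes GNU grep; the sample words are made up for illustration (they are not the ones from the question), and filter_words is a hypothetical helper name. The -x flag makes grep accept only whole-line matches.

```shell
#!/bin/sh
# Assumes GNU grep, which supports backreferences with -E.
case1='(([0-9])\2([a-z])\3([0-9])\4([a-z])\5)\1+'  # 8-character construct, repeated at least once
case2='(([0-9])\7([a-z])\8)\6'                     # 4-character construct, repeated exactly once

# Hypothetical helper: print only the arguments that match either case in full.
filter_words() {
    for w in "$@"; do
        # -q: no output, only the exit status; -x: the whole line must match
        if printf '%s\n' "$w" | grep -qxE "$case1|$case2"; then
            printf '%s\n' "$w"
        fi
    done
}

# Illustrative sample words only.
filter_words 11aa22bb11aa22bb 11aa11aa 11bb11dd 11aa22bb
```

With these samples, 11aa22bb11aa22bb should pass through case 1 and 11aa11aa through case 2, while 11bb11dd and 11aa22bb should be rejected because nothing repeats.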


If you have them on separate lines, this would do

JavaScript


If 11bb11bb11bb has to be matched as well, use \6+
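
As a rough illustration of the difference between \6 and \6+, the sketch below runs both variants over the same line-separated input. It assumes GNU grep; the variable names and the words helper are made up for this example.

```shell
#!/bin/sh
# Assumes GNU grep. Both patterns share case 1; they differ only in the final \6 vs \6+.
re_once='(([0-9])\2([a-z])\3([0-9])\4([a-z])\5)\1+|(([0-9])\7([a-z])\8)\6'
re_many='(([0-9])\2([a-z])\3([0-9])\4([a-z])\5)\1+|(([0-9])\7([a-z])\8)\6+'

# Hypothetical helper that emits one sample word per line.
words() {
    printf '%s\n' 11bb11bb 11bb11bb11bb 11bb11dd
}

echo "with \\6:"
words | grep -xE "$re_once"

echo "with \\6+:"
words | grep -xE "$re_many"
```

The \6 variant should print only 11bb11bb, while the \6+ variant should print 11bb11bb11bb as well. Neither should print 11bb11dd.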


Or, use this very clever suggestion by Nahuel Fouilleul

JavaScript
  • (([0-9])\3([a-z])\4)+ forms the base: 4/8/12/16/etc. characters consisting of a repeated digit followed by a repeated letter
  • that is then captured in an outer group, which is then repeated at least once
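
Going by the description in the two bullets, the full pattern would be the base wrapped in an outer group followed by \1+. The sketch below is a reconstruction from that description, not a copy of the original command. It assumes GNU grep, and the sample words and the nested_filter name are made up.

```shell
#!/bin/sh
# Assumes GNU grep. Group 1 is the outer group, group 2 the repeated base unit,
# groups 3 and 4 the digit and the letter inside the base unit.
re='((([0-9])\3([a-z])\4)+)\1+'

# Hypothetical helper: keep only the lines on stdin that match in full.
nested_filter() {
    grep -xE "$re"
}

# Illustrative sample words only.
printf '%s\n' 11aa11aa 11aa22bb11aa22bb 11aa11aa11aa 11bb11dd 11aa22bb | nested_filter
```

This single pattern should accept 11aa11aa, 11aa22bb11aa22bb and 11aa11aa11aa, and reject 11bb11dd and 11aa22bb, so it covers both cases of the longer regex and the \6+ variant at once.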


Note that if the input is large and your grep has the PCRE -P option, use that instead of -E, as backreferences would be much faster, at least in the case of GNU grep.

User contributions licensed under: CC BY-SA