How to find the lines that contain the same letter twice using grep?

For example, "Conclusion" contains two c's, but at different positions. I am using

egrep -i 'c{2}' input.file

It only shows me words like "Accept", where the two letters are side by side, but I also want to see "Conclusion".

Answer

If you plan to match lines that contain any two identical letters that are not necessarily consecutive, you can use

grep -i '\([[:alpha:]]\).*\1'

Here, \([[:alpha:]]\) is a capturing group with ID 1 that matches any letter, .* matches any text, and the \1 backreference matches the same char as in Group 1 (case insensitively due to the -i option).

If you have in mind a specific letter of your choice, you can use a simpler

grep -i 'c.*c'

So, just replace the POSIX character class and backreference with the letter.
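
To see why the c{2} attempt from the question misses "Conclusion", here is a minimal sketch comparing the two patterns. The word list is made up for illustration ("Accept" and "Conclusion" come from the question, "cold" is an extra non-matching line), and the variable names are arbitrary.

```shell
#!/bin/bash
# Hypothetical sample input, one word per line.
words='Accept
Conclusion
cold'

# c{2} requires two ADJACENT c's, so only "Accept" matches.
adjacent=$(grep -iE 'c{2}' <<< "$words")

# c.*c allows any text (including none) between the two c's,
# so "Conclusion" matches as well.
anywhere=$(grep -i 'c.*c' <<< "$words")

printf 'adjacent:\n%s\n' "$adjacent"
printf 'anywhere:\n%s\n' "$anywhere"
```

The quantifier {2} only repeats the preceding atom back to back, which is why the gap has to be spelled out with .* in between.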

If you want to make sure there are ONLY two identical letters and not more, you can use

grep -iP '^(?!.*(\p{L})(?:.*\1){2}).*(\p{L}).*\2'

Details of the PCRE pattern (note the -P option):

  • ^ – start of string
  • (?!.*(\p{L})(?:.*\1){2}) – a negative lookahead that fails the match if there are zero or more chars other than line break chars, as many as possible, then any Unicode letter (captured into Group 1), and then two occurrences of any zero or more chars other than line break chars, as many as possible, followed by the same letter as captured into Group 1
  • .* – zero or more chars other than line break chars, as many as possible
  • (\p{L}) – Group 2: any Unicode letter
  • .* – zero or more chars other than line break chars, as many as possible
  • \2 – Backreference to Group 2 value.
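
If your grep was built without -P support, the same logic can probably be approximated with two plain grep calls in a pipeline. This is only a sketch, not part of the original answer: the function name repeated_but_not_thrice is made up, and it uses [[:alpha:]] instead of \p{L}, so which characters count as letters depends on your locale. Case-insensitive backreferences are assumed to behave as in GNU grep.

```shell
#!/bin/bash
s='Conclusion
cold
Collaccio'

# repeated_but_not_thrice is a hypothetical helper name.
# Step 1: keep lines where some letter occurs at least twice.
# Step 2: drop lines where some letter occurs three or more times
#         (this plays the role of the negative lookahead).
repeated_but_not_thrice() {
  grep -i '\([[:alpha:]]\).*\1' | grep -iv '\([[:alpha:]]\).*\1.*\1'
}

repeated_but_not_thrice <<< "$s"
```

The second grep uses -v to invert the match, which is the usual workaround when lookaheads are not available.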

And if the character is a specific one:

grep -i '^[^c]*c[^c]*c[^c]*$' <<< "$s"

where [^c]* matches zero or more chars other than c.
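
As a cross-check for the specific-letter case, you could also count the occurrences with awk instead of encoding the count in the regex. This is a different technique from the grep pattern above, shown only as a sketch; count_exact and its two arguments (the letter and the wanted count) are hypothetical names.

```shell
#!/bin/bash
s='Conclusion
cold
Collaccio'

# count_exact LETTER N: print the lines from stdin in which LETTER
# occurs exactly N times, ignoring case. Hypothetical helper; LETTER
# is assumed to be a single plain letter, not a regex metacharacter.
count_exact() {
  awk -v ch="$1" -v want="$2" '{
    line = tolower($0)
    # gsub returns the number of replacements, i.e. the occurrence count
    n = gsub(tolower(ch), "", line)
    if (n == want) print
  }'
}

count_exact c 2 <<< "$s"
# => Conclusion
```

This scales more easily to "exactly N times" than repeating [^c]*c N times in the pattern.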

See the grep demo:

#!/bin/bash
s='Conclusion
cold
Collaccio'
grep -i '\([[:alpha:]]\).*\1' <<< "$s"
# => Conclusion
#    Collaccio
grep -iP '^(?!.*(\p{L})(?:.*\1){2}).*(\p{L}).*\2' <<< "$s"
# => Conclusion
grep -i 'c.*c' <<< "$s"
# => Conclusion
#    Collaccio
grep -i '^[^c]*c[^c]*c[^c]*$' <<< "$s"
# => Conclusion