I’m testing the following negative lookbehind assertion and I want to understand the result:
echo "foo foofoo" | grep -Po '(?<!foo)foo'
It prints:
foo foo foo
I expected only the first two foo to be printed, not the third, because my assertion is supposed to mean: find ‘foo’ that is not immediately preceded by ‘foo’. The third foo directly follows the second one, so it should be rejected.
What am I missing? Why is the third foo being matched?
Note: grep -P means interpret the pattern as a Perl-compatible regular expression (PCRE). grep -o means print only the matched part of each line. My grep is version 2.5.1.
Answer
After a long discussion of this issue (since moved to chat), I came to the conclusion that my understanding of the negative lookbehind assertion was correct:
echo "foo foofoo" | grep -Po '(?<!foo)foo'
should return foo twice.
My version of grep, or the PCRE library it was compiled against, is buggy.
Several people ran this command on their machines with different versions of grep and got different results: some saw two foo, others saw three, as I did.
I tested the same regex with Perl and got the expected result: foo twice.
The grep man page states that the -P option is experimental.
My lesson was: if you want PCRE that really works, use Perl.