How to grep a multi-line string containing newline characters, tab characters, or spaces

My test file has text like:

> cat test.txt
new dummy("test1", random1).foo("bar1");
new dummy("
        test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
            "test4", random4).foo("bar4");

I am trying to match every statement that ends with a semicolon (;) and contains the text "dummy(". Then I need to extract the string inside the double quotes within dummy(...). I have come up with the following command, but it matches only the first and third statements.

> perl -ne 'print if /dummy/ .. /;/' test.txt | grep -oP 'dummy\((.|\n)*,'
dummy("test1",
dummy("test3",

With the -o flag I expected to extract the string between the double quotes inside dummy, but that is not working either. Can you please give me an idea of how to proceed?

Expected output is:

test1
test2
test3
test4

Some of the answers below work for basic file structures, but the code breaks if a statement contains more than one newline character. For example, an input file with more newline characters:

new dummy("test1", random1).foo("bar1");
new dummy("
        test2", random2);
new dummy("test3", random3).foo("bar3");
new dummy = dummy(
            "test4", random4).foo("bar4");
new dummy("test5",
        random5).foo("bar5");
new dummy("test6", random6).foo(
        "bar6");
new dummy("test7", random7).foo("
        bar7");

I referred to the following SO links:

How to give a pattern for new line in grep?

how to grep multiple lines until ; (semicolon)

Answer

@TLP was pretty close:

perl -0777 -nE 'say for map {s/^\s+|\s+$//gr} /\bdummy\(\s*"(.+?)"/gs' test.txt
test1
test2

Using

  • -0777 to slurp the file in as a single string
  • /\bdummy\(\s*"(.+?)"/gs finds all the quoted string content after "dummy(" (with optional whitespace before the opening quote)
    • the s flag allows . to match newlines.
    • any string containing escaped double quotes will break this regex
  • map {s/^\s+|\s+$//gr} trims leading/trailing whitespace from each string.