Regex Pattern matching refinement

I have JSON that is returned to a variable, and I'm trying to grab only one value from it. I'm limited to grep, sed, and awk.

RESULTS='{ "results" : [ { "repo" : "appdeploy", "path" : "org/test/cxp/python/1.0-SNAPSHOT", "name" : "python-1.0-20170519.130808-42.jar" } ], "range" : { "start_pos" : 0, "end_pos" : 1, "total" : 1 } }'
echo "$RESULTS" | grep -oE '"path" : "(.*)",'

This returns

"path" : "org/test/cxp/python/1.0-SNAPSHOT",

and honestly the only part I want is

org/test/cxp/python/1.0-SNAPSHOT

Answer

With jq, you could use the '.results[0] | .path' filter (add the -r option to print the value without the surrounding quotes).

However, if you have no access to jq, you may use a PCRE-based grep command (this needs a grep built with PCRE support, such as GNU grep):

grep -oP '(?<="path" : ")[^"]+'

The -P option enables PCRE regex syntax, which supports lookarounds: they only check that a pattern matches at a given position and do not include the matched text in the returned match value.

Pattern details

  • (?<="path" : ") – a positive lookbehind that matches a position immediately preceded by the "path" : " substring
  • [^"]+ – a negated bracket expression that matches and consumes (adds to the match value) 1 or more chars other than ".
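To see what the lookbehind changes, here is a minimal sketch that runs the pattern with and without it. The SAMPLE string is a shortened, made-up input used only for this illustration, and the commands assume a grep that supports -P.

```shell
# Hypothetical, shortened input used only for this illustration.
SAMPLE='{ "path" : "a/b/c", "name" : "x.jar" }'

# Without the lookbehind, the prefix is consumed and becomes part of the match.
with_prefix=$(printf '%s\n' "$SAMPLE" | grep -oP '"path" : "[^"]+')

# With the lookbehind, the prefix is only checked, so just the value is returned.
value_only=$(printf '%s\n' "$SAMPLE" | grep -oP '(?<="path" : ")[^"]+')

printf '%s\n' "$with_prefix"   # "path" : "a/b/c
printf '%s\n' "$value_only"    # a/b/c
```

In both cases [^"]+ stops right before the closing quote, so the trailing ", never ends up in the output.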

Applied to the sample input:

RESULTS='{ "results" : [ { "repo" : "appdeploy", "path" : "org/test/cxp/python/1.0-SNAPSHOT", "name" : "python-1.0-20170519.130808-42.jar" } ], "range" : { "start_pos" : 0, "end_pos" : 1, "total" : 1 } }'
echo "$RESULTS" | grep -oP '(?<="path" : ")[^"]+'

This prints org/test/cxp/python/1.0-SNAPSHOT.
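Since the question also allows sed and awk, the following sketch shows how the same value might be extracted when grep -P is not available. It is not part of the original answer. It assumes the key is written exactly as "path" : " (with that spacing) and appears once per line. The variable names path_sed and path_awk are made up for the example.

```shell
RESULTS='{ "results" : [ { "repo" : "appdeploy", "path" : "org/test/cxp/python/1.0-SNAPSHOT", "name" : "python-1.0-20170519.130808-42.jar" } ], "range" : { "start_pos" : 0, "end_pos" : 1, "total" : 1 } }'

# sed: capture the text between the quotes that follow "path" : and
# replace the whole line with that capture group; -n plus the p flag
# prints only lines where the substitution succeeded.
path_sed=$(printf '%s\n' "$RESULTS" | sed -n 's/.*"path" : "\([^"]*\)".*/\1/p')

# awk: split the line on the literal prefix, then cut the remainder
# at the next double quote. NF > 1 skips lines without the prefix.
path_awk=$(printf '%s\n' "$RESULTS" | awk -F'"path" : "' 'NF > 1 { split($2, a, "\""); print a[1] }')

printf '%s\n' "$path_sed"   # org/test/cxp/python/1.0-SNAPSHOT
printf '%s\n' "$path_awk"   # org/test/cxp/python/1.0-SNAPSHOT
```

Like the grep approach, both variants depend on the exact formatting of the JSON text, so a real JSON parser such as jq remains the more robust choice when it is available.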
