
Extract text between two known strings in a web page and store in a variable

The web page contains this line:

var zx_fn = "string with any possible character";

I download the web page, then try to take the part between the quotes and store it in a variable.

My code:

#!/bin/sh
url="http://www.example.com/..."
content=$(wget -q -O - $url)
var1=$(sed -n '/^var zx_fn = "$/,/^";$/p' "$content")
echo $var1

It doesn’t work because it says:

sed: can't read

And it returns the whole page content

Also, which is better for this case: grep, awk or sed?

This question has been marked as a duplicate, but the other one doesn’t answer my question, as I need help with both the variable storage and the regex.

If I follow that answer, the code returns:

Syntax error: redirection unexpected


Answer

$ foo='var zx_fn = "string with any possible character";'
$ bar=$(sed -n 's/var zx_fn = "\([^"]*\)";$/\1/p' <<< "$foo")
$ echo "$bar"
string with any possible character

“any possible character” above is assumed to mean “… except double quote”. If it can include double quotes then let us know how they are escaped within those strings so we can tell you how to handle them.
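As for the two errors in the question: sed treats its non-option argument as a file name, so passing "$content" makes it try to open a file named after the whole page, hence "can't read". The <<< here-string is a bash feature, so a script that starts with #!/bin/sh will most likely reject it with "redirection unexpected" when sh is a shell such as dash. A minimal sketch that avoids both by piping the text into sed is below. The helper name extract_zx_fn and the sample page text are made up for illustration, and it still assumes the value contains no double quotes.

```shell
#!/bin/sh
# Hypothetical helper: reads page text on stdin and prints the value
# assigned to zx_fn (assumes the value contains no double quotes).
extract_zx_fn() {
    sed -n 's/.*var zx_fn = "\([^"]*\)";.*/\1/p'
}

# Stand-in for the downloaded page. In the real script this would be
# the output of: wget -q -O - "$url"
page='<script>
var other = "x";
var zx_fn = "string with any possible character";
</script>'

# Pipe the text into sed instead of passing it as a file name.
var1=$(printf '%s\n' "$page" | extract_zx_fn)
printf '%s\n' "$var1"
```

In the real script the download can feed the pipe directly, for example var1=$(wget -q -O - "$url" | extract_zx_fn). Note the quotes around "$url" and "$var1"; without them the shell splits and globs the values.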

User contributions licensed under: CC BY-SA