Get multiple words after a specific word in HTML using Linux/Unix scripting

Question

I have a file 'movie.html'. I want to get multiple words, pipe-delimited, like this: I tried this code: but the output isn't what I expected. Please help me; I am still a beginner.

Accepted Answer

Parsing HTML with regex is not advised for several reasons (see https://stackoverflow.com/a/1732454/12957340), but here is one potential solution:

awk -F'[<>/"]' 'BEGIN{ print "title | link" }; /(.*)/ {print $6 " | " $3}' movie.html
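The field numbers in a command like this depend entirely on how each line of the HTML is laid out, so it can help to print every field with its index before choosing which ones to use. The sketch below does that for a made-up sample line; the contents of `line` are an assumption for illustration only, since the real movie.html is not shown here, and the indices it finds (3 and 5) apply to this sample, not necessarily to the original file.

```shell
# Hypothetical sample line; the real movie.html may be laid out differently.
line='<a href="movie1.html">First Movie</a>'

# Split on <, >, / and ", then print each field with its index
# so the right $N can be picked by eye.
fields=$(printf '%s\n' "$line" |
  awk -F'[<>/"]' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"

# For this sample the link is field 3 and the title is field 5.
# The /href=/ pattern limits output to lines that contain a link.
row=$(printf '%s\n' "$line" |
  awk -F'[<>/"]' '/href=/ { print $5 " | " $3 }')
printf '%s\n' "$row"
```

Running the first awk command against one real line of the file shows which indices hold the title and the link. Note that two adjacent separators (such as `">` or `</`) produce an empty field between them, which is why the indices are easy to miscount.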
