Using sed/awk to extract multiple strings from each line

Question

I have a file that contains 30 million lines (so a big file). On each line I have this kind of data:

I need to extract both the title value and the rank value, so:

Little help? 🙂 I have tried my little heart out and got nothing; I can only extract one piece of data from each line.

Accepted Answer

Except in the case where there could be escaped quotes between the quotes, and other tricky stuff like that, I would try this sed command to filter your big file:

sed 's/^"[^"]*": "\([^"]*\)".*"\(.*\)"$/\1:\2/'

Basically, you look for two subgroups, \1 and \2, containing the fields you want, and you print them separated by a colon (:).

If the key is literally the string title, the regex passed to sed is less ugly:

sed 's/^"title": "\([^"]*\)".*"\(.*\)"$/\1:\2/'

Even safer, to avoid side effects from the random data:

sed 's/^"title": "\([^"]*\)".*"rank": "\(.*\)"$/\1:\2/'
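A minimal sketch for trying the generic and the key-anchored commands side by side. The question's real sample lines did not survive, so the input format here (double-quoted "key": "value" pairs, with "rank" as the last pair) and both sample lines are assumptions; the function names generic and named are hypothetical. The second sample shows the kind of side effect the safer command avoids: when title is not the first key, the generic form silently picks up the wrong field.

```shell
#!/bin/sh
# Minimal sketch. ASSUMPTION: the question's sample data was lost, so each
# input line is assumed to look like: "title": "...", ..., "rank": "..."

# Generic form: value of the first key, then the last quoted value on the line.
generic() {
  sed 's/^"[^"]*": "\([^"]*\)".*"\(.*\)"$/\1:\2/'
}

# Safer form: anchored on the literal "title" and "rank" keys.
# Lines that do not match are printed unchanged (sed's default behaviour).
named() {
  sed 's/^"title": "\([^"]*\)".*"rank": "\(.*\)"$/\1:\2/'
}

# Hypothetical sample lines, for illustration only.
line1='"title": "Dune", "year": "1965", "rank": "3"'
line2='"author": "Herbert", "title": "Dune", "rank": "3"'

printf '%s\n' "$line1" | generic   # prints: Dune:3
printf '%s\n' "$line1" | named     # prints: Dune:3
printf '%s\n' "$line2" | generic   # prints: Herbert:3 (wrong field picked up)
printf '%s\n' "$line2" | named     # prints line2 unchanged (no match)
```

On the real file you would run something like sed '...' bigfile > titles.txt; sed reads one line at a time, so 30 million lines is not a memory problem. If you would rather drop non-matching lines than pass them through, sed's -n option combined with the p flag on the s command (sed -n 's/.../\1:\2/p') prints only the lines where the substitution succeeded.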
