Using sed/awk to extract multiple strings from each line

I have a big file that contains 30 million lines.

On each line I have this kind of data:

JavaScript

I need to extract both the title value and the rank value so:

JavaScript

Little help? 🙂 I have tried my little heart out and got nowhere: I can only extract one piece of data from each line.

Answer

Unless there can be escaped quotes between the quotes, or other tricky cases like that, I would try this sed command to filter your big file:

JavaScript

Basically, the regex captures the two fields you want in groups 1 and 2, and the replacement prints them separated by a colon (:).
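As a rough illustration of the two-group idea, here is a minimal sketch. The input layout (key="value" pairs on one line) and the function name extract are assumptions made for the example, not the actual format of the file in the question, so the pattern would need adapting to the real data.

```shell
# Hypothetical input line; the real file's layout may differ.
line='id="42" title="Some Page" views="1000" rank="7"'

# extract is an illustrative helper name, not part of sed.
# -n suppresses sed's default output; the trailing p prints only
# lines on which the substitution actually matched.
# \1 holds the title value, \2 holds the rank value.
extract() {
  sed -n 's/.*title="\([^"]*\)".*rank="\([^"]*\)".*/\1:\2/p'
}

printf '%s\n' "$line" | extract
# prints: Some Page:7
```

On the real file this would be run as extract < bigfile > output. sed processes the input as a stream, one line at a time, so the 30 million lines are not loaded into memory at once.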

If the string title appears literally, the regex passed as an argument to sed is less ugly:

JavaScript

Even safer, to avoid side effects from arbitrary data:

JavaScript
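One common source of such side effects is a greedy .* inside a capture group, which can run past the closing quote when a line holds other quoted fields. The sketch below contrasts that with a group restricted to non-quote characters. The sample line and the function names greedy and bounded are made up for illustration, and this may not be the exact safeguard the answer used.

```shell
# Hypothetical line with extra quoted fields around the ones we want.
line='title="A" note="x" rank="3" tag="y"'

# Greedy groups: .* extends as far as it can, so each capture
# can swallow neighbouring fields along with the intended value.
greedy() {
  sed -n 's/.*title="\(.*\)".*rank="\(.*\)".*/\1:\2/p'
}

# Bounded groups: [^"]* cannot cross a double quote, so each
# capture stops at the first closing quote after its key.
bounded() {
  sed -n 's/.*title="\([^"]*\)".*rank="\([^"]*\)".*/\1:\2/p'
}

printf '%s\n' "$line" | greedy    # captures more than the two values
printf '%s\n' "$line" | bounded   # prints: A:3
```

The bounded form still assumes that values never contain escaped quotes, which is the caveat stated at the top of the answer.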