
How do I extract some particular words from each line?

The text file has many lines of this sort. I want to extract the part after /videos/ up to and including .mp4, plus the very last number (both shown in bold), and write each filtered line to a separate file:

**S4KWZTyt-32313922.mp4**.m3u8?hdnts=exp=1592315851~acl=*/S4KWZTyt-32313922.mp4.m3u8~hmac=83f4674e6bf2576b070c716a3196cb6a30f35737827ee69c8cf7e0c57a196e51 **1**

Let's say, for example, the text file content is:

...*/JajSfbVN-32313922.mp4.m3u8~hmac=d3ca7bd5b233a531cfe242d17d2ea0c0167b41b90fff6459e433700ffc969d69 19
...*/Qs3xZqcv-32313922.mp4.m3u8~hmac=c30e2082bf748a6b4d1621c1d33a95319baa61798775e9da8856041951cf5233 20

The output should be:

JajSfbVN-32313922.mp4 19
Qs3xZqcv-32313922.mp4 20



You may try the following regex:

.*/videos/(.*?mp4).*?(?<= )(\d+)
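As a quick check, here is a minimal Python sketch that runs the pattern on one line (Python is swapped in for illustration; the answer itself only gives the pattern and a Perl one-liner). Note that the digit class must be written `\d+`, with the backslash. Because the question's samples drop everything before the file name, the `https://example.com/videos/` prefix below is a hypothetical stand-in, not the real URL.

```python
import re

# The answer's pattern: group 1 = file name up to "mp4", group 2 = trailing number.
PATTERN = re.compile(r".*/videos/(.*?mp4).*?(?<= )(\d+)")

# Hypothetical sample line: "https://example.com/videos/" is an assumed prefix,
# since the question elides the start of each URL.
line = (
    "https://example.com/videos/S4KWZTyt-32313922.mp4.m3u8?hdnts=exp=1592315851"
    "~acl=*/S4KWZTyt-32313922.mp4.m3u8"
    "~hmac=83f4674e6bf2576b070c716a3196cb6a30f35737827ee69c8cf7e0c57a196e51 1"
)

match = PATTERN.search(line)
if match:
    print(match.group(1), match.group(2))  # S4KWZTyt-32313922.mp4 1
```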

Explanation of the above regex:

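.* – Greedily matches everything before /videos/.

/videos/ – Matches /videos/ literally.

(.*?mp4) – First capturing group; lazily matches everything up to and including the first mp4 (the file name).

.*? – Lazily matches as few characters as possible before the trailing number.

(?<= ) – Positive lookbehind; the number must be preceded by a space, so digits inside .m3u8, exp= or the hmac value are skipped.

(\d+) – Second capturing group; matches the number at the end of the line, as required.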
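To see why the lazy group and the lookbehind matter, the Python sketch below compares the pattern with two variants on a line shaped like the question's (again with a hypothetical `https://example.com/videos/` prefix): one with a greedy `(.*mp4)` group and one without `(?<= )`.

```python
import re

# Hypothetical line in the question's format; the URL prefix is assumed.
line = (
    "https://example.com/videos/JajSfbVN-32313922.mp4.m3u8?hdnts=exp=1592315851"
    "~acl=*/JajSfbVN-32313922.mp4.m3u8"
    "~hmac=d3ca7bd5b233a531cfe242d17d2ea0c0167b41b90fff6459e433700ffc969d69 19"
)

patterns = {
    "answer": r".*/videos/(.*?mp4).*?(?<= )(\d+)",
    # Greedy group: runs on to the LAST "mp4" on the line (inside acl=...).
    "greedy group": r".*/videos/(.*mp4).*?(?<= )(\d+)",
    # No lookbehind: the first digit run after "mp4" wins.
    "no lookbehind": r".*/videos/(.*?mp4).*?(\d+)",
}

results = {name: re.search(p, line).groups() for name, p in patterns.items()}
for name, groups in results.items():
    print(f"{name:14} -> {groups}")
```

With this line, the answer's pattern gives `('JajSfbVN-32313922.mp4', '19')`; the greedy group swallows everything up to the second `.mp4` inside the `acl=` part; and without the lookbehind the captured number is the `3` from `.m3u8`.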

You can find a demo of the above regex here.


Command-line implementation in Linux:

perl -ne 'print "$1 $2\n" while /.*\/videos\/(.*?mp4).*?(?<= )(\d+)/g;' regea.txt > out.txt
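If Perl is not at hand, the one-liner can be sketched in Python as well. This assumes one match per line (true for the sample data) and UTF-8 input; `extract` is a hypothetical helper name, and `regea.txt`/`out.txt` are just the file names from the command above.

```python
import re
import sys

# Same pattern as the answer: file name up to "mp4", then the space-preceded number.
PATTERN = re.compile(r".*/videos/(.*?mp4).*?(?<= )(\d+)")


def extract(in_path: str, out_path: str) -> int:
    """Write "<file> <number>" for each matching line; return how many were written."""
    written = 0
    with open(in_path, encoding="utf-8") as src, open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            match = PATTERN.search(line)
            if match:  # lines without /videos/ or a trailing number are skipped
                dst.write(f"{match.group(1)} {match.group(2)}\n")
                written += 1
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python extract.py regea.txt out.txt
    extract(sys.argv[1], sys.argv[2])
```

For example, `python extract.py regea.txt out.txt` (the script name is hypothetical) writes one `name number` pair per matching line and silently skips lines that do not match.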

You can find a sample run of the above command here.

User contributions licensed under: CC BY-SA