Removing text after a specific delimiter

I extract all the links from a specific webpage using lynx.

lynx -dump http://www.example.com/videos | awk '/http/{print $2}' >> links.txt

It gives the following output:

http://www.example.com/home/
http://www.example.com/contact/
http://www.example.com/videos/
..
..
..
..
http://www.example.com/video/1001/The-title-of-video
http://www.example.com/video/1002/The-title-of-video
http://www.example.com/video/1003/The-title-of-video
http://www.example.com/video/1004/The-title-of-video
..so on

I want to do the following things:

Output only those links that contain /video/.
Remove the title at the end of each link: http://www.example.com/video/1001/~~The-title-of-video~~ should become http://www.example.com/video/1001/

Answer

Use grep to filter the output and sed to remove the title. The sed expression deletes everything after the last slash, so the trailing slash is kept:

lynx -dump http://www.example.com/videos | grep /video/ | sed 's=[^/]*$=='
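
To try the filtering step without fetching a page, here is a minimal sketch that runs the same grep and sed stages on a few sample lines. It assumes the input is one bare URL per line, as in links.txt from the question; the helper name strip_titles and the sample URLs are only illustrative.

```shell
#!/bin/sh
# Sketch only: sample URLs stand in for the real lynx output (an assumption,
# so the example runs without network access).

# strip_titles is a hypothetical helper name, not part of lynx, grep, or sed.
# grep keeps only the lines containing /video/; sed then deletes the run of
# non-slash characters at the end of each line, so the final slash stays.
strip_titles() {
    grep '/video/' | sed 's=[^/]*$=='
}

strip_titles <<'EOF'
http://www.example.com/home/
http://www.example.com/videos/
http://www.example.com/video/1001/The-title-of-video
http://www.example.com/video/1002/The-title-of-video
EOF
```

Note that /videos/ is not matched, because the pattern requires a slash directly after "video". Using = as the sed delimiter avoids having to escape the slash inside the expression.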
