How do I extract URLs from a file?
My file is named URL_name.txt.
It contains a lot of URLs.
It looks like this:
<pre> <pre><div></pre><something>something here<href="http://www.google.com/">something here</font> <font><href="http://www.stackoverflow.com/">something</td> .. .. .. </pre>
Here is my idea: I want to remove everything before each URL, and then remove everything after it. How can I use the sed command to do this? The output should be:
http://www.google.com/
http://www.stackoverflow.com/
Answer
You can use grep:
grep -o 'http://[^"]*' yourfile
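Here -o makes grep print only the matched text, and [^"]* stops the match at the closing quote. If you want to try this out, or the file might also contain https links, the following is a minimal sketch. The sample line is a shortened version of the one in the question, and both the https handling and the tr/sed pipeline are my own additions. A plain sed substitution like s/.*href="\([^"]*\)".*/\1/ would only return the last URL on each line, because .* is greedy, so the sed variant first splits the line at the quotes.

```shell
# Recreate a small test file (shortened sample line from the question).
cat > URL_name.txt <<'EOF'
<div><something>something here<href="http://www.google.com/">something here</font> <font><href="http://www.stackoverflow.com/">something</td>
EOF

# -o prints only the matched text; -E enables extended regex,
# so "s?" lets the pattern match both http:// and https://.
grep -oE 'https?://[^"]+' URL_name.txt

# sed-based alternative: put each double-quoted chunk on its own line,
# then print only the lines that start with http:// or https://.
tr '"' '\n' < URL_name.txt | sed -n '/^https\{0,1\}:\/\//p'
```

Both commands should print one URL per line. Note that they assume every URL is wrapped in double quotes, as in the sample.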