
How to download URLs from a website and save them in a file (wget, curl)?

How can I use wget to extract the marked links from this page?

Can this be done with curl?

I want to download the URLs from this page and save them in a file.

This is what I tried:

wget -r -p -k https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4

Link Gopher extracted the addresses:

enter image description here

EDIT.

How can I save the addresses to a file from the terminal?

Can it be done with wget?

Can it be done with curl?

I want to extract the addresses from this page and save them to a file.

These are the links I want to save:

https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4

https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2985/e15e664718ef6c0dba471d59c4a1928a

https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2986/58edb8e0f06dc3da40c255e50b3839cf

Edit 1.

enter image description here


Answer

You will need to use something like the Download Serialized DOM browser extension.

I added it to my Firefox browser and it works, although it is a bit slow. The only way to know it has finished is that the *.html.part file disappears, leaving the corresponding *.html file, which you save using the add-on button.

Basically, that saves the complete web page (excluding binaries such as images and videos) as a single text file.

Also, the developer notes a bug that affects saving these files: to work around it, you MUST allow “Use in private mode” for the extension.
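Waiting for the *.html.part file to disappear can be scripted. The following is only a minimal sketch: the function name `wait_for_saved_dom` and the default 120-second timeout are hypothetical, and it assumes the temporary file is named exactly like the final file with `.part` appended.

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until the saved DOM file looks complete.
# Assumption: while saving, "<name>.html.part" exists next to "<name>.html".

wait_for_saved_dom() {
    local html_file="$1"        # path to the expected *.html file
    local timeout="${2:-120}"   # maximum number of seconds to wait (assumed default)
    local waited=0

    # Keep waiting while the .part file exists or the .html file is missing/empty.
    while [ -e "${html_file}.part" ] || [ ! -s "$html_file" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "timed out waiting for ${html_file}" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, `wait_for_saved_dom ~/Downloads/season44.html && echo "saved"` (the file name is made up) would print `saved` once the download has finished.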

Here is a fragment of the full season 44 index page displayed (note the address in the address bar):

enter image description here

Since I don’t have your login access, I can’t reproduce this: the service hides the page of the individual video (what you get when you click on a picture) from me and gives me the index instead of the address in the address bar (their security processes at work). However, the index page should probably show something different after the “…/sezon-44/5027472/”.

Using that saved DOM file as input, the following will extract the necessary references:

JavaScript

That will give you a report for ${TMP}.5 like this:

JavaScript
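As a rough stand-in for the extraction step, here is a minimal sketch that uses grep and sed on the saved DOM file. It is not the script from this answer. It assumes the episode links appear as `href="…"` attributes whose path contains `pierwsza-milosc-odcinek-<number>/<hex id>`, as in the three example URLs from the question. The function name `extract_episode_links` and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the unique episode URLs found in a saved DOM file.
# Assumption: links look like href=".../pierwsza-milosc-odcinek-2984/<hex id>".

extract_episode_links() {
    local html_file="$1"   # saved DOM file, e.g. season44.html (made-up name)

    # 1. Print only the matching href attributes, one per line.
    # 2. Strip the leading href=" and the trailing quote.
    # 3. Turn site-relative paths into absolute URLs (absolute ones are untouched).
    # 4. Sort and drop duplicates.
    grep -oE 'href="[^"]*pierwsza-milosc-odcinek-[0-9]+/[0-9a-f]+"' "$html_file" \
        | sed -e 's/^href="//' -e 's/"$//' \
        | sed -e 's#^/#https://polsatboxgo.pl/#' \
        | LC_ALL=C sort -u
}
```

Running `extract_episode_links season44.html > links.txt` would then write one address per line to `links.txt`. If the real markup differs (for instance single-quoted attributes or links built by scripts), the pattern would need adjusting.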

NOTE: In the image below, the icon between the “folder” and the “star”, in the address bar of that image, is the button for the Download Serialized DOM extension to capture the currently displayed page as a fully-instantiated DOM file.

enter image description here

User contributions licensed under: CC BY-SA