Skip to content
Advertisement

How to download URLs from the website and save them in a file (wget, curl)?

How to use WGET to separate the marked links from this side?

Can this be done with CURL?

I want to download URLs from this page and save them in a file.

I tried like that.

wget -r -p -k https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4

Link Gopher separated the addresses.

enter image description here

EDITION.

How can I download addresses to the file from the terminal?

Can it be done with the help of WGET?

Can it be done with the help of CURL?

I want to download addresses from this page and save them to the file.

I want to save these links.

` https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4

https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2985/e15e664718ef6c0dba471d59c4a1928a

https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2986/58edb8e0f06dc3da40c255e50b3839cf

` Edition 1.

enter image description here

Advertisement

Answer

You will need to use something like

Download Serialized DOM

I added that to my Firefox browser and it works, although it is a bit slow, and the only time you know it is completed is when the *.html.part file disappears for the corresponding *.html file which you will save using the Add-on button.

Basically, that will save the complete web page (excluding binaries, i.e. images, videos, etc.) as a single text file.

Also, only while saving these files, the developper indicates there is a bug for which you MUST allow “Use in private mode” to circumvent the bug.

Here is a fragment of the full season 44 index page displayed (note the address in the address bar):

enter image description here

Since I don’t have your access I can’t reproduce, but the service is hiding from me the page of the individual video (what you get when you click on a picture) because I don’t have login access. They give me the index instead of the address in the address bar (their security processes at work). However the index page should probably show something different after the “…/sezon-44/5027472/” .

Using that saved DOM file as input, the following will extract the necessary references:

#!/bin/sh

###
### LOGIC FLOW => CONFIRMED VALID
###


DBG=1
#URL="https://polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4"

###
### Completely expanded and populated DOM file,
### as captured by Firefox extension "Download Serialized DOM"
###
### Extension is slow but apparently very functional.
###


INPUT="test_77_Serialized.html"

BASE=$(basename "$0" ".sh")
TMP="${BASE}.tmp"
HARVESTED="${BASE}.harvest"
DISTILLED="${BASE}.urls"

#if [ ! -s "${TMP}" ]
#then
#   ### Non-serialized
#   wget -O "${TMP}" "${URL}"
#fi

### Each 'more' step is to allow review of outputs to identify patterns which are to be used for the next step.
cp -p ${INPUT} "${TMP}"
test ${DBG} -eq 1 && more ${TMP}



sed 's+<a +n<a +g' "${TMP}" >"${TMP}.2"

URL_BASE=$( grep 'tiba=' ${TMP}.2       |
    sed 's+tiba=+ntiba=+'  |
    grep -v 'viewport'  |
    cut -f1 -d;        |
    cut -f2 -d=        |
    cut -f1 -d% )

echo "n=======================n${URL_BASE}n=======================n"

sed 's+<a +n<a +g' "${TMP}" | grep '<a ' >"${TMP}.2"
test ${DBG} -eq 1 && more ${TMP}.2

grep 'title="Pierwsza Miłość - Odcinek' "${TMP}.2" >"${TMP}.3"
test ${DBG} -eq 1 && more ${TMP}.3

### FORMAT:  Typical entry identified for video files

#<a data-testing="list.item.0" title="Pierwsza Miłość - Odcinek 2984" href="/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4" class="ifodj3-0 yIiYl"></a><div class="sc-1vdpbg2-2 hKhMfx"><img data-src="https://ipla.pluscdn.pl/p/vm2images/9x/9xzfengehrm1rm8ukf7cvzvypv175iin.jpg" alt="Pierwsza Miłość - Odcinek 2984" class="rebvib-0 hIBTLi" src="https://ipla.pluscdn.pl/p/vm2images/9x/9xzfengehrm1rm8ukf7cvzvypv175iin.jpg"></div><div class="sc-1i4o84g-2 iDuLtn"><div class="orrg5d-0 gBnmbk"><span class="orrg5d-1 AjaSg">Odcinek 2984</span></div></div></div></div><div class="sc-1vdpbg2-1 bBDzBS"><div class="sc-1vdpbg2-0 hWnUTt"><

sed 's+href=+nhref=+' "${TMP}.3"   |
    sed 's+class=+nclass=+'    |
    grep '^href=' >"${TMP}.4"
test ${DBG} -eq 1 && more ${TMP}.4

awk -v base="${URL_BASE}" -v splitter=" '{
    printf("https://%s", base ) ;
    pos=index( $0, "href=" ) ;
    if( pos != 0 ){
        rem=substr( $0, pos+6 ) ;
        n=split( rem, var, splitter) ;
        printf("%sn", var[1] ) ;
    } ;
}' "${TMP}.4" >${TMP}.5
more ${TMP}.5
    
exit

That will give you a report for ${TMP}.5 like this:

https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2984/585ddf5a3dde69cb58c7f42ba52790a4
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2985/e15e664718ef6c0dba471d59c4a1928a
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2986/58edb8e0f06dc3da40c255e50b3839cf
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2987/2ebc2e7b13268e74d90cc64c898530ee
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2988/2031529377d3be27402f61f07c1cd4f4
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2989/eaceb96a0368da10fb64e1383f93f513
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2990/4974094499083a8d67158d51c5df2fcb
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2991/4c79d87656dcafcccd4dfd9349ca7c23
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2992/26b4d8808ef4851640b9a2dfa8499a6d
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2993/930aaa5b2b3d52e2367dd4f533728020
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2994/fa78c186bc9414f844f197fd2d673da3
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2995/c059c7b2b54c3c25996c02992228e46b
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2996/4a016aeed0ee5b7ed5ae1c6117347e6a
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2997/1e3dca41d84471d5d95579afee66c6cf
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2998/440d069159114621939d1627eda37aec
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-2999/f54381d4b61f76bb83f072059c15ea84
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3000/b272901a616147cd9f570750aa450f99
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3001/3aca6bd8e81962dc4a45fcc586cdcc7f
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3002/c6500c6e261bd5d65d0bd3a57cd36288
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3003/35a13bc5e5570ed223c5a0221a8d13f3
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3004/a5cfb71ed30e704730b8891323ff7d92
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3005/d86c1308029d78a6b7090503f8bab88e
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3006/54bba327bc7a1ae7b9b609e7ee11c07c
https://Polsatboxgo.pl/wideo/seriale/pierwsza-milosc/5027238/sezon-44/5027472/pierwsza-milosc-odcinek-3007/17d199a0523df8430bcb1f21d4a5b573

NOTE: In the image below, the icon between the “folder” and the “star”, in the address bar of that image, is the button for the Download Serialized DOM extension to capture the currently displayed page as a fully-instantiated DOM file.

enter image description here

User contributions licensed under: CC BY-SA
7 People found this is helpful
Advertisement