
How to download a website with the results of its JavaScript lookups included? [closed]

How can I download a copy of a website on Linux?

I have tried wget --recursive --level=inf https://example.com, but it also downloaded links from different domains.

Also, is there a way to download a copy of the website after its JavaScript has run and written its output to the page? For example, a weather website might have JavaScript that looks up the current temperature in a database and then renders it. How can I capture the temperature, i.e. the final output?


Answer

PhantomJS?

http://phantomjs.org/quick-start.html

I think this will do what you want!

The best thing to do is install it from here:

http://phantomjs.org/

Basically, you run it by writing a JavaScript script and passing it as a command-line argument (on Linux the binary is phantomjs rather than phantomjs.exe), e.g.

phantomjs.exe someScript.js

There are loads of examples. For instance, you can render a website as an image by running:

phantomjs.exe github.js

Where github.js looks like this:

var page = require('webpage').create();
page.open('http://github.com/', function() {
  page.render('github.png');
  phantom.exit();
});

This demo is at http://phantomjs.org/screen-capture.html

You can also show the webpage content as text.

For example, let’s take a simple webpage, demo_page.html:

<html>
    <head>
        <script>
        function setParagraphText() {
            document.getElementById("1").innerHTML = "42 is the answer.";
        }
        </script> 
    </head>
    <body onload="setParagraphText();">
        <p id="1">Static content</p>
    </body>
</html>

And then create a test script, test.js:

var page = require('webpage').create();

page.open("demo_page.html", function(status) {
    console.log("Status: " + status);
    if(status === "success") {
        console.log('Page text: ' + page.plainText);
        console.log('All done');
    }
    phantom.exit();
});

Then in the console write:

> phantomjs.exe test.js
Status: success
Page text: 42 is the answer.
All done
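
To capture the final HTML rather than only the text, which is what the question asks about, one option is to write page.content to a file. This is a minimal sketch, not a tested recipe: the script name save_rendered.js, the helper fileNameFor and the one-second delay are assumptions made for illustration, and a page that fetches data slowly may need a longer wait.

```javascript
// save_rendered.js: a sketch for PhantomJS, run as `phantomjs save_rendered.js <url>`.

// Hypothetical helper: turn a URL into a safe local file name.
function fileNameFor(url) {
    var name = url.replace(/^[a-z]+:\/\//i, '')    // drop the scheme
                  .replace(/[^a-z0-9.-]+/gi, '_')  // replace unsafe characters
                  .replace(/^_+|_+$/g, '');        // trim leading/trailing underscores
    return (name || 'index') + '.html';
}

// The `phantom` global only exists inside PhantomJS, so this part is skipped elsewhere.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    var fs = require('fs');
    var page = require('webpage').create();
    var url = system.args[1] || 'demo_page.html';

    page.open(url, function (status) {
        if (status !== 'success') {
            console.log('Failed to load ' + url);
            phantom.exit(1);
            return;
        }
        // Assumed delay so that scripts running after the load event can finish.
        window.setTimeout(function () {
            var out = fileNameFor(url);
            // page.content is the current DOM serialized back to HTML.
            fs.write(out, page.content, 'w');
            console.log('Saved ' + out);
            phantom.exit();
        }, 1000);
    });
}
```

The saved file contains the markup as it stands after the scripts have run, so for demo_page.html the paragraph should hold the text set by setParagraphText() rather than "Static content".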

You can also inspect the page DOM and even update it:

var page = require('webpage').create();

page.open("demo_page.html", function(status) {
    console.log("Status: " + status);
    if(status === "success") {
        page.evaluate(function(){
            document.getElementById("1").innerHTML = "I updated the value myself";
        });

        console.log('Page text: ' + page.plainText);
        console.log('All done');
    }
    phantom.exit();
});
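
The script above only updates the DOM. To inspect it, page.evaluate can also return a value from the page back to the script. A small sketch, assuming the same demo_page.html; the function name readParagraph is made up for this example:

```javascript
// Runs inside the page, so `document` here is the page's document.
function readParagraph(id) {
    var el = document.getElementById(id);
    return el ? el.textContent : null;
}

// The `phantom` global only exists inside PhantomJS, so this part is skipped elsewhere.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();

    page.open('demo_page.html', function (status) {
        console.log('Status: ' + status);
        if (status === 'success') {
            // Extra arguments to page.evaluate are passed on to the function.
            var text = page.evaluate(readParagraph, '1');
            console.log('Paragraph text: ' + text);
        }
        phantom.exit();
    });
}
```

Only simple, JSON-serializable values can cross between the page and the script, so return strings, numbers or plain objects rather than DOM nodes.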
User contributions licensed under: CC BY-SA