How can you run an HTML file with its JavaScript content in a Linux terminal?

I am working on a website crawler bot that extracts specific information from pages. I need to run at least the “on document ready” JavaScript function of an HTML file, so that the content is generated and I can read it. How can I do this? I saw a command called “rhino”, but it seems to work only on .js files, and my file is an HTML file that contains both HTML and JS, as you can guess. The plan is: download the HTML files, edit their “on document ready” JS functions, get the output, move on to the next one, repeat.

Answer

You can try a library that drives a headless browser.

Here is an example of how something similar can be done with Puppeteer (GoogleChrome/puppeteer), a Node.js library that controls headless Chrome. If this does not work for you, please describe your task and the problems you run into in more detail.

'use strict';

const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch();
    const [page] = await browser.pages();

    // Navigate and wait until the DOM is parsed (note: no trailing space in the event name).
    await page.goto('https://example.org/', { waitUntil: 'domcontentloaded' });

    // Run a function inside the page context and return its result to Node.js.
    const data = await page.evaluate(() => {
      return document.title;
    });

    console.log(data);

    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();
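
Since the question is about HTML files that have already been downloaded, the same approach might be adapted to local files by pointing the browser at a file:// URL. The following is only a sketch under assumptions: the helper names (toFileUrl, renderLocalHtml, renderAll) are hypothetical and not part of Puppeteer, and it assumes the page's scripts finish their work by the time the 'load' event fires. Pages that build their content later may need an extra wait.

```javascript
'use strict';

const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper: turn a local path (relative or absolute)
// into a file:// URL that page.goto() can open.
function toFileUrl(htmlPath) {
  return pathToFileURL(path.resolve(htmlPath)).href;
}

// Hypothetical helper: open one downloaded HTML file in a new tab,
// let its scripts run, and return the resulting markup as a string.
async function renderLocalHtml(browser, htmlPath) {
  const page = await browser.newPage();
  try {
    // Assumption: content is generated by the time 'load' fires.
    await page.goto(toFileUrl(htmlPath), { waitUntil: 'load' });
    return await page.content();
  } finally {
    await page.close();
  }
}

// Hypothetical driver: render several files one after another,
// reusing a single browser instance.
async function renderAll(htmlPaths) {
  // Loaded lazily so the helpers above can be used without Puppeteer installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const results = [];
    for (const htmlPath of htmlPaths) {
      results.push(await renderLocalHtml(browser, htmlPath));
    }
    return results;
  } finally {
    // Close the browser even if one of the pages fails.
    await browser.close();
  }
}
```

It could be called as, for example, renderAll(['page1.html', 'page2.html']).then(console.log), where the file names are placeholders. Reusing one browser for all files avoids paying the startup cost for every page.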
User contributions licensed under: CC BY-SA