Headless WebKit with PhantomJS

April 19, 2013 / Mad Coding, PhantomJS

PhantomJS is just pure awesomeness. I gave a presentation on it at Well.ca and talked about how you can use it, together with CasperJS, to make website testing easier. We had just run into a bug on the website at the time, so it was fitting. A headless WebKit can do much more for you than just testing, too!

Taking Screenshots

For another project, I wanted to capture webpages and save them as images. Before I discovered PhantomJS, I used CutyCapt and the Qt graphics-dojo example. PhantomJS is much simpler to use, though. Check out the official rasterize example for details.
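As a rough sketch of the idea (this is not the official rasterize example, just a stripped-down version of the same approach), the script below opens a page and renders it to an image. The helper name capture, the file name shot.js, and the 1024x768 viewport are my own assumptions for illustration; page.open, page.render, and page.viewportSize are the PhantomJS calls doing the work.

```javascript
// Sketch: render a page to an image file with PhantomJS.
// capture() is a hypothetical helper name, not part of PhantomJS.
function capture(page, address, output, done) {
    // The viewport size is an arbitrary assumption; adjust to taste.
    page.viewportSize = { width: 1024, height: 768 };
    page.open(address, function (status) {
        if (status !== 'success') {
            done(false);
            return;
        }
        // The image format is inferred from the file extension (e.g. .png).
        page.render(output);
        done(true);
    });
}

// Only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
    var system = require('system');
    if (system.args.length !== 3) {
        console.log('Usage: shot.js URL filename');
        phantom.exit(1);
    } else {
        capture(require('webpage').create(), system.args[1], system.args[2], function (ok) {
            if (!ok) {
                console.log('Unable to load the address!');
            }
            phantom.exit(ok ? 0 : 1);
        });
    }
}
```

Pages that build their content with JavaScript may need a short delay before rendering, which this sketch leaves out to stay small.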

Fetching Actual Website Content

Another great thing about a headless browser is that you can use it to fetch the actual content of a website. Nowadays so many sites are JavaScript-driven that simply fetching the initial HTML won’t do. Below is how you can use PhantomJS to capture the content after the page’s scripts have run.

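// grab.js: save the rendered (post-JavaScript) HTML of a page to a file.
var page = require('webpage').create(),
    system = require('system'),
    fs = require('fs'),
    address, output;

// system.args[0] is the script name, so two arguments means a length of 3.
if (system.args.length !== 3) {
    console.log('Usage: grab.js URL filename');
    phantom.exit(1);
} else {
    address = system.args[1];
    output = system.args[2];

    page.open(address, function (status) {
        if (status !== 'success') {
            console.log('Unable to load the address!');
            phantom.exit(1);  // non-zero so callers can detect the failure
        } else {
            // Wait briefly so scripts on the page can finish updating the DOM.
            window.setTimeout(function () {
                // Runs inside the page and returns the current DOM as a string.
                var results = page.evaluate(function () {
                    return document.documentElement.innerHTML;
                });

                try {
                    var f = fs.open(output, 'w');
                    f.write(results);
                    f.close();
                } catch (e) {
                    console.log(e);
                }
                phantom.exit();
            }, 200);
        }
    });
}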

Save the code as grab.js and run it with PhantomJS, passing the URL to fetch and the output file to write, for example: phantomjs grab.js http://example.com/ example.html
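One thing to be aware of: document.documentElement.innerHTML gives you everything inside the html element, but not the doctype or the html tag itself. If you want those too, a possible variant is to read page.content instead. The sketch below is just that, a sketch; saveRendered is a made-up helper name, while page.content and fs.write are the PhantomJS pieces it relies on.

```javascript
// Sketch: save the rendered page, doctype included, via page.content.
// saveRendered() is a hypothetical helper name, not part of PhantomJS.
function saveRendered(page, fs, output) {
    // page.content holds the current (post-JavaScript) markup of the page.
    var html = page.content;
    // fs.write(path, content, mode) opens, writes and closes in one call.
    fs.write(output, html, 'w');
    return html.length;
}

// Only runs inside PhantomJS, where the global `phantom` object exists.
if (typeof phantom !== 'undefined') {
    var system = require('system'),
        page = require('webpage').create();
    page.open(system.args[1], function (status) {
        if (status !== 'success') {
            console.log('Unable to load the address!');
            phantom.exit(1);
            return;
        }
        // Same 200 ms grace period as the script above.
        window.setTimeout(function () {
            saveRendered(page, require('fs'), system.args[2]);
            phantom.exit();
        }, 200);
    });
}
```

It is run the same way as grab.js: the first argument is the URL and the second is the output file.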

Danny Su
