QuickStart  
5-Minute Guide
Updated Mar 31, 2012 by ariya.hi...@gmail.com

Applies to: PhantomJS 1.5.

This guide assumes that PhantomJS is built and its executable is placed somewhere in the PATH.

All of the examples given here are available in the code repository under the sub-directory examples/. Each example comes in two versions, one in JavaScript and one in CoffeeScript.

Consult also the API reference.

Hello, world!

Create a new text file that contains the following two lines:

console.log('Hello, world!');
phantom.exit();

Save it as hello.js and then run it:

phantomjs hello.js

The output is:

Hello, world!

In the first line, console.log will print the passed string to the terminal. In the second line, phantom.exit terminates the execution.

It is very important to call phantom.exit at some point in the script, otherwise PhantomJS will not be terminated at all.
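
One way to make sure the exit call is never skipped, even when the script throws, is to wrap the work in try/finally. The sketch below is an illustration only: runThenExit is a hypothetical helper name, and the no-op fallback exists just so the snippet can also run outside PhantomJS.

```javascript
// Hypothetical helper (not part of PhantomJS): run a synchronous task
// and always call the supplied exit function afterwards.
function runThenExit(task, exit) {
    var code = 0;
    try {
        task();
    } catch (e) {
        console.log('Error: ' + e);
        code = 1; // report the failure through the exit code
    } finally {
        exit(code);
    }
}

// Under PhantomJS use phantom.exit; elsewhere fall back to a no-op.
var exitFn = (typeof phantom !== 'undefined') ?
    function (code) { phantom.exit(code); } :
    function () {};

runThenExit(function () {
    console.log('Hello, world!');
}, exitFn);
```

This pattern only suits synchronous work. When the script waits for a timer or a page load, the exit call belongs in the final callback instead.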

Delay

To have an asynchronous (non-blocking) delay, use the usual window.setTimeout or window.setInterval function. This fibo.js example prints the Fibonacci sequence with each new number showing up every 300 ms.

var fibs = [0, 1];
var ticker = window.setInterval(function () {
    console.log(fibs[fibs.length - 1]);
    fibs.push(fibs[fibs.length - 1] + fibs[fibs.length - 2]);
    if (fibs.length > 10) {
        window.clearInterval(ticker);
        phantom.exit();
    }
}, 300);

Because setTimeout/setInterval is non-blocking, do not call phantom.exit right after that. Doing so will cause the script to terminate immediately.
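
The same rule applies to a one-shot setTimeout: the exit call belongs inside the delayed callback. A minimal sketch, in which runLater is a hypothetical helper and the no-op fallback is only there so the snippet can run outside PhantomJS:

```javascript
// Hypothetical helper: run `task` after `ms` milliseconds, then call `done`.
function runLater(ms, task, done) {
    return setTimeout(function () {
        task();
        done(); // exit only after the delayed work has run
    }, ms);
}

// Under PhantomJS use phantom.exit; elsewhere fall back to a no-op.
var finish = (typeof phantom !== 'undefined') ?
    function () { phantom.exit(); } :
    function () {};

runLater(300, function () {
    console.log('300 ms have passed');
}, finish);
```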

Script Arguments

Using the args array from the System module, the script can obtain the list of command-line arguments.

Consider the following arguments.js example:

var system = require('system');
if (system.args.length === 1) {
    console.log('Try to pass some args when invoking this script!');
} else {
    system.args.forEach(function (arg, i) {
        console.log(i + ': ' + arg);
    });
}
phantom.exit();

If it is invoked using the following command:

phantomjs arguments.js The quick brown fox

then the output will be:

 0: arguments.js
 1: The
 2: quick
 3: brown
 4: fox
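
Since the first element of system.args is the script name, a script often wants to separate it from the real arguments. A small sketch: splitArgs is a hypothetical helper, and the sample array merely mirrors the invocation above so that the snippet can run outside PhantomJS.

```javascript
// Hypothetical helper: separate the script name from the remaining arguments.
function splitArgs(args) {
    return {
        script: args[0],
        rest: Array.prototype.slice.call(args, 1)
    };
}

// Under PhantomJS, read the real arguments; otherwise use sample values.
var args = (typeof phantom !== 'undefined') ?
    require('system').args :
    ['arguments.js', 'The', 'quick', 'brown', 'fox'];

var parsed = splitArgs(args);
console.log(parsed.script + ' received ' + parsed.rest.length + ' argument(s)');

if (typeof phantom !== 'undefined') {
    phantom.exit();
}
```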

Loading

The script invoked by PhantomJS initially runs on an empty web page. This is not very useful on its own, so PhantomJS offers the possibility of loading an arbitrary URL. To encapsulate a web page, instantiate a WebPage object.

A specific URL can be loaded using its open() function. A typical usage is:

var page = require('webpage').create();

page.open(url, function (status) {
  // do something
});

The callback passed to open() is executed when the page loading is completed, with status equal to "success" if there is no error and "fail" if an error has occurred.

The above construct is a convenient version of the following:

var page = require('webpage').create();

page.onLoadFinished = function (status) {
  // do something
};

page.open(url);

Besides onLoadFinished, there is also onLoadStarted, which is invoked when page loading starts for the first time:

var page = require('webpage').create();

page.onLoadStarted = function () {
    console.log('Start loading...');
};

page.onLoadFinished = function (status) {
    console.log('Loading finished.');
};

page.open(url);

The following loadspeed.js script loads a specified URL (do not forget the http protocol) and measures the time it takes to load it.

var page = require('webpage').create(),
    t, address;

if (phantom.args.length === 0) {
    console.log('Usage: loadspeed.js <some URL>');
    phantom.exit();
} else {
    t = Date.now();
    address = phantom.args[0];
    page.open(address, function (status) {
        if (status !== 'success') {
            console.log('FAIL to load the address');
        } else {
            t = Date.now() - t;
            console.log('Loading time ' + t + ' msec');
        }
        phantom.exit();
    });
}

Run the script with the command:

phantomjs loadspeed.js http://www.google.com

It outputs something like:

Loading time 719 msec
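
Because the address must include the protocol, a script may want to add a default one when it is missing. This is only a sketch: ensureProtocol is a hypothetical helper, and defaulting to http:// is an assumption made for illustration.

```javascript
// Hypothetical helper: prepend "http://" when the address has no scheme.
function ensureProtocol(address) {
    // A scheme is a letter followed by letters, digits, "+", "." or "-",
    // and then "://".
    var hasScheme = /^[a-z][a-z0-9+.\-]*:\/\//i.test(address);
    return hasScheme ? address : 'http://' + address;
}

console.log(ensureProtocol('www.google.com'));        // http://www.google.com
console.log(ensureProtocol('http://www.google.com')); // http://www.google.com

if (typeof phantom !== 'undefined') {
    phantom.exit();
}
```

In loadspeed.js this could be applied as address = ensureProtocol(phantom.args[0]).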

Page settings

The behavior of the web page can be set via its settings object which can contain properties such as:

  • loadImages defines whether to load inline images or not (defaults to true)
  • userAgent defines the user agent string passed to the server

For the full list, refer to the complete page settings reference.

The initial values for the settings are from the command-line options specified when invoking the script.

As an example, here is how to change the user agent:

var page = require('webpage').create();

page.settings.userAgent = 'Dragonless Phantom';

page.open(url, function (status) {
  // do something
});
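
Several settings can be changed in one go before calling open(). A minimal sketch, where applySettings is a hypothetical helper and the plain object stands in for a WebPage when the snippet runs outside PhantomJS:

```javascript
// Hypothetical helper: copy every property of `overrides` onto page.settings.
function applySettings(page, overrides) {
    Object.keys(overrides).forEach(function (key) {
        page.settings[key] = overrides[key];
    });
    return page;
}

// Under PhantomJS create a real page; otherwise use a stand-in object.
var page = (typeof phantom !== 'undefined') ?
    require('webpage').create() :
    { settings: {} };

applySettings(page, { loadImages: false, userAgent: 'Dragonless Phantom' });
console.log('User agent is now ' + page.settings.userAgent);

if (typeof phantom !== 'undefined') {
    phantom.exit();
}
```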

Rendering

A web page can be rasterized to an image or a PDF file using the render() function.

This rasterize.js is all it takes to capture a web site.

var page = require('webpage').create(),
    address, output, size;

if (phantom.args.length < 2 || phantom.args.length > 3) {
    console.log('Usage: rasterize.js URL filename');
    phantom.exit();
} else {
    address = phantom.args[0];
    output = phantom.args[1];
    page.viewportSize = { width: 600, height: 600 };
    page.open(address, function (status) {
        if (status !== 'success') {
            console.log('Unable to load the address!');
            phantom.exit();
        } else {
            window.setTimeout(function () {
                page.render(output);
                phantom.exit();
            }, 200);
        }
    });
}

An example to produce the rendering of the famous Tiger (from SVG):

phantomjs rasterize.js http://ariya.github.com/svg/tiger.svg tiger.png

which produces the file tiger.png.

Another example: show the polar clock (from RaphaelJS):

phantomjs rasterize.js http://raphaeljs.com/polar-clock.html clock.png

Producing PDF output is also possible, e.g. from a Wikipedia article:

phantomjs rasterize.js 'http://en.wikipedia.org/w/index.php?title=Jakarta&printable=yes' jakarta.pdf

or when creating a printer-ready cheat sheet:

phantomjs rasterize.js http://www.nihilogic.dk/labs/webgl_cheat_sheet/WebGL_Cheat_Sheet.htm webgl.pdf

Code Evaluation

To evaluate JavaScript or CoffeeScript code in the context of the web page, use the evaluate() function. The execution is sandboxed: the code has no way to access any JavaScript objects and variables outside its own page context. An object can be returned from evaluate(), but it is limited to simple objects and cannot contain functions or closures.

Here is an example to show the title of a web page:

var page = require('webpage').create();
page.open(url, function (status) {
    var title = page.evaluate(function () {
        return document.title;
    });
    console.log('Page title is ' + title);
    phantom.exit();
});
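
To get a feel for what a "simple object" means here, a JSON round-trip is a useful mental model: plain data survives, functions do not. The sketch below is only an analogy and makes no claim about how PhantomJS transfers the value internally; simplify is a hypothetical name.

```javascript
// Hypothetical helper: keep only what survives a JSON round-trip.
function simplify(value) {
    return JSON.parse(JSON.stringify(value));
}

var result = simplify({
    title: 'Example',
    links: 3,
    onClick: function () {} // dropped: functions are not plain data
});

console.log(Object.keys(result).join(', ')); // title, links

if (typeof phantom !== 'undefined') {
    phantom.exit();
}
```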

Any console message from a web page, including from the code inside evaluate(), will not be displayed by default. To override this behavior, use the onConsoleMessage callback. The previous example can be rewritten to:

var page = require('webpage').create();
page.onConsoleMessage = function (msg) {
    console.log('Page title is ' + msg);
};
page.open(url, function (status) {
    page.evaluate(function () {
        console.log(document.title);
    });
    phantom.exit();
});

Canvas

Canvas can be easily constructed and utilized. The following colorwheel.js produces the color wheel.

var page = require('webpage').create();
page.viewportSize = { width: 400, height : 400 };
page.content = '<html><body><canvas id="surface"></canvas></body></html>';
page.evaluate(function() {
    var el = document.getElementById('surface'),
        context = el.getContext('2d'),
        width = window.innerWidth,
        height = window.innerHeight,
        cx = width / 2,
        cy = height / 2,
        radius = width  / 2.3,
        imageData,
        pixels,
        hue, sat, value,
        i = 0, x, y, rx, ry, d,
        f, g, p, u, v, w, rgb;

    el.width = width;
    el.height = height;
    imageData = context.createImageData(width, height);
    pixels = imageData.data;

    for (y = 0; y < height; y = y + 1) {
        for (x = 0; x < width; x = x + 1, i = i + 4) {
            rx = x - cx;
            ry = y - cy;
            d = rx * rx + ry * ry;
            if (d < radius * radius) {
                hue = 6 * (Math.atan2(ry, rx) + Math.PI) / (2 * Math.PI);
                sat = Math.sqrt(d) / radius;
                g = Math.floor(hue);
                f = hue - g;
                u = 255 * (1 - sat);
                v = 255 * (1 - sat * f);
                w = 255 * (1 - sat * (1 - f));
                pixels[i] = [255, v, u, u, w, 255, 255][g];
                pixels[i + 1] = [w, 255, 255, v, u, u, w][g];
                pixels[i + 2] = [u, u, w, 255, 255, v, u][g];
                pixels[i + 3] = 255;
            }
        }
    }

    context.putImageData(imageData, 0, 0);
    document.body.style.backgroundColor = 'white';
    document.body.style.margin = '0px';
});

page.render('colorwheel.png');
phantom.exit();
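
The per-pixel color computation inside the loop can be read in isolation. The sketch below extracts it into a function (wheelColor is a hypothetical name, not part of colorwheel.js) that takes a hue in the range 0 to 6 and a saturation in the range 0 to 1, as in the code above, and returns the red, green and blue components.

```javascript
// Hypothetical helper mirroring the lookup tables used in colorwheel.js.
function wheelColor(hue, sat) {
    var g = Math.floor(hue),          // which sixth of the wheel
        f = hue - g,                  // position within that sixth
        u = 255 * (1 - sat),
        v = 255 * (1 - sat * f),
        w = 255 * (1 - sat * (1 - f));
    return [
        [255, v, u, u, w, 255, 255][g], // red
        [w, 255, 255, v, u, u, w][g],   // green
        [u, u, w, 255, 255, v, u][g]    // blue
    ];
}

console.log(wheelColor(0, 1).join(',')); // 255,0,0
console.log(wheelColor(2, 1).join(',')); // 0,255,0
console.log(wheelColor(4, 1).join(',')); // 0,0,255

if (typeof phantom !== 'undefined') {
    phantom.exit();
}
```

At zero saturation every hue yields white, which matches the center of the rendered wheel.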

DOM Manipulation

Since the script is executed as if it were running in a web browser, standard DOM scripting and CSS selectors work just fine.

The following useragent.js example demonstrates reading the innerText property of the element whose id is myagent:

var page = require('webpage').create();
console.log('The default user agent is ' + page.settings.userAgent);
page.settings.userAgent = 'SpecialAgent';
page.open('http://www.httpuseragent.org', function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        var ua = page.evaluate(function () {
            return document.getElementById('myagent').innerText;
        });
        console.log(ua);
    }
    phantom.exit();
});

The above example also demonstrates the use of page.settings.userAgent to customize the user agent sent to the web server.

Here is another example: finding pizza in Mountain View.

var page = require('webpage').create(),
    url = 'http://lite.yelp.com/search?find_desc=pizza&find_loc=94040&find_submit=Search';

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        var results = page.evaluate(function() {
            var list = document.querySelectorAll('span.address'), pizza = [], i;
            for (i = 0; i < list.length; i++) {
                pizza.push(list[i].innerText);
            }
            return pizza;
        });
        console.log(results.join('\n'));
    }
    phantom.exit();
});
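
The loop over the query result can also be written with Array.prototype.map, which works on array-like objects such as the list returned by querySelectorAll. A sketch, assuming each item exposes innerText as in the example above. collectText is a hypothetical helper, and because evaluate() is sandboxed it would have to be defined inside the function passed to evaluate(). The stand-in list uses made-up values.

```javascript
// Hypothetical helper: gather the innerText of every item in an array-like list.
function collectText(nodes) {
    return Array.prototype.map.call(nodes, function (node) {
        return node.innerText;
    });
}

// Stand-in for a NodeList (made-up values), so the snippet runs without a page.
var fakeList = {
    length: 2,
    0: { innerText: 'first address' },
    1: { innerText: 'second address' }
};
console.log(collectText(fakeList).join('\n'));

if (typeof phantom !== 'undefined') {
    phantom.exit();
}
```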

The following example also uses document.querySelectorAll, this time to show the recent Twitter statuses:

var page = require('webpage').create();

page.onConsoleMessage = function(msg) {
    console.log(msg);
};

page.open(encodeURI("http://mobile.twitter.com/Sencha"), function (status) {
    if (status !== "success") {
        console.log("Unable to access network");
    } else {
        page.evaluate(function() {
            var list = document.querySelectorAll('span.status');
            for (var i = 0; i < list.length; ++i) {
                console.log((i + 1) + ": " + list[i].innerHTML.replace(/<.*?>/g, ''));
            }
        });
    }
    phantom.exit();
});
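
The regular expression in the example removes anything that looks like a tag from innerHTML. In isolation (stripTags is a hypothetical name used only for this sketch):

```javascript
// Hypothetical helper: remove every "<...>" run from a string.
function stripTags(html) {
    // Non-greedy match, so each tag is removed separately.
    return html.replace(/<.*?>/g, '');
}

console.log(stripTags('<b>Hello</b>, <a href="#">world</a>!')); // Hello, world!

if (typeof phantom !== 'undefined') {
    phantom.exit();
}
```

This is a rough text extraction rather than an HTML parser: entities such as &amp; are left as they are.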

Network traffic

All the resource requests and responses can be sniffed using the onResourceRequested and onResourceReceived callbacks. An example that dumps everything:

var page = require('webpage').create();
page.onResourceRequested = function (request) {
    console.log('Request ' + JSON.stringify(request, undefined, 4));
};
page.onResourceReceived = function (response) {
    console.log('Receive ' + JSON.stringify(response, undefined, 4));
};
page.open(url);

The included examples/netsniff.js shows how to capture and process all the resource requests and responses and export the result in HAR format.
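
Instead of dumping everything, the two callbacks can also be used to keep a simple tally. A minimal sketch: createTrafficCounter is a hypothetical helper, and it assumes that each response object carries a stage property whose value is 'end' once the resource has been fully received. Check the API reference for the exact fields.

```javascript
// Hypothetical helper: build a pair of handlers that count resources.
function createTrafficCounter() {
    var counts = { requested: 0, received: 0 };
    return {
        counts: counts,
        onResourceRequested: function (request) {
            counts.requested += 1;
        },
        onResourceReceived: function (response) {
            // Assumption: stage is 'end' when the resource is complete.
            if (response.stage === 'end') {
                counts.received += 1;
            }
        }
    };
}

// Simulated events, so the snippet can run without a page.
var counter = createTrafficCounter();
counter.onResourceRequested({ url: 'http://example.com/' });
counter.onResourceReceived({ url: 'http://example.com/', stage: 'start' });
counter.onResourceReceived({ url: 'http://example.com/', stage: 'end' });
console.log(counter.counts.requested + ' requested, ' +
    counter.counts.received + ' received');
// prints: 1 requested, 1 received

if (typeof phantom !== 'undefined') {
    phantom.exit();
}
```

With a real page, the handlers would be attached as page.onResourceRequested = counter.onResourceRequested, and likewise for onResourceReceived.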

[Figure: waterfall diagram obtained from the BBC website]

Comment by noagbodj...@gmail.com, Jan 23, 2011

pure awesomeness!

Comment by cl...@elojasystems.com, Jan 28, 2011

My hell. I have been working on a headless browser with Qt and Webkit for the last 2 months. This is amazing. Thank you.

Comment by peter.du...@gmail.com, Mar 25, 2011

delicious technology!!! hopefully someone will write webkit-lynx ;)

Comment by qbawo...@gmail.com, Apr 2, 2011

is there any way to open one site, click on anchor, be redirected to new url with all cookies etc. ? is there any way to capture/change request http headers ?

Comment by b...@montev.com, Apr 6, 2011

With the rasterize.js example, I am having a lot better luck with png generation than pdf (with any complex page). Can anyone point me in the right direction to why this may be happening?.. or even better possible solutions, but I'll be happy to at least understand why.

Comment by project member ariya.hi...@gmail.com, Apr 9, 2011

PDF depends on the Qt PDF engine, which is quite good but not 100% perfect.

Comment by b...@montev.com, Apr 16, 2011

OK, I will see what I can do to tweak the Qt PDF engine for my needs.

I am actually having a lot of trouble with SVG. I am generating SVG using either the protovis or d3 JavaScript graphing libraries. Very basic examples work, but I am having difficulty with all my custom ones. I suppose I can go through the SVG code being generated and figure out what works and what doesn't, but I would first like to see if I can do anything with the engine rendering the SVG, since everything else I use on my desktop renders the SVG perfectly, as do all the browsers I have tested. Any recommendations on where I should look into this problem more?
