Updated Nov 15, 2008 by pilgrim
Labels: is-article, about-security
ArticleScriptInclusion  
HOWTO protect against cross-domain data disclosure attacks


Some of the most important problems you encounter on a daily basis relate to unauthorized, cross-domain data disclosure. An attacker can load any content in your domain into his own page through a <SCRIPT SRC=...> tag, regardless of browser domain restrictions. The request is made to your servers with the victim's browser cookies, but the response is parsed by the browser in the security context of the attacker's site, possibly disclosing sensitive user data to malicious scripts embedded there.

Browsers will parse and execute a surprising variety of formats through the <SCRIPT SRC=...> tag. The range of possible attack vectors is in no way limited to complete, well-formed Javascript statements. In general, as soon as the attacker is able to use the victim's browser to make a successful request to a resource of yours that returns sensitive user information (such as an address book or mailbox contents), you have a problem, regardless of how the response is formatted.

Attacks on variable setting and callbacks

Perhaps the simplest data retrieval scheme is to perform a straightforward function callback, or directly set a variable on the calling page. For example, if your script returns this complete statement, a variable assignment:

var my_contacts = { "John Doe": "jdoe@example.com", ... }

...or a series of complete statements, in this case function calls:

registerContact("John Doe", "jdoe@example.com");
registerContact(...);

...then your own pages can retrieve it with XMLHttpRequest and pass it directly to eval(), or simply include it by appending a new <SCRIPT SRC=...> tag to the document. The attacker cannot use XMLHttpRequest to retrieve the data across domains, but he is free to use <SCRIPT SRC=...> on his page and either provide his own registerContact() function in that context or read the my_contacts variable, stealing sensitive data.
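
To make the callback attack concrete, here is a minimal sketch that simulates it outside a browser. The response string stands in for what your server would return to a <SCRIPT SRC=...> request, and new Function() stands in for the browser executing that script on the attacker's page. The second contact and the names stolen, attackerRegisterContact and simulateScriptInclusion are made up for illustration.

```javascript
// Hypothetical response body, as your server would return it to any
// request that carries the victim's cookies.
var response =
  'registerContact("John Doe", "jdoe@example.com");\n' +
  'registerContact("Jane Roe", "jroe@example.com");';

// Attacker-controlled state: everything the callback receives ends up here.
var stolen = [];

// The attacker's own registerContact(), defined before the script is included.
function attackerRegisterContact(name, mail) {
  stolen.push({ name: name, mail: mail });
}

// Stand-in for <SCRIPT SRC=...>: run the response with the attacker's
// function bound to the name the response expects.
function simulateScriptInclusion(body, callback) {
  new Function("registerContact", body)(callback);
}

simulateScriptInclusion(response, attackerRegisterContact);
console.log(stolen.length + " contacts captured");
```

Nothing in the response needs to be malformed for this to work; the attacker only has to know (or guess) the name of the callback.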

Failed protection: array serialization

Because of the aforementioned attack, you might think you would be safer returning serialized arrays instead. The following payload:

[
  [ "John Doe", "jdoe@example.com" ],
  ...
]

...appears to make sense only if requested by XMLHttpRequest and then passed to eval() in a manner that does not discard the return value. The attacker, hosting his malicious code in an unrelated domain, cannot use XMLHttpRequest to retrieve resources from your domain (because of the cross-domain security policies that browsers enforce for such requests). The following will not have the desired effect either, because each SCRIPT block is parsed as a separate unit:

<SCRIPT>
  var gimme_data =
</SCRIPT>
<SCRIPT SRC="http://example.com/get_contacts"></SCRIPT>

Unfortunately, the approach is still not safe! Some advanced Javascript features make it possible to read array data during the initialization stage, regardless of what happens to the return value. This attack relies on the ability of the attacker to overload the builtin Array object and define setters for its elements. It sounds complicated, but the code is surprisingly short:

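// Defined on the attacker's page: replaces the builtin Array constructor.
function Array() {
  var obj = this;
  var ind = 0;
  // Invoked each time the array initializer assigns an element.
  var getNext = function(x) {
    // Arm a setter on the next index before handling this value.
    obj.__defineSetter__(ind++, getNext);
    // dump() stands for any attacker-supplied helper that serializes x.
    if (x) document.write(dump(x));
  };
  this.__defineSetter__(ind++, getNext);
}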

All the attacker has to do is define this script in their own page, then include a script from your site with a standard <SCRIPT SRC=...> tag. If your script returns an array (as above), the attacker will be able to access the contents of the array, even though it is never explicitly stored in a variable or passed to a function.

Possible solution: object serialization

Another option is to return serialized objects instead of arrays, like this:

{
  "contact": {
    "name": "John Doe",
    "mail": "jdoe@example.com"
  }
}

This may also apply to simple representations of primitive values (strings, numbers), for example:

"5eb63bbbe01eeed093cb22bb8f5acdc3"

As of today, there are no publicly known, reliable methods to read back data in this format in any known browser. Unlike the Array overloading attack shown above, browsers do not allow attackers to overload Object in the same way. However, you need to be extremely careful with this technique. The protection may not hold for subtly different formatting schemes: for example, when an extra (...) appears around the serialized data, or when the object contains nested arrays or callback initializers.
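
One way to see why the exact formatting matters: as a standalone script, a leading { starts a block statement rather than an object literal, so a bare serialized object does not even parse, while the same text wrapped in (...) does. The rough sketch below uses new Function() as a stand-in for the browser's script parser; the helper name parsesAsScript is made up for illustration.

```javascript
// Returns true if `source` parses as a standalone script body.
// new Function() only compiles the source here; it is never called.
function parsesAsScript(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

var objectResponse =
  '{ "contact": { "name": "John Doe", "mail": "jdoe@example.com" } }';
var arrayResponse = '[ [ "John Doe", "jdoe@example.com" ] ]';

console.log(parsesAsScript(objectResponse));             // false
console.log(parsesAsScript("(" + objectResponse + ")")); // true
console.log(parsesAsScript(arrayResponse));              // true
```

A response that parses is not automatically readable by the attacker, but one that fails to parse never runs at all, which is why the unwrapped object form is the more conservative choice.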

Other solutions

There are several other approaches that can be employed to verify requests prior to returning any data. These are not mutually exclusive; you may wish to employ all of them at once.

Further reading


Comment by kriszyp, May 14, 2008

You can make JSON non-parsable as JavaScript by simply prepending it with {}&&. The advantage of this is that you don't have to strip anything off to do JSON parsing; simply put it in parentheses as you normally do with JSON evaluation and it will work fine:

{}&&["my","data"] -> parse error
Loaded as a script, you get a parse error because the first { is interpreted as a code block.

({}&&["my","data"]) -> ["my","data"]
When you load it using eval("(" + json + ")"), it works fine.
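
The {}&& prefix described in this comment can be checked with a short sketch. As before, new Function() is only a stand-in for a browser loading the payload through a script tag, and the variable names are made up for illustration.

```javascript
var payload = '{}&&["my","data"]';

// Loaded as a script: the leading {} is an empty block, so the && that
// follows it is a syntax error and nothing executes.
var loadsAsScript;
try {
  new Function(payload);
  loadsAsScript = true;
} catch (e) {
  loadsAsScript = false;
}

// Evaluated inside parentheses: {} is now an empty object, which is
// truthy, so && yields the array on its right-hand side.
var data = eval("(" + payload + ")");

console.log(loadsAsScript); // false
console.log(data.length);   // 2
```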

Comment by bishillo, Jun 09, 2008

Wops... I started another page on this topic: http://code.google.com/p/doctype/wiki/HowtoProtectJson

The solutions I suggest there are:

  • Only return data on POST requests. Have your own Ajax code use POST rather than GET; requests from script tags and iframe SRC loads are always GET.
  • Check the referrer (if present) to verify that the request comes from your own domain. A missing referrer may be caused by software installed on the client (antispam tools, anonymizers...), so you should still return data in that case.
  • Don't rely on the session id received in cookies. Requests from other webpages may contain the cookie, because the browser adds it automatically. If your page requires the session id as a GET parameter, only pages that know the session id will be able to make valid requests.
  • Return JSON data with the proper content type: application/json (application/x-javascript for some old Opera browsers that have a bug). Browsers will then refuse to load the response as HTML, so the content won't be accessible when requested from iframes.
  • Return the JSON data encrypted. Only the JS on your website will know the right key to decode the data.
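
The first, second and fourth suggestions above could be combined on the server roughly as follows. This is only a sketch under stated assumptions: OWN_ORIGIN, mayServeData and jsonResponseHeaders are made-up names, and the request is assumed to have the shape used by Node's http module, where header names are lower-cased.

```javascript
// Hypothetical origin of your own site; adjust to your deployment.
var OWN_ORIGIN = "https://www.example.com";

// Decide whether a request may receive sensitive JSON.
// `req` is assumed to look like { method: "POST", headers: { referer: "..." } }.
function mayServeData(req) {
  // Script tags and iframe SRC loads always use GET, so require POST.
  if (req.method !== "POST") return false;

  var referrer = req.headers["referer"];
  // A missing referrer may have been stripped by client software: allow it.
  if (!referrer) return true;

  // Otherwise it must point at our own origin, not merely start with
  // the same characters (e.g. https://www.example.com.evil.test/).
  return referrer === OWN_ORIGIN || referrer.indexOf(OWN_ORIGIN + "/") === 0;
}

// Headers to send along with the JSON body.
function jsonResponseHeaders() {
  return { "Content-Type": "application/json" };
}

console.log(mayServeData({ method: "GET", headers: {} }));  // false
console.log(mayServeData({ method: "POST", headers: {} })); // true
console.log(mayServeData({
  method: "POST",
  headers: { referer: "https://attacker.test/page" }
}));                                                        // false
```

Because a missing referrer is let through, this check only filters out requests that openly identify another site; it is a complement to the other measures, not a replacement for them.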
