Updated Nov 15, 2008 by pilgrim
Labels: is-article, about-security
ArticleE4XSecurity  
HOWTO protect against E4X markup injection


The following set of data inclusion problems may affect your application even if you do not explicitly serve dynamic JavaScript. You need to be aware of this attack if you wrap inline script blocks in HTML, or serve other payloads that may contain script snippets. In short, because of a somewhat bizarre bid to make JavaScript support XML data structures "natively," a resource does not have to serve pure JavaScript to be vulnerable to <SCRIPT SRC=...> inclusion. A standard called E4X (ECMAScript for XML) was originally designed to permit this fairly harmless notation:

  var x = <contact><name>John Doe</name><mail>jdoe@example.com</mail></contact>;
  alert(x);

...and nesting of initializers in this style:

  alert(<contact><name>{ get_name() }</name><mail>none</mail></contact>);

The standard is currently supported by default by several browsers, including Firefox.

Attacks on embedded scripts

The most important side effect of E4X is that any standalone, well-formed XML markup is treated as an expression whose value is ignored. In other words, the following may be included as a script:

<contact><name>John Doe</name><mail>jdoe@example.com</mail></contact>

...and will be parsed to an object along the lines of:

{ "contact": {
    "name": "John Doe",
    "mail": "jdoe@example.com"
  }
}

Unfortunately, the very same XML syntax is used in any well-formed HTML document; the following can be included through <SCRIPT SRC=...> and will be parsed into a data structure likewise -- then promptly discarded, because the resulting value is not assigned to anything, effectively forming a no-op statement:

<html>
<title>John Doe's mailbox</title>
<script>alert('Hello world');</script>
<body>
  ...
</body>
</html>

So far, so good, and no harm done. Things get hairy with embedded JavaScript initializers: as noted before, any block inside {...} is executed and used to initialize the data structure, even if it is preceded by arbitrary text that is not valid JavaScript and would never get past the interpreter on its own. Consider the following example:

<html>
<script>
  function dostuff() {
    username = 'SERVER_INSERTED_STRING' ...
  }
</script>
<body>
  ...
</body>
</html>

If this page is included across domains with a <SCRIPT SRC=...> tag, the expression inside {...} (here, the body of dostuff(), including the server-inserted string) is executed in the context of the attacker's page. A mitigating factor is that in Firefox, the initializer is limited to a single function call or statement. This is not a great consolation, however, as a number of valid and plausible JavaScript code structures also serve as valid E4X initializers.

And here's why it gets more complicated...

Attacks on plain old HTML documents

Unfortunately, the trouble with E4X does not end there; because of how nested initializers work, the following may have the same interesting effect:

<html>
<body>
  Non-Javascript text
  Something completely non-parseable - 1 2 3 **** }}
  ...
  { x =                       <- attacker-supplied
    ...
    User mailbox data
    in HTML format
    ...
  }                           <- static or attacker-supplied
</body>
</html>

In effect, if the attacker is able to render an unescaped { character at some point on the page, in front of user-sensitive data (be it in a parameter copied over from the URL or a subject line from an e-mail), and can place or reuse a trailing } somewhere later on, the entire block of well-formed HTML between these locations may be disclosed, in neatly parsed form, to the attacker.

Alternatively, if the content between { and } is not a well-formed block of XML, but contains no line breaks or double quotes, injecting { x = " before it and " } after it might be used to achieve the same, since the enclosed content is then captured as a string literal.
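
The condition described above can be approximated with a small audit helper. This is only a heuristic sketch, not something from the original article: the function name isBraceEnclosed and the sample pages are hypothetical, it looks only at the first occurrence of the sensitive string, and it does not check whether the enclosed markup is well-formed XML, so it may report false positives.

```javascript
// Hypothetical audit helper (an illustration, not part of the article).
// Returns true when `secret` is rendered somewhere after a literal "{"
// and before a literal "}", the layout the attack above relies on.
function isBraceEnclosed(page, secret) {
  const at = page.indexOf(secret);
  if (at === -1) {
    return false; // the sensitive value is not on the page at all
  }
  const before = page.slice(0, at);            // everything preceding the value
  const after = page.slice(at + secret.length); // everything following it
  return before.includes("{") && after.includes("}");
}

// Sample pages; "secret-mail" stands in for user-sensitive data.
const exposedPage =
  "<html><body>{ x = <div>inbox: secret-mail</div> }</body></html>";
const escapedPage =
  "<html><body>&#123; x = <div>inbox: secret-mail</div> }</body></html>";

console.log(isBraceEnclosed(exposedPage, "secret-mail"));
console.log(isBraceEnclosed(escapedPage, "secret-mail"));
```

A check like this might be run against rendered test pages to flag templates where attacker-influenced fields surround sensitive output; it does not replace escaping.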

Preventing E4X markup injection

Preventing E4X attacks is hard, and usability concerns may make it impossible to apply every defense on every page. However, there are several lines of defense you can use for high-profile targets.

  • Use non-predictable URLs with authentication tokens. This is similar to the recommendation made for other script inclusion attacks.
  • Serve pages that are not valid HTML. This is not a joke. E4X XML parsers are strict, and bail out immediately on a single mismatching closing tag. If HTML validation is not an immediate concern for a page, placing <x></y> near the end of a document is believed to be a sufficient defense against E4X inclusion. If plain old HTML is used, unmatched <br> tags often suffice to break parsing.
  • Serve XHTML with the optional <?xml ...> prolog. This prolog, not being a valid tag name, seems to break the Firefox parser.
  • Careful construction. Be sure to use multi-statement scripts that cannot mimic E4X initializers. Escape { in user-supplied data, and construct your pages so that attacker-controlled data never encloses single-line statements or user-sensitive, well-formed markup.
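
Three of the defenses above can be sketched in server-side JavaScript (Node.js). This is a minimal illustration under stated assumptions, not a vetted implementation: the function names, the "token" query parameter, and the choice of numeric character references for braces are all hypothetical, and a real token would also have to be bound to the user's session and verified on each request.

```javascript
const crypto = require("crypto");

// Replace literal braces in user-supplied text with HTML character
// references, so the text cannot open or close an E4X initializer.
function escapeBraces(text) {
  return text.replace(/[{}]/g, (ch) => (ch === "{" ? "&#123;" : "&#125;"));
}

// Insert the deliberately mismatched pair <x></y> just before the last
// </html>, or at the very end if no closing tag is present. A strict
// XML parser should give up when it reaches the mismatch.
function addParserBreaker(html) {
  const breaker = "<x></y>";
  const idx = html.lastIndexOf("</html>");
  if (idx === -1) {
    return html + breaker;
  }
  return html.slice(0, idx) + breaker + html.slice(idx);
}

// Build a hard-to-guess URL by appending a random token.
// Assumes `path` has no query string yet; the parameter name "token"
// is made up for this sketch.
function tokenizedUrl(path) {
  const token = crypto.randomBytes(16).toString("hex"); // 32 hex characters
  return `${path}?token=${token}`;
}

console.log(escapeBraces("{ x = "));
console.log(addParserBreaker("<html><body></body></html>"));
console.log(tokenizedUrl("/inbox"));
```

Escaping both braces rather than only { is a conservative choice made here; the article itself only calls for escaping {.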


Comment by alexkon, Oct 16, 2008

Could you please provide references to the sources of this information? Are there any papers (blog posts, presentations) on this?

Which browser versions are affected? Are browser developers aware of the possibility of this attack? Do they consider it a vulnerability in their browsers? Does it have a CVE number?

It looks like adding the <?xml ...> prolog won't protect from this attack once bug 336551 in the Mozilla JavaScript engine is fixed.

Alternatively, Mark, is it possible to contact the authors of this article directly?

Comment by jesset, Oct 28, 2008

Damn, this highlights less for me the "danger" e4x syntax creates, and significantly more for me the flimsiness with which the web is protected from cross site scripting. Since <script src=..> can pull from another domain, and since that will never change as it's the major delivery method for web based advertisements, we're always walking on eggshells.

If I had anything to say about it I would demand a new standard where browsers would treat <script src=..> with the same cross-site security as everything else, and then allow /any/ cross-domain request (be it <script src=..> or xmlhttprequest, etc) through the firewall if the embedded data's HTTP header includes a "Cross-domain: ok" header (or optionally "Cross-domain: /regex/" to match domains allowed to pull data from there.)

Where would you recommend I could take such a proposition? Is there a newsgroup somewhere? XD

- - Jesse

