Updated Jun 19, 2008 by mikesamuel
Labels: Phase-Deploy
CajaWhitelists  
Schema for whitelists used by the Cajoler

Whitelist File Format and Schema

Background

Caja uses white-lists to approve HTML tags, HTML attributes, and CSS properties. These white-lists were previously hard-coded in Java files.

The white-list is a table of white-listed items. Each row in the table includes a key (variously an HTML element name, HTML attribute name, or CSS property name), and some information about the item's content.

Although we believe the default white-lists are fairly comprehensive, clients that do their own preprocessing of HTML & CSS before cajoling may want to white-list constructs which they know are safe, but which Caja cannot prove safe for arbitrary input.

It may become necessary to support different element definitions for HTML4 vs XHTML.

Builtin WhiteLists

WhiteList File Format Overview

A white-list is a JSON file like:

{
  "description":
      "Extends the default HTML tag WhiteList to allow <OBJECT> and deny <IFRAME>",

  "inherits": [
    { "src": "resource:///com/google/caja/html/tags.json" }
  ],

  "allowed": [
    { "key": "OBJECT" }
  ],

  "denied": [
    { "key": "IFRAME" }
  ],

  "types": [
    { "key": "OBJECT", "optionalEndTag": false, "empty": false, "breaksFlow": false }
  ]
}

We use JSON because, unlike XML, parsing it cannot open arbitrary network connections. It is also more extensible, since it makes no hard distinction between elements, which can be extended, and attributes, which cannot.

Elements of a WhiteList

Besides its description, the example above includes four elements: inherits, allowed, denied, and types.

The cajoler examines the inherits element and loads each white-list it names. Each inherits src can be either a file://... URL or a special resource://... URL, which is resolved relative to the Cajoler's class-path.
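As a rough illustration of how the two kinds of src could be told apart, here is a minimal sketch. It is not the Cajoler's actual loader: the class WhiteListSrcResolver and its methods are hypothetical names, and the sketch assumes that a resource: path maps directly onto a class-path resource name with the leading slashes removed.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper; the class and method names are illustrative, not Caja's.
final class WhiteListSrcResolver {
  private WhiteListSrcResolver() {}

  // Converts resource:///a/b.json into the class-path name a/b.json.
  // ASSUMPTION: the URI path is used verbatim as the resource name.
  static String classPathName(URI src) {
    if (!"resource".equals(src.getScheme()) || src.getPath() == null) {
      throw new IllegalArgumentException("Not a resource: URI: " + src);
    }
    String path = src.getPath();
    // ClassLoader.getResource expects names without a leading slash.
    while (path.startsWith("/")) {
      path = path.substring(1);
    }
    return path;
  }

  // Returns a URL for the src. For a resource: src that is not on the
  // class-path, ClassLoader.getResource returns null, and so does this method.
  static URL resolve(URI src, ClassLoader loader) {
    String scheme = src.getScheme();
    if ("file".equals(scheme)) {
      try {
        return src.toURL();
      } catch (MalformedURLException e) {
        throw new IllegalArgumentException("Bad file: URI: " + src, e);
      }
    }
    if ("resource".equals(scheme)) {
      return loader.getResource(classPathName(src));
    }
    // Any other scheme (http:, for example) is rejected outright.
    throw new IllegalArgumentException("Unsupported scheme in " + src);
  }
}
```

Rejecting every scheme other than file: and resource: keeps the loader from opening network connections, which matches the reason given above for preferring JSON over XML.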

A WhiteList has the form

    interface WhiteList {
      Set<String> allowedItems();
      Map<String, TypeDefinition> typeDefinitions();

      interface TypeDefinition {
        Object get(String key);
      }
    }
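To make the interface concrete, the following sketch shows one possible in-memory implementation and how a white-list like the JSON example might be combined with the one it inherits from. SimpleWhiteList, typeOf, and extend are hypothetical names, not Caja classes. The merge rule used here (inherited items plus allowed items, minus denied items, with a denial winning over an allowance) is an assumption inferred from the example's description, not a statement of the Cajoler's actual algorithm.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// The interface from the text above, repeated so the sketch compiles on its own.
interface WhiteList {
  Set<String> allowedItems();
  Map<String, TypeDefinition> typeDefinitions();

  interface TypeDefinition {
    Object get(String key);
  }
}

// Hypothetical in-memory implementation; not part of Caja.
final class SimpleWhiteList implements WhiteList {
  private final Set<String> allowed;
  private final Map<String, TypeDefinition> types;

  SimpleWhiteList(Set<String> allowed, Map<String, TypeDefinition> types) {
    // Defensive copies, exposed as read-only views.
    this.allowed = Collections.unmodifiableSet(new LinkedHashSet<>(allowed));
    this.types = Collections.unmodifiableMap(new LinkedHashMap<>(types));
  }

  @Override public Set<String> allowedItems() { return allowed; }
  @Override public Map<String, TypeDefinition> typeDefinitions() { return types; }

  // Wraps a plain property map (e.g. {"empty": false}) as a TypeDefinition.
  static TypeDefinition typeOf(Map<String, Object> props) {
    Map<String, Object> copy = new LinkedHashMap<>(props);
    return copy::get;
  }

  // ASSUMPTION: effective items = (inherited + allowed) - denied, and type
  // definitions in the extending list override inherited ones with the same key.
  static WhiteList extend(WhiteList parent, Set<String> allowed,
                          Set<String> denied, Map<String, TypeDefinition> types) {
    Set<String> items = new LinkedHashSet<>(parent.allowedItems());
    items.addAll(allowed);
    items.removeAll(denied);
    Map<String, TypeDefinition> defs = new LinkedHashMap<>(parent.typeDefinitions());
    defs.putAll(types);
    return new SimpleWhiteList(items, defs);
  }
}
```

Under these assumptions, extending a parent that allows P and IFRAME with allowed OBJECT and denied IFRAME gives a white-list whose allowedItems() contains P and OBJECT but not IFRAME.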

The cajoler loads a white-list using the following algorithm:

This algorithm preserves the properties that

This allows white-lists to be used in several distinct ways:

Type information in WhiteLists

For HTML Elements, we need the following type information:

For HTML Attributes, we need the following type information:

For CSS Properties, we need the following type information:

Specifying WhiteLists at the Command Line

From usage:

 --css_prop_schema         A file: or resource: URI
                           of the CSS Property Whitelist to use.
 --html_attrib_schema      A file: or resource: URI
                           of the HTML attribute Whitelist to use.
 --html_property_schema    A file: or resource: URI
                           of the HTML element Whitelist to use.
