UrlPolicy
specifies which URLs untrusted code can fetch, and in what contexts.
URL Policies

Goals

Allow the server-side rewriter to inline content and rewrite URLs, and allow the client-side runtime JS to rewrite URLs used in HTML attributes and passed to DOM APIs.

Many containers want to pass URLs through proxies that strip cookies and verify, rewrite, or re-encode content. These proxies will also check that the advertised mime-type matches the kind requested, so that if a URL appears in an image's src attribute and is a JS/GIF polyglot, it must be served with an image mime-type.

Since the URL Policy lies at the border between Caja and the container, errors in it can compromise both.

It is a goal for policies to be conservatively backwards compatible: if a policy denies a URL in a certain context, then changes to Caja or to the URL policy interface should not make existing policies more permissive.

Background

How can a URL policy implementation distinguish between URLs that are loaded without further user action, and distinguish the expected type of content?

URLs, URIs, and URNs appear in HTML attributes, as parts of CSS property values, and as arguments to Javascript APIs.
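The above can be broken down into a few broad source categories that describe when the content is fetched, and into what security context it is loaded: URNs, immediate load in the same document, and eventual load in a new document.
And there are a few broad types of content: document level content, side effecting includes, audiovisual content, and data.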
Likely Policies

Many clients will want to proxy external content to enforce well-formedness, require that the advertised mime-type and encoding match the actual mime-type and encoding, filter out images that exploit known buffer overflows, strip cookies to prevent XSRF, etc.

Clients who proxy content will likely want to whitelist specific URLs or hosts, such as an image hosting service that they trust not to serve problematic content.

Clients may want to substitute their own version of a piece of common content for certain URLs to improve consistency or caching, e.g. to serve their own version of the jQuery library in place of any URL whose path ends with /jquery.js.

Some clients may want to ban all dynamically loaded scripts and styles, and others may want to pass dynamically loaded scripts and styles through a rewriting proxy.

Some clients may wish to prevent "phoning home", i.e. prevent user data from leaking, by denying access to any but a whitelisted set of hosts.

URLs and regular expressions

Should the URL policy receive URLs as strings or as objects?

Javascript has no builtin APIs for composing, decomposing, resolving, or manipulating URIs. It provides a few functions for encoding and decoding URL parts, but the decoding functions are problematic since the decoding of '+' differs depending on where it appears. Most JS code that deals with URLs uses regular expressions in ways that are subtly or blatantly incorrect.

URL References vs URLs

Should the URL policy receive URL fragments?

RFC 2396 uses the term URI to refer to identifiers without a "fragment" such as scheme://authority/path?query and the term URI reference to refer to identifiers with a fragment such as scheme://authority/path?query#fragment (its successor, RFC 3986, makes the fragment part of the URI syntax itself). The HTML5 spec uses the term URL to refer to both. Under HTTP, servers never receive the fragment from the browser.
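
As a rough illustration of the URI vs. URI reference distinction, the sketch below splits a reference at its first '#' into the part a server would receive and the fragment that stays in the browser. The helper name splitFragment is hypothetical and is not part of Caja or of the URL policy API.

```javascript
// Hypothetical helper, not part of any Caja API.
// Splits a URI reference into the URI (what an HTTP server sees) and the
// fragment (which never leaves the browser). The fragment begins after the
// first '#'.
function splitFragment(uriRef) {
  var hash = uriRef.indexOf('#');
  if (hash < 0) {
    // No fragment present at all.
    return { uri: uriRef, fragment: null };
  }
  return {
    uri: uriRef.substring(0, hash),
    fragment: uriRef.substring(hash + 1)
  };
}

var parts = splitFragment('scheme://authority/path?query#fragment');
console.log(parts.uri);       // the part sent to the server
console.log(parts.fragment);  // the part kept by the browser
```

A policy that receives the whole reference can still inspect the fragment, even though a proxy sitting behind the policy would never see it.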
Non-latin characters and case folding in international domain names

Is domain name normalization the responsibility of the url policy caller or the url policy implementation?

How many non-malicious gadgets would break if a hostname whitelisting url policy rejected URLs with unnormalized domains?

Erik van der Poel, a unicode.org contributor, says:

The browsers implement a set of RFCs called IDNA (Internationalizing Domain Names in Applications) specified in RFCs 349{0,1,2} and 3454. The IDNA process includes a Nameprep step (based on Stringprep) that involves lower-casing, case-folding, mapping to nothing (deleting) and NFKC normalization. This step is often bundled into the same API that performs the final Punycode step (xn-- followed by gibberish). These steps can fail, in which case you probably want to reject that domain name. If you're using Java, ICU4J has a class called IDNA. The browsers handle illegal Punycode names differently. MSIE7 rejects them, while Firefox allows them.

Non-Latin characters are covered by IDNA too. One example is the soft hyphen (U+00AD), which is "mapped to nothing" in IDNA. I have come across URLs on the Web where there is a soft hyphen at a hyphenation point, e.g. micro<U+00AD>soft.com. MSIE7 just goes to microsoft.com, but MSIE6 goes to micro\xC2\xADsoft.com, so if your whitelist contains hosts at unscrupulous registries like *.cc and the like, your MSIE6 users might accidentally go to a site that you didn't intend to whitelist.

Legacy URL Policies

Where are URL policies evaluated?

Will old URL policies continue to work?

There are currently few URL policies in production. Those are based around two different APIs: one Java interface, and a separate JS one.

/**
 * Specifies how the plugin resolves external resources such as scripts and
 * stylesheets.
 *
 * @author mikesamuel@gmail.com
 */
public interface PluginEnvironment {
  /**
   * Loads an external resource such as the src of a script tag or
   * a stylesheet.
   *
   * @return null if it could not be loaded.
   */
  CharProducer loadExternalResource(ExternalReference ref, String mimeType);

  /**
   * May be overridden to apply a URI policy and return a URI that enforces that
   * policy.
   *
   * @return null if the URI cannot be made safe.
   */
  String rewriteUri(ExternalReference uri, String mimeType);
}

and

@param {Object} uriCallback an object like {
  rewrite: function (uri, mimeType) { return safeUri }
}.
The rewrite function should be idempotent to allow rewritten HTML
to be reinjected.

Design

Decisions

How can a URL policy implementation distinguish between URLs that are loaded without further user action, and distinguish the expected type of content?

Our API will expose the distinctions described above as hints: URNs vs. immediate load in same document vs. eventual load in new document, document level content vs. side effecting includes, audiovisual content, data.

Should the URL policy receive URL fragments?

There is no compelling reason to deny the policy that data. There has been at least one attack that exploited data in fragments.

Should the URL policy receive URLs as strings or as objects?

Is domain name normalization the responsibility of the url policy caller or the url policy implementation?

This problem can be solved with library support, but because of IDNA and encoding issues, all url policy callers will have to normalize the input URL anyway.

How many non-malicious gadgets would break if a hostname whitelisting url policy rejected URLs with unnormalized domains?

I believe the number is small. There are widely used international domain names, and IDNA is hard to implement in javascript since it requires full NFKC normalization. But we can normalize all URLs that appear in HTML and CSS server-side, and most other URLs generated by JS are derived from those URLs.

How do we make sure that rewritten URLs in innerHTML can be extracted and reinjected?

We can either require that URL policies be idempotent in the (∀ x∈dom(f), f(x)=f(f(x))) sense, or that URL rewriting be reversible. Idempotence is simple to test for and is easier to implement, so we require URL policies to be idempotent.

Should the URL Policy design make specific allowances for memoization?

No. The URL policy API should be designed in such a way that a generic memoizing implementation can wrap a non-memoizing implementation if memoization turns out to be a significant concern.
That should be possible if the URL policy takes immutable inputs, produces immutable results, and is stateless.

Where are URL policies evaluated?

Callers need to invoke the URL policy from both server-side Java and client-side javascript. There are two ways that these can be unified: either the policies are authored in Java and the client javascript is generated from the Java class, or we use Rhino or another server-side JS interpreter to interpret a policy implemented in JS in the cajoler. The latter is preferred since the containers served by a shared cajoling service may each need to supply a different policy to the service. The policy might not be entirely trusted code and may itself need to be cajoled before it is run, or it may need to be sandboxed in some other way.

Is it the responsibility of the policy or the caller to resolve relative URLs?

The caller. Only the caller has enough information to resolve URLs correctly. E.g., the HTML rewriter will want to resolve URLs relative to the input HTML's source or <base href>, pass the URL to the policy, and then relativize the URL against the URL the gadget will be served under.

Will old URL policies continue to work?

Maybe, but we should work with the few existing policy authors to rework them quickly.

What is responsible for fetching scripts and styles so the Cajoler can inline them?

The URL policy will no longer be responsible for this. It doesn't need to be involved in URL fetching in the browser, so we will separate out URL loading into a separate concern: a Java interface UrlGetter that can GET content from a URL that appears statically in HTML or CSS.

Definition

A URL policy is a mapping from absolute normalized† URI references plus context hints to URI references or the special DENY value. URL policies are implemented as Javascript (or Cajita) objects with the API below.

Context hints come in several flavors:
† - URL normalization on the Cajoler is reliable, but is best-effort on the client.

API

The API is a javascript object which has a single public method which dispatches to other methods. If an implementor neglects to implement one of these, e.g. rewriteScriptUrl, then they cannot suffer compromises due to a failure to properly proxy dynamically loaded script sources. If none of the specific handlers are applicable, then it tries rewriteOther, which can be used to log the fact that a URL could not be rewritten, or to try some best-effort but extra-paranoid proxying.

{
  rewriteUrl: function (absUrl, hints) {            // final public
    // Dispatch to other handlers based on hints.
  },
  rewriteScriptUrl: function (absUrl, hints),       // abstract protected
  rewriteStylesheetUrl: function (absUrl, hints),   // abstract protected
  rewriteAudioVisualUrl: function (absUrl, hints),  // abstract protected
  rewriteDocumentUrl: function (absUrl, hints),     // abstract protected
  rewriteObjectUrl: function (absUrl, hints),       // abstract protected
  rewriteUrn: function (absUrl, hints),             // abstract protected
  rewriteOther: function (absUrl, hints),           // abstract protected
}

Supporting Code

Javascript URI library that does resolution.

Java library for API normalization.

Advice for Policy Implementors

Do not black-list. Domain black-listing is unreliable because of numeric IP addresses, open redirectors, and the difficulty of normalizing host-names on the browser.

Use the NormalizedUri class to normalize any URL parts that you wish to use in whitelists. All whitelists of host-names should only contain IDNA normalized host-names.

Do not fetch the URL speculatively before deciding whether to allow it or not, since that might enable XSRF attacks.

White-list protocols. Be wary of javascript:, widget: and anything not http: or https: or mailto:.

Check the context hints. If you don't have enough hints, DENY.

Be careful with rewriteOther. If you implement it in such a way that it doesn't deny anything, you must be careful to stay apprised of changes to the URL policy API.

Be careful of the encoding of text documents. The encoding of a document often affects the encoding of % encoded octets of URLs in that document, so it's a good idea to re-encode text as UTF-8.

Don't use regular expressions to decompose URLs. If you need to whitelist a particular domain and protocol, look at the domain and protocol fields individually. White-listing by regular expressions tends to be vulnerable to URL spoofing.

Almost all urls should be rewritten to be fetched by a proxy. The proxy ought to have the same level of access as the authors of the gadgets, i.e. if gadgets are fetched from the internet, urls ought to be rewritten to use a public proxy to prevent gadgets from scanning internal networks via url fetching errors.

If a url must be fetched without proxying, the host name ought to be fully qualified and terminated with a dot suffix (http://www.example.com.). If such a precaution is not taken, a gadget can be used to probe an internal network. Also, consider rejecting any URLs with non-standard ports, e.g. http to port 22.
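
The advice on whitelisting by fields rather than by regular expression can be sketched as follows. This is only an illustration under stated assumptions: it uses the WHATWG URL class found in current browsers and Node.js in place of the NormalizedUri class mentioned above, the DENY sentinel is modeled as null, and the names rewriteByFields, NAIVE_PATTERN, ALLOWED_SCHEMES, and ALLOWED_HOSTS are hypothetical. It does not rewrite to a proxy, which a real policy would almost always do.

```javascript
// Hypothetical sentinel; the design only says a special DENY value exists.
var DENY = null;

// Hypothetical whitelists. Host names are assumed to be IDNA-normalized.
var ALLOWED_SCHEMES = { 'http:': true, 'https:': true };
var ALLOWED_HOSTS = { 'images.example.com': true };

// Anti-example: the pattern does not pin down where the host name ends, so
// it also matches spoofed hosts such as images.example.com.evil.org.
var NAIVE_PATTERN = /^https?:\/\/images\.example\.com/;

function has(set, key) {
  return Object.prototype.hasOwnProperty.call(set, key);
}

// Looks at the protocol, host, and port fields individually.
function rewriteByFields(absUrl) {
  var parsed;
  try {
    parsed = new URL(absUrl);  // assumption: WHATWG URL is available
  } catch (e) {
    return DENY;               // unparseable input is denied
  }
  if (!has(ALLOWED_SCHEMES, parsed.protocol)) { return DENY; }
  if (!has(ALLOWED_HOSTS, parsed.hostname)) { return DENY; }
  if (parsed.port !== '') { return DENY; }  // non-default port
  return parsed.href;
}

var spoofed = 'http://images.example.com.evil.org/x.png';
console.log(NAIVE_PATTERN.test(spoofed));  // the regular expression is fooled
console.log(rewriteByFields(spoofed));     // the field check denies
```

Because the function returns the parsed URL's serialized form, feeding an allowed result back in yields the same string, which is the idempotence property required under Decisions.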