OutputChecks
sanity checks on cajoler output
Output Checks

The HTML, CSS, and JavaScript that we output should be as clear and simple as possible, so that all browsers produce the same parse tree. Below are some properties that we can assert on output.

Intentional Newlines

Our source code formatter should not output any non-space token that contains any of the newline characters (see http://en.wikipedia.org/wiki/Newline), so that every line break in the output is one the formatter intended.
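
The newline property can be asserted mechanically over the formatter's token stream. The following is a minimal sketch, not the cajoler's actual check: the function name, the idea that tokens arrive as plain strings, and the exact character set (LF, VT, FF, CR, NEL, LS, PS) are all assumptions made for illustration.

```python
# Assumed set of line-break characters: LF, VT, FF, CR, NEL, LS, PS.
NEWLINE_CHARS = frozenset("\n\x0b\x0c\r\x85\u2028\u2029")


def tokens_with_embedded_newlines(tokens):
    """Return the non-space tokens that contain a newline character.

    `tokens` is a hypothetical list of output token strings. Tokens made
    up entirely of whitespace are skipped, since those are where the
    formatter is allowed to place its intentional line breaks.
    """
    offending = []
    for token in tokens:
        if not token.strip():
            continue  # pure whitespace token, allowed to hold newlines
        if any(ch in NEWLINE_CHARS for ch in token):
            offending.append(token)
    return offending
```

An output check could then assert that `tokens_with_embedded_newlines(tokens)` is empty before the output is emitted.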

Comment Free

We should strip all comments from the output to avoid lexing inconsistencies. Known lexical errors in existing browsers include:

String Literals should not appear to be markup, external entity references, or CDATA ends

We should not allow <script> inside a string literal, since if malicious code can trick the rewriter into outputting a </script>, it can close the current script element and open a new script tag whose content starts inside what the browser thinks is a safe string constant. Other problems arise with entity references. If malicious code can escape a script tag, it can insert doctypes and load external scripts. If malicious code can escape a CDATA section in XHTML, then it might be able to insert tags into the page. All of these problems are avoided if the <, <<, &, and && operators are always followed by a space, and if the characters < and & are replaced with their octal escapes (\074 and \046) in string literals.

ASCII identifiers

We should disallow non-ASCII identifiers until we understand browser support for identifiers and identifier normalization. We should also produce ASCII-only output until we have an idea of the ways in which containers inline cajoled output and the encodings they use. Ideally, we will always ship cajoled output in UTF-8 and recommend that containers only inline cajoled code in pages that are UTF-8 encoded.
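
The string-literal escaping and the ASCII-only property described above can both be sketched in a few lines. This is an illustration under stated assumptions, not the rewriter's implementation: both function names are hypothetical, and `escape_markup_chars` assumes it receives the decoded text of a string literal, with quotes, backslashes, and other escaping handled elsewhere.

```python
def escape_markup_chars(literal_text: str) -> str:
    """Replace < and & with the octal escapes \\074 and \\046.

    Assumes `literal_text` is the decoded content of a JavaScript string
    literal (hypothetical input shape). After this step the emitted
    literal contains no raw < or &, so it cannot spell </script>, an
    entity reference, or other markup.
    """
    return literal_text.replace("<", r"\074").replace("&", r"\046")


def is_ascii_only(output: str) -> bool:
    """Return True if every character of the output is 7-bit ASCII."""
    return all(ord(ch) < 128 for ch in output)
```

For example, `escape_markup_chars("</script>")` yields the text `\074/script>`, which a JavaScript parser reads back as the original string while the HTML parser no longer sees a closing tag.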
