ArticleUtf7
UTF-7: the case of the missing charset
UTF-7 is an encoding originally designed for SMTP gateways that couldn't deal with 8-bit/binary characters. It uses a modified Base64 encoding to represent characters outside a small directly encodable ASCII subset (including, optionally, punctuation such as < and >) using only 7-bit ASCII. The string

    <script>alert(1)</script>

can be encoded in UTF-7 as

    +ADw-script+AD4-alert(1)+ADw-/script+AD4-

When the web server does not include an explicit character encoding in its HTTP response (either in the Content-Type HTTP header or in a META tag in the HTML itself), Internet Explorer will attempt to guess the encoding. If certain strings of user input, say +ADw-script+AD4-alert(1)+ADw-/script+AD4-, are echoed back early enough in the HTML page, Internet Explorer may incorrectly guess that the page is encoded in UTF-7. Suddenly, the otherwise harmless user input becomes active HTML and will execute.

Solution

Always declare the character encoding explicitly, so that the browser never has to guess.

To set the encoding in the HTTP headers, use the charset parameter of the Content-Type header:

    Content-Type: text/html; charset=UTF-8

To set the encoding within the HTML document, use a <meta> tag:

    <meta http-equiv="Content-type" content="text/html; charset=utf-8">

Important: The <meta> tag must appear in the document before any content that could be controlled by an attacker, such as a <title> tag containing a dynamically generated title for the document.
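As an illustration of the two declarations described in the Solution, here is a minimal sketch of a Python WSGI application that sends the charset in the Content-Type header and also places the <meta> tag ahead of the <title>. The names `PAGE` and `app`, the page markup, and the placeholder title are assumptions made for this example. They are not part of the original article.

```python
# Hypothetical page template: the charset <meta> tag comes before the
# <title>, the first place where dynamically generated text appears.
PAGE = (
    "<!DOCTYPE html>\n"
    "<html><head>\n"
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
    "<title>{title}</title>\n"
    "</head><body><p>Hello</p></body></html>\n"
)


def app(environ, start_response):
    """Minimal WSGI app that always declares its encoding explicitly."""
    title = "Results"  # placeholder for a dynamically generated title
    body = PAGE.format(title=title).encode("utf-8")
    headers = [
        # Explicit charset, so the browser has no reason to sniff one.
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

To try it locally, the app can be served with the standard library, for example `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. Sending both the header and the <meta> tag is redundant by design: the page still declares its encoding if it is saved to disk or if an intermediary drops the header.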
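The effect of a wrong encoding guess on the payload quoted in this article can be checked with Python's built-in `utf-7` codec. This is only a sketch of what happens to the bytes under each interpretation. It does not reproduce Internet Explorer's detection logic, and the variable names are chosen for the example.

```python
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# Read as ASCII, the payload is inert text containing no angle brackets.
as_ascii = payload.decode("ascii")

# Read as UTF-7, each +...- run is Base64-decoded into UTF-16 code units,
# so +ADw- becomes "<" and +AD4- becomes ">".
as_utf7 = payload.decode("utf-7")

print(as_ascii)  # +ADw-script+AD4-alert(1)+ADw-/script+AD4-
print(as_utf7)   # <script>alert(1)</script>
```

The same bytes pass through a filter that looks for < and > untouched, yet become a script element once the page is interpreted as UTF-7.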
Why use <meta charset="utf-8">, which has poor coverage (e.g., it doesn't work in Firefox), rather than <meta http-equiv="Content-Type" content="text/html; charset=utf-8">, which has been supported by all browsers since the dawn of time?
<meta charset="utf-8"> works in all major browsers, including Firefox 2 and 3.