Updated Nov 15, 2008 by pilgrim
Labels: about-security, is-article
UTF-7: the case of the missing charset


UTF-7 is an encoding originally designed for SMTP gateways that could not handle 8-bit data. It represents Unicode characters that fall outside a directly representable ASCII subset as runs of modified Base64, each introduced by + and terminated by -, so the whole text stays within 7-bit ASCII. For example, the string <script>alert(1)</script> can be encoded in UTF-7 as +ADw-script+AD4-alert(1)+ADw-/script+AD4-.
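
To see where a token such as +ADw- comes from, the sketch below decodes a single Base64 run by hand and compares the result with Python's built-in utf-7 codec. The helper name decode_utf7_run is hypothetical, and the code assumes a well-formed run; it is an illustration, not a full UTF-7 decoder.

```python
import base64


def decode_utf7_run(run):
    """Decode the Base64 text found between '+' and '-' in a UTF-7 string.

    Hypothetical helper for illustration. UTF-7 uses Base64 without '='
    padding over UTF-16 (big-endian) code units, so the padding is restored
    here before decoding. Assumes the run is well formed.
    """
    padded = run + "=" * (-len(run) % 4)
    return base64.b64decode(padded).decode("utf-16-be")


# The two runs used in the example string.
less_than = decode_utf7_run("ADw")
greater_than = decode_utf7_run("AD4")

# Python's own codec decodes the complete example string.
decoded = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-".decode("utf-7")
```

Here less_than is "<", greater_than is ">", and decoded is the original <script>alert(1)</script> string.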

When the web server does not include an explicit character encoding in its HTTP response -- be it in the Content-Type HTTP Header or the META tag in the HTML itself -- Internet Explorer will attempt to guess the encoding. If certain strings of user input -- say, +ADw-script+AD4-alert(1)+ADw-/script+AD4- -- are echoed back early enough in the HTML page, Internet Explorer may incorrectly guess that the page is encoded in UTF-7. Suddenly, the otherwise harmless user input becomes active HTML and will execute.
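
The following sketch shows why ordinary HTML escaping does not help here. It assumes a page that escapes user input with Python's html.escape and then echoes it; the page markup is made up for the example. The same bytes are then decoded twice, once as UTF-8 and once as UTF-7, to mimic a browser that guesses the wrong encoding.

```python
from html import escape

payload = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# The payload contains none of & < > " ' so escaping leaves it unchanged.
escaped = escape(payload)

# Hypothetical page that echoes the escaped input.
page_bytes = ("<html><body>" + escaped + "</body></html>").encode("ascii")

# Interpreted as UTF-8, the input is inert text.
as_utf8 = page_bytes.decode("utf-8")

# Interpreted as UTF-7, the same bytes contain a script element.
as_utf7 = page_bytes.decode("utf-7")
```

The escaped value is identical to the payload, as_utf8 contains no <script> tag, and as_utf7 contains <script>alert(1)</script>.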

Solution

To set the encoding in the HTTP headers, use the charset parameter of the Content-Type header:

Content-Type: text/html; charset=UTF-8
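
As a minimal sketch of sending that header from server code, here is a small WSGI application in Python. The handler name app and the echoed query string are assumptions made for the example; the relevant part is the explicit charset in the Content-Type header.

```python
from html import escape


def app(environ, start_response):
    """Hypothetical WSGI handler that echoes the query string into a page."""
    user_input = environ.get("QUERY_STRING", "")
    body = (
        "<!DOCTYPE html><html><head><title>Echo</title></head>"
        "<body><p>" + escape(user_input) + "</p></body></html>"
    ).encode("utf-8")
    headers = [
        # The explicit charset means the browser has nothing to guess.
        ("Content-Type", "text/html; charset=UTF-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

The application can be served for local testing with wsgiref.simple_server.make_server from the standard library.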

To set the encoding within the HTML document itself, use a <meta> tag:

<meta http-equiv="Content-type" content="text/html; charset=utf-8">

Important: The <meta> tag must appear in the document before any content that could be controlled by an attacker, such as a <title> tag containing a dynamically generated title for the document.
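
One way to respect that ordering is to hard-code the <meta> tag as the first element of <head> in whatever function builds the page. The sketch below assumes a hypothetical render_page helper; the names and markup are illustrative only.

```python
from html import escape


def render_page(title, body_text):
    """Hypothetical page builder: the charset declaration precedes all
    content that an attacker could influence, including the title."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        '<meta http-equiv="Content-type" content="text/html; charset=utf-8">\n'
        "<title>" + escape(title) + "</title>\n"
        "</head>\n"
        "<body>\n"
        "<p>" + escape(body_text) + "</p>\n"
        "</body>\n"
        "</html>\n"
    )


page = render_page("+ADw-script+AD4-alert(1)+ADw-/script+AD4-", "Hello")
```

Even with a hostile title, the charset declaration appears in the output before the attacker-controlled text.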

Comment by matej.cepl, May 16, 2008

Why use <meta charset="utf-8">, which has poor coverage (e.g., it doesn't work in Firefox), rather than <meta http-equiv="Content-Type" content="text/html; charset=utf-8">, which has been supported by all browsers since the dawn of ages?

Comment by pilgrim, Dec 29, 2008

<meta charset="utf-8"> works in all major browsers, including Firefox 2 and 3.

