This repository has been archived by the owner on Apr 21, 2023. It is now read-only.

URLs with UTF-8 characters in them cannot be rewritten. #81

Closed
GoogleCodeExporter opened this issue Apr 6, 2015 · 12 comments

@GoogleCodeExporter

mark.harburn reported in issue 10 getting this error:

[Fri Nov 12 20:27:38 2010] [error] log_message_handler: [1112/202738:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)

I think this is worth considering separately.

Original issue reported on code.google.com by sligocki@google.com on 13 Nov 2010 at 6:16

@GoogleCodeExporter

Do you want me to build a debug version of the RPM? I just need to know what to 
enable and which Apache log level is needed, and I'll get that over to you.

Original comment by mark.har...@gmail.com on 13 Nov 2010 at 6:22

@GoogleCodeExporter

It'd be great if you could tell us the minimal configuration for which you get 
this message. Does it happen when you set

ModPagespeedRewriteLevel PassThrough

etc.

Then, if you can set LogLevel to Info and post a log, that would be very 
helpful.

Thanks.

Original comment by sligocki@google.com on 13 Nov 2010 at 6:32

@GoogleCodeExporter

I had to enable several filters together to make it happen. (redirect404.php is 
a 404 handler that redirects profile URLs to a real page.)

I have these enabled:

ModPagespeedEnableFilters outline_css,outline_javascript
ModPagespeedEnableFilters inline_javascript
ModPagespeedEnableFilters rewrite_images
ModPagespeedEnableFilters insert_img_dimensions
ModPagespeedEnableFilters remove_comments
ModPagespeedEnableFilters elide_attributes


[Sat Nov 13 18:52:12 2010] [warn] http://www.profileheaven.com/redirect404.php:611: Unexpected close-tag `div', no tags are open
[Sat Nov 13 18:52:12 2010] [warn] http://www.profileheaven.com/redirect404.php:618: Unexpected close-tag `div', no tags are open
[Sat Nov 13 18:52:12 2010] [warn] http://www.profileheaven.com/redirect404.php:620: Unexpected close-tag `td', no tags are open
[Sat Nov 13 18:52:12 2010] [warn] http://www.profileheaven.com/redirect404.php:636: Unexpected close-tag `tr', no tags are open
[Sat Nov 13 18:52:12 2010] [warn] http://www.profileheaven.com/redirect404.php:636: Unexpected close-tag `table', no tags are open
[Sat Nov 13 18:52:12 2010] [warn] http://www.profileheaven.com/redirect404.php:638: Unexpected close-tag `div', no tags are open
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:1: 7899us: HtmlParse::Flush
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:1: 7911us: HtmlParse::CoalesceAdjacentCharactersNodes
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:1: 7963us: HtmlParse::ApplyFilter:AddHead
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:1: 8006us: HtmlParse::SanityCheck
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:1: 8115us: HtmlParse::ApplyFilter:OutlineCss
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:22: Inline element not outlined because its size 103, is below threshold 2048
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:32: Inline element not outlined because its size 125, is below threshold 2048
[Sat Nov 13 18:52:12 2010] [error] log_message_handler: [1113/185212:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)
[Sat Nov 13 18:52:12 2010] [error] log_message_handler: [1113/185212:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)
[Sat Nov 13 18:52:12 2010] [error] log_message_handler: [1113/185212:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:1: 8742us: HtmlParse::SanityCheck
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:1: 8841us: HtmlParse::CoalesceAdjacentCharactersNodes
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:1: 8875us: HtmlParse::ApplyFilter:OutlineJs
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:86: Inline element not outlined because its size 161, is below threshold 2048
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:198: Inline element not outlined because its size 1752, is below threshold 2048
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:245: Inline element not outlined because its size 131, is below threshold 2048
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:257: Inline element not outlined because its size 305, is below threshold 2048
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:575: Inline element not outlined because its size 149, is below threshold 2048
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:630: Inline element not outlined because its size 163, is below threshold 2048
[Sat Nov 13 18:52:12 2010] [info] http://www.profileheaven.com/redirect404.php:645: Inline element not outlined because its size 68, is below threshold 2048

Original comment by mark.har...@gmail.com on 13 Nov 2010 at 6:56
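
For anyone sifting through a similar excerpt, here is a rough Python sketch that groups Apache error-log lines into entries and tallies them by level, which makes it easier to see how many `ReadUTFChar` messages appear at `error` versus `info`. The function names (`parse_entries`, `count_by_level`, `matching`) are made up for this illustration, and the timestamp pattern is an assumption based only on the log format shown in this thread. It expects log lines only and treats any line without a leading timestamp as a continuation of the previous entry, since pasted logs are often hard-wrapped.

```python
import re
from collections import Counter

# Assumed entry prefix, based on the excerpts in this thread:
# "[Sat Nov 13 18:52:12 2010] [error] ..."
ENTRY_START = re.compile(r"^\[[A-Z][a-z]{2} [A-Z][a-z]{2} [^\]]*\] \[(\w+)\]")


def parse_entries(lines):
    """Group raw lines into (level, message) tuples, joining wrapped lines."""
    entries = []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue
        match = ENTRY_START.match(text)
        if match:
            # New entry: remember the level and whatever follows the prefix.
            entries.append((match.group(1), [text[match.end():].strip()]))
        elif entries:
            # No timestamp prefix: treat as a continuation of the last entry.
            entries[-1][1].append(text)
    return [(level, " ".join(p for p in parts if p)) for level, parts in entries]


def count_by_level(entries):
    """Count entries per log level (e.g. warn, info, error)."""
    return Counter(level for level, _ in entries)


def matching(entries, needle):
    """Return the entries whose message contains the given substring."""
    return [entry for entry in entries if needle in entry[1]]
```

Calling `matching(parse_entries(open(path)), "ReadUTFChar")` on an error log would then list only the entries relevant to this issue, together with the level each was logged at.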

@GoogleCodeExporter

Thanks @mark.harburn, do you have any non-ASCII characters in your URLs? Could 
you post the page that is causing these problems? (we understand if you can't)

Actually, looking at this again, is it possible we are parsing the raw php 
file? It looks like the parser is finding a number of unexpected closing tags 
for example. We have seen reports of mod_pagespeed parsing php files and cgi 
scripts as if they were html pages. Does this page display/act correctly after 
being served through mod_pagespeed?

Original comment by sligocki@google.com on 15 Nov 2010 at 3:27
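
One way to answer the non-ASCII question for a given page is to scan its HTML for URL-bearing attributes that contain non-ASCII characters. The following is a minimal Python sketch using only the standard library. The class and function names are hypothetical, and the set of attributes checked (`href`, `src`, `action`) is an assumption about which attributes matter here, not a list taken from mod_pagespeed.

```python
from html.parser import HTMLParser

# Assumption: these are the attributes worth checking for this purpose.
URL_ATTRS = {"href", "src", "action"}


class NonAsciiUrlFinder(HTMLParser):
    """Collects URL attributes whose value contains non-ASCII characters."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.hits = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for attributes without a value, e.g. <a href>.
            if name in URL_ATTRS and value and not value.isascii():
                line, _column = self.getpos()
                self.hits.append((line, tag, name, value))


def find_non_ascii_urls(html_text):
    """Return (line, tag, attribute, url) tuples for non-ASCII URLs."""
    finder = NonAsciiUrlFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hits
```

Running `find_non_ascii_urls` on the HTML that the server actually sends (for example, saved with a browser's "view source") gives a list of candidate URLs to include in a bug report.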

@GoogleCodeExporter

Hi,

It follows this process:
Client goes to http://www.profileheaven.com/username
/username doesn't exist, so Apache hands the 404 to redirect404.php
redirect404.php connects to the database, pulls the profile page, and presents 
it under the current URL (it doesn't actually 404). So all the HTML generation 
should happen in the backend PHP, which should then present HTML to 
redirect404, and that output should then get parsed.

Original comment by mark.har...@gmail.com on 15 Nov 2010 at 11:54

@GoogleCodeExporter

Summary was:   Error: ReadUTFChar not supported (non-icu build)

The symptom reported in this bug summary no longer occurs, at least not at 
"loglevel warn".  We still report this issue at "loglevel info" (along with 
many other issues, such as HTML syntax problems).

URLs that contain non-ASCII UTF-8 characters cannot be rewritten yet, but the 
logs do not fill up and the server load should not be affected.  mod_pagespeed 
will not break the page, but the page may not benefit fully until UTF-8 is 
supported.


Original comment by jmara...@google.com on 19 Nov 2010 at 11:34

  • Changed title: URLs with UTF-8 characters in them cannot be rewritten.
  • Added labels: Type-Enhancement
  • Removed labels: Type-Defect
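
Until UTF-8 URLs are supported, a site owner could experiment with emitting percent-encoded, ASCII-only URLs in the generated HTML. Below is a minimal Python sketch of that encoding step. Whether ASCII-only URLs are actually rewritten by a given non-ICU build is an assumption that this thread does not confirm. The function name `percent_encode_url` and the example.com URLs are hypothetical, and the host name is assumed to be ASCII already (internationalized domain names are not handled).

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched; "%" is kept so that existing escapes
# such as %20 are not double-encoded.
_SAFE_PATH = "/%:@!$&'()*+,;="
_SAFE_QUERY = _SAFE_PATH + "?"


def percent_encode_url(url):
    """Percent-encode non-ASCII characters in path, query, and fragment.

    Non-ASCII characters are encoded as UTF-8 bytes. The scheme and the
    host part are passed through unchanged.
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(parts.path, safe=_SAFE_PATH),
        quote(parts.query, safe=_SAFE_QUERY),
        quote(parts.fragment, safe=_SAFE_QUERY),
    ))
```

For example, `percent_encode_url("http://example.com/profil/josé")` yields a URL whose path ends in `jos%C3%A9`, which browsers treat as the same resource.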

@GoogleCodeExporter

Hi, I just built from source and this is still logged as an error on the 
current build.  I've set the log level to crit and it no longer shows.

Original comment by mark.har...@gmail.com on 20 Nov 2010 at 2:00

@GoogleCodeExporter

OK I think we need to make a testcase.  Converting this back to a Defect until 
we repro & prove it's fixed or at least silent.

Original comment by jmara...@google.com on 20 Nov 2010 at 2:06

  • Added labels: Type-Defect
  • Removed labels: Type-Enhancement

@GoogleCodeExporter

Just for reference:

[Sat Nov 20 15:18:53 2010] [error] [mod_pagespeed 0.9.10.1-244] [1120/151853:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)
[Sat Nov 20 15:18:53 2010] [error] [mod_pagespeed 0.9.10.1-244] [1120/151853:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)
[Sat Nov 20 15:18:53 2010] [error] [mod_pagespeed 0.9.10.1-244] [1120/151853:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu bui

Original comment by mark.har...@gmail.com on 20 Nov 2010 at 3:24

@GoogleCodeExporter

Same problem with mod_pagespeed 0.9.10.1-250
[error] [mod_pagespeed 0.9.10.1-250] [1126/122848:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)
website: ratealo.com

If I can test anything, just tell me.

Original comment by dario.va...@gmail.com on 26 Nov 2010 at 11:37

@GoogleCodeExporter

 [error] [mod_pagespeed 0.9.10.1-250] [1203/222702:ERROR:googleurl_noicu/url_canon_noicu.cc(60)] ReadUTFChar not supported (non-icu build)
Website : khoahocphothong.net
pls help me

Original comment by phuongk...@gmail.com on 3 Dec 2010 at 3:27

@GoogleCodeExporter

Alright, we've updated our dependency on page-speed in r269. This should not 
only demote the error to INFO, but also correctly parse UTF-8 URLs.

This fix will be in the next release.

Original comment by sligocki@google.com on 3 Dec 2010 at 3:46

  • Changed state: Fixed
