Issue 208: windows-1251 codepage incorrectly recognized as cp1252
Status:  Migrated
Owner:  ----
Closed:  Mar 2013


Reported by nikolay....@gmail.com, Aug 28, 2011
Which version of PhantomJS are you using?

1.2

What steps will reproduce the problem?

Try to load the page http://a.mod-site.net/gb/u/shumil-1.html, which is cp1251-encoded.

What is the expected output? What do you see instead?

I expect to get correct Unicode text; instead, I get text that was decoded as cp1252 and then converted to Unicode.

Which operating system are you using?

Debian GNU/Linux 6.0

Did you use binary PhantomJS or did you compile it from source?

Compiled from source.

Please provide any additional information below.

The page has the following tag:
<meta http-equiv="content-type" content="text/html; charset=windows-1251">

which means that the content encoding is cp1251 (not cp1252).
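
To make the point about the tag concrete, here is a minimal sketch of reading the charset label out of a `content` value like the one above. The function name `charsetFromContentType` is hypothetical and is not part of PhantomJS or QtWebKit; it only illustrates what the tag declares, not how the engine actually sniffs encodings.

```javascript
// Hypothetical helper (not a PhantomJS API): extract the charset label
// from a Content-Type style value, e.g. the meta tag's content attribute.
function charsetFromContentType(value) {
    // Accept optional whitespace around "=" and optional quotes.
    var match = /charset\s*=\s*["']?([^\s;"']+)/i.exec(value);
    return match ? match[1].toLowerCase() : null;
}

console.log(charsetFromContentType('text/html; charset=windows-1251'));
// prints "windows-1251"
```

A value with no charset parameter yields `null`, in which case a browser has to fall back to other hints.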

Aug 28, 2011
#2 roejame...@gmail.com
Did you check if your terminal is set to UTF-8 encoding?
Aug 28, 2011
#3 nikolay....@gmail.com
Sure, I'm using the rxvt-unicode terminal, which definitely supports UTF-8.

I have just checked the page http://a.mod-site.net/gb/ (a small page that says "Страница не найдена", which means "page not found" in Russian) and it is phantomjs`ed correctly (i.e. properly converted from cp1251 to UTF-8).

But on http://a.mod-site.net/gb/u/shumil.html I see incorrect encoding (i.e. all Russian text is decoded as cp1252 and then converted to UTF-8, which makes it completely unreadable).
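
For anyone unfamiliar with the symptom, the sketch below reproduces it in plain JavaScript: Russian text is turned into cp1251 bytes, and those bytes are then read back as if they were cp1252. Both helper functions are hypothetical and deliberately partial. They cover only ASCII plus the letters А-я on the cp1251 side, and only ASCII plus bytes 0xA0-0xFF on the cp1252 side, where cp1252 matches Latin-1.

```javascript
// Hypothetical helper: encode text to cp1251 bytes.
// Covers only ASCII and U+0410..U+044F (А-я), which cp1251 maps to 0xC0..0xFF.
function encodeCp1251(text) {
    var bytes = [];
    for (var i = 0; i < text.length; i++) {
        var code = text.charCodeAt(i);
        if (code < 0x80) {
            bytes.push(code);
        } else if (code >= 0x0410 && code <= 0x044F) {
            bytes.push(code - 0x0410 + 0xC0);
        } else {
            throw new Error('character not covered by this sketch: ' + text.charAt(i));
        }
    }
    return bytes;
}

// Hypothetical helper: decode bytes as cp1252.
// Covers only ASCII and 0xA0..0xFF, where the byte value equals the code point.
function decodeCp1252(bytes) {
    var out = '';
    for (var i = 0; i < bytes.length; i++) {
        var b = bytes[i];
        if (b < 0x80 || b >= 0xA0) {
            out += String.fromCharCode(b);
        } else {
            throw new Error('byte not covered by this sketch: ' + b);
        }
    }
    return out;
}

var original = 'Страница';
var garbled = decodeCp1252(encodeCp1251(original));
console.log(garbled); // prints "Ñòðàíèöà"
```

Accented Latin letters in place of Cyrillic, as in the output above, are the typical sign that cp1251 bytes were interpreted as cp1252.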
Aug 28, 2011
#4 nikolay....@gmail.com
By phantomjs`ed I mean the following script:


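    // Setup assumed here; the original comment showed only the page.open call.
    var page = new WebPage(),
        address = phantom.args[0];

    page.open(address, function (status) {
        if (status !== 'success') {
            // Loading failed: report it and quit.
            console.log('Unable to load ' + address);
            phantom.exit();
        } else {
            // Print the final URL, then the page source as PhantomJS decoded it.
            console.log(page.evaluate(function () {
                return location.href;
            }));
            console.log(page.content);
            phantom.exit();
        }
    });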

Aug 29, 2011
Project Member #5 ariya.hi...@gmail.com
It may be the console encoding that is wrong.

Try rasterizing the page (using rasterize.js) and see whether the screen capture shows correct text rendering.
Aug 29, 2011
#6 nikolay....@gmail.com
I have attached the rasterized page. It contains text in the incorrect encoding.
For an example of the correct encoding, you can open the page in Chrome or Firefox.
Attachment: 1.pdf (72.4 KB)
Aug 29, 2011
Project Member #7 ariya.hi...@gmail.com
This seems like an upstream bug for QtWebKit.
Aug 29, 2011
#8 nikolay....@gmail.com
It seems so.

Just curious: how hard would it be to use Chromium's WebKit engine as a backend for PhantomJS?
Aug 29, 2011
Project Member #9 ariya.hi...@gmail.com
As for Chromium, I suggest creating a new issue to discuss that. I'd rather not pollute each issue with off-topic discussion.
Mar 15, 2013
Project Member #11 james.m....@gmail.com
Closing. This issue has been moved to GitHub: https://github.com/ariya/phantomjs/issues/10208
Status: Migrated