spyderlib - issue #812

Spyder can't start if user's home directory contains non-ASCII characters


Posted on Oct 23, 2011 by Massive Elephant

I couldn't run any of the spyder shortcuts that I can find in my Start menu.

and I got the following error when trying to launch spyder (console):

C:\Windows\system32>C:\Python27\pythonw.exe "C:\Python27\Scripts\spyder"

C:\Windows\system32>C:\Python27\Scripts\spyder.bat
Traceback (most recent call last):
  File "C:\Python27\Scripts\spyder", line 2, in <module>
    from spyderlib import spyder
  File "C:\Python27\lib\site-packages\spyderlib\spyder.py", line 99, in <module>
    from spyderlib.plugins.editor import Editor
  File "C:\Python27\lib\site-packages\spyderlib\plugins\editor.py", line 36, in <module>
    from spyderlib.widgets.editor import (ReadWriteStatus, EncodingStatus,
  File "C:\Python27\lib\site-packages\spyderlib\widgets\editor.py", line 29, in <module>
    from spyderlib.utils.module_completion import moduleCompletion
  File "C:\Python27\lib\site-packages\spyderlib\utils\module_completion.py", line 31, in <module>
    db = PickleShareDB(MODULES_PATH)
  File "C:\Python27\lib\site-packages\spyderlib\utils\external\pickleshare.py", line 52, in __init__
    self.root = Path(root).expanduser().abspath()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc8 in position 9: ordinal not in range(128)

Comment #1

Posted on Oct 23, 2011 by Massive Elephant

Hi! I found the source of the problem - my windows user name is in Chinese! It's ok when logged in as another windows account which is in pure English...

Comment #2

Posted on Oct 23, 2011 by Happy Camel

Could you please open a standard Python interpreter (outside Spyder) and type the following:

import os.path as osp
osp.supports_unicode_filenames

It should return True.

Second test:

from spyderlib.utils.external import path
path.path('~').expanduser().abspath()

It should return a 'path' object around your HOME directory.

Comment #3

Posted on Oct 23, 2011 by Happy Camel

I'm merging this issue with Issue 651 as this is the same cause but symptoms are quite different. So please continue this discussion here anyway.

Comment #4

Posted on Oct 24, 2011 by Massive Elephant

@pierre, the first test returned True, but the second test returned an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\spyderlib\utils\external\path.py", line 136, in expanduser
    def expanduser(self): return self.__class__(os.path.expanduser(self))
  File "C:\Python27\lib\ntpath.py", line 301, in expanduser
    return userhome + path[i:]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc8 in position 9: ordinal not in range(128)

How to fix it? Can I do it manually with my spyder installation? Thanks.

Comment #5

Posted on Oct 24, 2011 by Happy Camel

I'll try and test some things in a virtual machine and I'll get back to you.

Comment #6

Posted on Oct 24, 2011 by Happy Camel

I've tested this on a virtual machine with a user name with Greek letters and Spyder starts without any error. The second test above succeeds too.

This issue is apparently related to the external library "path" which comes from IPython. I suggest reporting this bug to the IPython mailing list: try to open IPython outside Spyder and you will probably see that it doesn't work.

Comment #7

Posted on Oct 24, 2011 by Massive Elephant

@Pierre, now spyder starts up successfully on my system! I tracked down the error: it comes from Python's standard library module ntpath.py. The UnicodeDecodeError happens when the file system encoding is different from the result of sys.getdefaultencoding().

The fix is to add .decode(enc) at the end of calls like os.environ['HOME'].
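
As a rough illustration of the fix described above (a sketch, not the actual ntpath.py patch): the byte string stored in an environment variable has to be decoded with the encoding it was written in. The helper name decode_env_value and the sample path are hypothetical, and the code is written for Python 3, where the implicit 'ascii' step of Python 2 has to be spelled out by hand.

```python
def decode_env_value(raw, encoding):
    """Decode a raw environment value; pass text through unchanged."""
    if isinstance(raw, bytes):
        return raw.decode(encoding)
    return raw


# Hypothetical home directory, encoded the way a Chinese Windows (cp936) would.
home_bytes = "C:\\Users\\日常使用".encode("cp936")

# Python 2 mixed byte strings and unicode by decoding with 'ascii' implicitly;
# doing the same thing explicitly shows where it breaks.
try:
    home_bytes.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Decoding with the encoding the bytes were written in works.
home_text = decode_env_value(home_bytes, "cp936")
```

With this sample path the failing byte is the first byte of the first Chinese character, which is the same kind of failure as in the traceback at the top of this issue.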

I got the hint from: http://bugs.python.org/review/6815/patch/431/798 The above link is a reported issue of Python, and it was reported years ago, I'm wondering why it's not been fixed? (not a question to you, sorry).

I attached the file FYI (you can search for "Edwin", or you can compare it with the original one; I use Python 2.7.2).

Attachments

Comment #8

Posted on Oct 24, 2011 by Massive Elephant

further info - All spyder shortcuts can successfully run once, then I'll have to click the 'reset spyder' link in my Start menu.

the exception is "spyder (light)" which is always ok.

Comment #9

Posted on Oct 26, 2011 by Quick Horse

I'm reopening this issue to investigate it a bit further. If the problem is in path or pickleshare libraries, I'll try to patch them locally since I'm the one who introduced those two.

Comment #10

Posted on Oct 26, 2011 by Quick Horse

Pierre, did you do your tests in Windows XP or Vista/7?

Comment #11

Posted on Oct 26, 2011 by Happy Camel

I did those tests within a Windows XP virtual machine under the user name of "PsiOmegaSigma" (in Greek letters, of course, so definitely non ASCII chars).

Comment #12

Posted on Oct 29, 2011 by Quick Horse

(No comment was entered for this change.)

Comment #13

Posted on Oct 30, 2011 by Massive Elephant

@Pierre, my system uses a Multibyte Character Set (MBCS), which I think is different from Greek letters. I guess Chinese, Japanese and Korean systems must be encoded in MBCS if Unicode encoding is not used (pre Windows 2000).

Comment #14

Posted on Nov 8, 2011 by Quick Horse

Issue 828 has been merged into this issue.

Comment #15

Posted on Nov 13, 2011 by Quick Horse

I think I have a fix for this one but I want to check with you guys first.

MindVisualizer, Pierre, Philippe, could you please execute the next two lines in the same environment where you get the error and tell me what's the output you obtain?

import locale
locale.getpreferredencoding()

Thanks.

Comment #16

Posted on Nov 13, 2011 by Quick Horse

A complete solution to this issue must involve a fix to Issue 834. That's why I'm creating a block on that one.

Comment #17

Posted on Nov 13, 2011 by Swift Dog

>>> import locale
>>> locale.getpreferredencoding()
'cp1252'

Comment #18

Posted on Nov 14, 2011 by Happy Camel

Same here but I was unable to reproduce the bug as I mentioned earlier.

Comment #19

Posted on Nov 14, 2011 by Quick Horse

OK, thanks to both. I got cp1252 too, in XP and Win7, but like Philippe, I also got the bug. This encoding is very similar to ISO-8859-1 and Microsoft uses it for all Western European languages.

If I understood things right, Windows never uses UTF-8 (or cp65001, as Windows calls it) as its default encoding, even if it can read and handle it, while Mac and Linux do.

So we have to check for locale.getpreferredencoding() to be UTF-8 in the path external library and the bug will be solved. I'll commit the solution this afternoon/evening because I have to go right now.

Comment #20

Posted on Nov 14, 2011 by Swift Dog

thanks!

Comment #21

Posted on Nov 14, 2011 by Quick Horse

This issue was updated by revision 9bf4f03a28b2.

-. Path was assuming that if the filesystem supports unicode, then directories were in unicode too. But in Windows they are encoded in the system encoding (cp1252 for Western European languages).
-. This was giving a hard crash while starting Spyder.

Comment #22

Posted on Nov 22, 2011 by Quick Dog

Update by the original bug reporter: A workaround would be to change the TEMP and TMP system environment variables to paths that contain ASCII characters only.

Comment #23

Posted on Nov 22, 2011 by Quick Horse

Thanks for the update. I don't think changing TEMP/TMP would work because we need to save Spyder user settings (such as preferred fonts, syntax coloring scheme, plugin arrangement, etc), so we have to create a directory in HOME.

But I would like to ask you: Are you seeing the problem after the update to 2.1.2 or not? Is it OK to close this issue?

Comment #24

Posted on Nov 25, 2011 by Quick Dog

Sorry, it is not: it succeeded the first time and then always fails... I just installed v2.1.2 and it seems the problem is still here:

C:\Python27>C:\Python27\python.exe "C:\Python27\Scripts\spyder"
Traceback (most recent call last):
  File "C:\Python27\Scripts\spyder", line 2, in <module>
    from spyderlib import spyder
  File "C:\Python27\lib\site-packages\spyderlib\spyder.py", line 99, in <module>
    from spyderlib.plugins.editor import Editor
  File "C:\Python27\lib\site-packages\spyderlib\plugins\editor.py", line 36, in <module>
    from spyderlib.widgets.editor import (ReadWriteStatus, EncodingStatus,
  File "C:\Python27\lib\site-packages\spyderlib\widgets\editor.py", line 29, in <module>
    from spyderlib.utils.module_completion import moduleCompletion
  File "C:\Python27\lib\site-packages\spyderlib\utils\module_completion.py", line 31, in <module>
    db = PickleShareDB(MODULES_PATH)
  File "C:\Python27\lib\site-packages\spyderlib\utils\external\pickleshare.py", line 52, in __init__
    self.root = Path(root).expanduser().abspath()
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-12: ordinal not in range(128)

Comment #25

Posted on Nov 26, 2011 by Swift Dog

with 2.1.2, I can install&use spyder on windows 7 with a user name that contains a "ç"

Comment #26

Posted on Nov 27, 2011 by Quick Dog

I guess Chinese characters are encoded in MBCS (multi-bytes), which is different from characters like "ç" which is always encoded in single byte only?
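
A small check of the guess above, as a sketch: in cp1252 a character such as "ç" takes one byte, while in cp936 each of the Chinese characters used here takes two. The sample strings are only illustrations.

```python
# One byte per character in the Western European code page.
c_cedilla = "ç".encode("cp1252")

# Two bytes per character for these Chinese characters in cp936 (GBK).
chinese = "日常使用".encode("cp936")

print(len(c_cedilla), len(chinese))
```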

Comment #27

Posted on Nov 27, 2011 by Quick Dog

I'm new to Python and haven't had a chance to use spyderlib or dig into python yet, otherwise maybe I could help... Maybe months later...

Comment #28

Posted on Dec 5, 2011 by Quick Horse

Edwin, let's try to see if we can fix it but I'm going to need your help.

  1. To see if you have the fix I proposed, please open the file C:\Python27\lib\site-packages\spyderlib\utils\external\path.py and look for the words "Spyder patch". If you can't find them, it means you have to reinstall Spyder.

  2. After that, please tell me what you get if you input these commands in a python interpreter:

import locale
locale.getpreferredencoding()

  3. Then, please give me the output of these commands:

from spyderlib.utils.module_completion import MODULES_PATH
from spyderlib.utils.external.pickleshare import PickleShareDB
PickleShareDB(MODULES_PATH)

Thanks for your help

Comment #29

Posted on Dec 7, 2011 by Quick Dog

@ccordoba12,

1 - Checked, I've got that patch.

2 - >>> locale.getpreferredencoding() returns 'cp936'

3 - the results are:

>>> from spyderlib.utils.module_completion import MODULES_PATH
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\site-packages\spyderlib\utils\module_completion.py", line 31, in <module>
    db = PickleShareDB(MODULES_PATH)
  File "C:\Python27\lib\site-packages\spyderlib\utils\external\pickleshare.py", line 52, in __init__
    self.root = Path(root).expanduser().abspath()
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-12: ordinal not in range(128)
>>> from spyderlib.utils.external.pickleshare import PickleShareDB
>>> PickleShareDB(MODULES_PATH)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'MODULES_PATH' is not defined

Hope it helps.

Comment #30

Posted on Dec 7, 2011 by Quick Horse

Thank you very much Edwin. Since we have now confirmed that the error is still present, please open C:\Python27\lib\site-packages\spyderlib\utils\external\path.py again and comment out lines 55-66, which are the ones around "Spyder patch", i.e. these ones:

try:
    # ============
    # Spyder patch
    # ============
    # It's not only neccesary to know that the filesystem supports UTF-8. It's
    # also needed to check that the locale's encoding is UTF-8.
    if os.path.supports_unicode_filenames and \
       locale.getpreferredencoding() == 'UTF-8':
        _base = unicode
        _getcwd = os.getcwdu
except AttributeError:
    pass

Then try to start Spyder again

Comment #31

Posted on Dec 8, 2011 by Quick Dog

@ccordoba12,

I commented out lines 55 to 66 as you instructed, and I got the following error:

C:>C:\Python27\python.exe "C:\Python27\Scripts\spyder"
Traceback (most recent call last):
  File "C:\Python27\Scripts\spyder", line 2, in <module>
    from spyderlib import spyder
  File "C:\Python27\lib\site-packages\spyderlib\spyder.py", line 99, in <module>
    from spyderlib.plugins.editor import Editor
  File "C:\Python27\lib\site-packages\spyderlib\plugins\editor.py", line 36, in <module>
    from spyderlib.widgets.editor import (ReadWriteStatus, EncodingStatus,
  File "C:\Python27\lib\site-packages\spyderlib\widgets\editor.py", line 29, in <module>
    from spyderlib.utils.module_completion import moduleCompletion
  File "C:\Python27\lib\site-packages\spyderlib\utils\module_completion.py", line 31, in <module>
    db = PickleShareDB(MODULES_PATH)
  File "C:\Python27\lib\site-packages\spyderlib\utils\external\pickleshare.py", line 52, in __init__
    self.root = Path(root).expanduser().abspath()
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-12: ordinal not in range(128)

Comment #32

Posted on Dec 11, 2011 by Quick Horse

Thanks a lot Edwin for your help. I really don't understand what's happening but this clearly shows that we need a better way to activate/deactivate this functionality at the user's will.

I'll start to work on it for the next release.

Comment #33

Posted on Dec 12, 2011 by Massive Elephant

@ccordoba12,

I think comment 7 tells the source of the problem?

Comment #34

Posted on Dec 12, 2011 by Quick Horse

Thanks for pointing that out. I tried it in my case and it's working pretty well, so please help me to try it in yours following these steps:

  1. Modify my patch in C:\Python27\lib\site-packages\spyderlib\utils\external\path.py to leave it like this:

try:
    # Spyder patch
    # ============
    # It's not only neccesary to know that the filesystem supports UTF-8. We
    # also need to check that the locale's encoding is UTF-8 too.
    # if os.path.supports_unicode_filenames and \
    #    locale.getpreferredencoding() == 'UTF-8':
    if os.path.supports_unicode_filenames:
        _base = unicode
        _getcwd = os.getcwdu
except AttributeError:
    pass

Note how I'm commenting out the original two-line condition and replacing it with a check on os.path.supports_unicode_filenames only.

  2. Open C:\Python27\lib\site-packages\spyderlib\utils\module_completion.py and replace this line (line 28):

MODULES_PATH = get_conf_path('db')

with these lines:

import locale
enc = locale.getpreferredencoding()
MODULES_PATH = get_conf_path('db').decode(enc)

Note how in the last line I'm using your proposal.

Hope this solves the bug for you.

Comment #35

Posted on Dec 14, 2011 by Massive Elephant

@ccordoba12,

Thanks for your help, but it still failed... Maybe it'll be easier if you create a temporary folder whose name contains Chinese characters and point MODULES_PATH to that folder temporarily.

Tips for creating a folder with Chinese characters in its name on your system:
1 - go to http://www.baidu.com/ and copy several Chinese characters;
2 - in Windows Explorer, create a new folder and paste the Chinese characters from the clipboard.

Hope it helps.

Last error:

C:\Users\日常使用>C:\Python27\python.exe "C:\Python27\Scripts\spyder"
Traceback (most recent call last):
  File "C:\Python27\Scripts\spyder", line 2, in <module>
    from spyderlib import spyder
  File "C:\Python27\lib\site-packages\spyderlib\spyder.py", line 99, in <module>
    from spyderlib.plugins.editor import Editor
  File "C:\Python27\lib\site-packages\spyderlib\plugins\editor.py", line 36, in <module>
    from spyderlib.widgets.editor import (ReadWriteStatus, EncodingStatus,
  File "C:\Python27\lib\site-packages\spyderlib\widgets\editor.py", line 29, in <module>
    from spyderlib.utils.module_completion import moduleCompletion
  File "C:\Python27\lib\site-packages\spyderlib\utils\module_completion.py", line 31, in <module>
    MODULES_PATH = get_conf_path('db').decode(enc)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 9-12: ordinal not in range(128)
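
One possible reading of this last traceback, offered as an assumption that the thread does not confirm: if get_conf_path('db') already returned a unicode object, Python 2 would first encode it with the default 'ascii' codec before running .decode(enc), which would explain an Encode error coming out of a decode call. The sketch below imitates that implicit step in Python 3; py2_style_decode and the sample path are hypothetical.

```python
def py2_style_decode(value, encoding):
    """Imitate Python 2's unicode.decode(): encode with 'ascii' first."""
    if isinstance(value, str):
        # Implicit step in Python 2; fails for any non-ASCII character.
        value = value.encode("ascii")
    return value.decode(encoding)


path = "C:\\Users\\日常使用\\.spyder2\\db"  # hypothetical, already text

try:
    py2_style_decode(path, "cp936")
    error = None
except UnicodeEncodeError as exc:
    error = exc
```

If that reading is right, decoding only values that are still byte strings would avoid the error.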

Comment #36

Posted on Dec 16, 2011 by Quick Horse

Thanks for the tip. I'll use it to nail down this issue on my side.

The problem is that we haven't been able to identify your locale encoding correctly. If we could do it then we could decode MODULES_PATH to unicode and everything will be OK from there.

Please help me to do it giving me the outputs of these commands:

import sys
sys.stdin.encoding
sys.getdefaultencoding()

Thanks for your help and for your patience in helping me debug this. In the meantime, I'll add some exceptions to avoid using this functionality so that everyone in your situation can start and use Spyder without problems. Don't worry, it won't turn off anything crucial. It'll just make Spyder work a little bit slower.

Comment #37

Posted on Dec 25, 2011 by Quick Dog

Sorry for the late reply, I don't use the mailbox I use to receive the notification emails very often.

>>> import sys
>>> sys.stdin.encoding
'cp936'
>>> sys.getdefaultencoding()
'ascii'
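
For anyone collecting the same information, the values asked for in this thread can be gathered in one go. This is only a convenience sketch; the function name report_encodings is made up here, and sys.stdin may be missing or None in some environments, hence the getattr.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings discussed in this thread in a dict."""
    return {
        "stdin": getattr(sys.stdin, "encoding", None),
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        "preferred": locale.getpreferredencoding(),
    }


for name, value in sorted(report_encodings().items()):
    print(name, value)
```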

Comment #38

Posted on Jan 6, 2012 by Quick Horse

Issue 902 has been merged into this issue.

Comment #39

Posted on Jan 6, 2012 by Quick Horse

Thanks edwin for your suggestion in comment #35. I went further and created an account with the same user name as yours to better test my fixes. Turns out the problem was not where I thought, i.e. in the path external library. The real fix is to encode the user home directory in utf-8.

I'm uploading here a copy of my fixes so people with this problem can download it and see if it's working for them. To do it please follow these steps:

  1. Uncompress it outside of your user account, for example in C:\spyder
  2. Open a command line going to Start > Run > cmd.exe (or something like that, I don't exactly remember)
  3. Run Spyder with this command: python bootstrap.py

Attachments

Comment #40

Posted on Jan 6, 2012 by Quick Horse

Pierre, since I'm not so sure about my fixes, I created a clone here:

http://code.google.com/r/ccordoba12-non-ascii/source/list

Please take a look at my commits (they appear with a star) and tell me if they're meaningful.

I also created a code review here

http://codereview.appspot.com/5503068

although, unfortunately, commit history is not shown there.

Comment #41

Posted on Jan 7, 2012 by Happy Camel

"(they appear with a star)" --> only for you ;-)

I've just written a comment on the code review but honestly, checking whether this works requires testing. So if you have tested with folders containing Chinese characters, then that's fine for me.

Comment #42

Posted on Jan 7, 2012 by Quick Horse

About the stars -> I was imagining that, that's why I created the code review.

I can't see your comment, but that's probably because I forgot to add you as a reviewer (silly me). Sorry for that.

Comment #43

Posted on Jan 7, 2012 by Happy Camel

No sorry, that's my fault: I forgot to publish the comment. This is now done.

Comment #44

Posted on Mar 4, 2012 by Quick Dog

@ccordoba12,

I tested the spyder.rar attachment and it seems to be working, here is the command line execution details:

F:\Temp\spyder\spyder>python bootstrap.py
Executing Spyder from source checkout
'hg' is not a valid command.
Error: Failed to get revision number from Mercurial - need more than 0 values to unpack
01. Patched sys.path with F:\Temp\spyder\spyder
02. PySide is detected, selecting (experimental)
03. Imported Spyder 2.1.6 (Qt 4.7.4 via PySide 1.0.7)
0x. Enforcing parent console (Windows only)
04. Executing spyder.main()


Here is a dialog message upon the first run of it:

Spyder

Spyder crashed during last session.

If Spyder does not start at all and before submitting a bug report, please try to reset settings to defaults by running Spyder with the command line option '--reset':
python spyder --reset

Warning: this command will remove all your Spyder configuration files located in 'C:\Users\日常使用.spyder2').

If restoring the default settings does not help, please take the time to search for known bugs or discussions matching your situation before eventually creating a new issue here. Your feedback will always be greatly appreciated.

OK

PS, before trying this spyder.rar attachment, I've also installed official release spyder-2.1.8_py27 and it still has the problem, I guess you haven't merged the fix to it?

Thanks.


Comment #45

Posted on Apr 14, 2012 by Happy Camel

(No comment was entered for this change.)

Comment #46

Posted on Apr 17, 2012 by Helpful Kangaroo

I'll contribute some additional test results.

I pulled down Carlos's latest changes from here:

http://code.google.com/r/ccordoba12-v21/
http://code.google.com/r/ccordoba12-work/

Both have the same changes related to this issue. Here are my results.

I created a fresh account on English Windows 7 with username Cárlos. When I run the versions above I don't get any crashes. I do, however, get unexpected behavior, and it is identical for both clones. Instead of getting all the .spyder2 files in c:\Users\Cárlos most of them show up in c:\Users\Cßrlos. Only the .ropeproject files show up in c:\Users\Cárlos. After a bit more research, I think I can explain what's happening.

Calls to os.path and os.environ in Python 2.7 basically return raw byte strings. They have to be decoded with the codec that was used to encode them before they can be interpreted as text, and that codec would be the one the OS is using internally. Python 3 I believe does this for you behind the scenes, and the unicode strings you receive are already properly decoded. Carlos, the addition of the constant DEFAULT_ENCODING in encoding.py is the right way to go for today for Python 2.7. However, the function being used to set the constant is probably fancier than it needs to be. The reason for the translation from Cárlos to Cßrlos is due to this line in encoding.getdefaultencoding():

enc = sys.stdin.encoding

That basically ends up setting DEFAULT_ENCODING to whatever stdin is using. At least on my English Windows 7 system, that ends up returning the encoding that DOS command shells use, which is 'cp437' (see http://en.wikipedia.org/wiki/Code_page_437), and that's not what the Windows filesystem is using to encode paths and environment variables. The character code for á is 225 in both unicode and 'cp1252' (the legacy character encoding for much of Windows), and you enter it from the keyboard using Alt+225. However, character 225 in 'cp437' is ß, so that would explain where that mix up is coming from.
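
The mix-up described above can be reproduced in a couple of lines. This is only a sketch of the explanation given in this comment: the bytes are written with cp1252 and then misread as cp437.

```python
name = "Cárlos"

# What Windows stores for the name in the legacy Western European code page.
raw = name.encode("cp1252")

# Misreading those bytes with the DOS code page turns á (byte 225) into ß.
misread = raw.decode("cp437")

print(raw[1], misread)
```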

I think MindVisualizer had the right answer in comment 7. After looking at the solution used in ntpath.py and researching things a bit more, it looks like a call to sys.getfilesystemencoding() is the preferred way to set the value of DEFAULT_ENCODING for file paths and environment variables. This approach appears to be used in more than one location inside the Python code itself. Then with each call to os.path or os.environ you have to make sure to decode the result with that codec before attempting to manipulate it as text.

The attached patch contains the changes that resolved everything for my system. It would apply to the first clone above, but the change is essentially the same for the second clone, too. I have not made any attempt to test this with Chinese or other true multi-byte characters, so that might need to be examined further to see if it truly handles all the cases. But I'm guessing it will.

Attachments

Comment #47

Posted on Apr 17, 2012 by Helpful Kangaroo

I'll propose one less invasive patch. This would apply to http://code.google.com/r/ccordoba12-v21/ in place of the patch in comment 46.

It would be useful to draw a distinction between the encoding the OS is using to encode file paths and environment variables and the encoding that any particular file might use. The list of CODECS in spyderlib.utils.encoding looks like it was intended to be the list of codecs to try on the contents of files when trying to open them. The attached patch separates the two encoding problems by removing DEFAULT_ENCODING from the CODECS list and introducing a new constant FS_ENCODING that handles file system decoding.

Attachments

Comment #48

Posted on Apr 18, 2012 by Quick Horse

Jed, thanks a lot for your thorough review. I really don't feel at ease with encodings so I just preferred to grab what Ipython guys do and work on top of it.

I find your proposed solution cleaner and simpler than mine and since it works as well as it, I'll be in favor of committing it. But there are two things:

  1. This change in userconfig.py breaks the console monitor on Linux:

    # os.environ.get() returns a raw byte string which needs to be
    # decoded with the codec that the OS is using to represent environment
    # variables.
    path = os.environ.get(env_var, '').decode(encoding.FS_ENCODING)

During console startup I'm getting 'import sitecustomize failed'. I really don't understand what's happening here but when I print FS_ENCODING from there I got 'None'.

  2. Wouldn't it be safer to use locale.getpreferredencoding() instead of sys.getfilesystemencoding() for FS_ENCODING? When I change it, the previous error disappears and (I think) we are getting the 'right' encoding. I mean, on the Chinese version of Windows I got 'cp936' for locale.getpreferredencoding(), which is the encoding of the username I took from Edwin. However, I got 'mbcs' for sys.getfilesystemencoding().

Comment #49

Posted on Apr 18, 2012 by Helpful Kangaroo

I'm no encoding expert either, but I was simply trying to mirror what the Python guys had done in the patch referenced in comment 7. I think there is an important distinction between how the filesystem encodes data and how applications encode the text within them. That's the difference between sys.getfilesystemencoding() and locale.getpreferredencoding(). See the specific doc links for both:

http://docs.python.org/library/sys.html#sys.getfilesystemencoding
http://docs.python.org/library/locale.html#locale.getpreferredencoding

Reviewing the docs once more was useful for me because now I know why you had problem number 1, which I completely overlooked :). On Linux, None will be a very common return value from sys.getfilesystemencoding(). Strictly speaking, Linux does not have a default encoding because it's controlled by locale settings, but this suggests that ascii is the theoretical default:

http://docs.python.org/howto/unicode.html#unicode-filenames

Maybe we should default to 'utf-8' since it would be compatible with ascii anyway and cover more bases in case the locale settings are not right. I've attached another version of the patch that fixes this. Carlos, would you mind testing it on Linux?

As for number 2, according to that last link Windows changes its encoding depending on its flavor, so Python uses 'mbcs' to refer to whatever Windows is currently using. At any rate, Windows filenames are already in unicode, so the decoding doesn't really do anything except turn the unicode object into a byte string object. But, as always, the proof will be in the testing. You could push another commit to your clones and ask for help testing with Chinese Windows. I'm not set up to do that.

Attachments

Comment #50

Posted on Apr 19, 2012 by Helpful Kangaroo

On my Ubuntu environment, sys.getfilesystemencoding() and locale.getpreferredencoding() both return 'UTF-8'.

Based upon the ambiguity surrounding file system encoding on Linux described in this (seriously long) discussion: http://bugs.python.org/issue13643, maybe we should adopt the fallback position that if sys.getfilesystemencoding() returns None we accept Python's best guess at the locale encoding, which would come from locale.getpreferredencoding(). A patch that implements this is attached. Again, the patch applies against this clone:

http://code.google.com/r/ccordoba12-v21/
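
A minimal sketch of the fallback described above, not the attached patch itself: use sys.getfilesystemencoding(), fall back to locale.getpreferredencoding() when it returns None, and finally to 'utf-8' (the default floated in comment 49). The function name and its injectable arguments are assumptions made for the example.

```python
import locale
import sys


def guess_fs_encoding(get_fs=sys.getfilesystemencoding,
                      get_preferred=locale.getpreferredencoding):
    """Return an encoding name for paths and environment variables."""
    enc = get_fs()
    if enc is None:
        # Python 2 on Linux may return None here.
        enc = get_preferred()
    # Last resort: utf-8 is a superset of ascii.
    return enc or "utf-8"


FS_ENCODING = guess_fs_encoding()
```

Passing the two getters as arguments is only there so the fallback order can be tried out without changing the system locale.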

Attachments

Comment #51

Posted on Apr 19, 2012 by Helpful Kangaroo

Python 3.2 has a modification to sys.getfilesystemencoding():

http://docs.python.org/py3k/library/sys.html?highlight=sys#sys.getfilesystemencoding

It now returns 'utf-8' for Linux if nl_langinfo(CODESET) fails instead of returning None. This is what the patch attached to comment 49 is doing.

Comment #52

Posted on Apr 20, 2012 by Quick Horse

Thanks a lot for digging deeper on this issue Jed. Now we can be a lot more confident that we are approaching the issue from the right side. I pushed your patch to my repo along with some further changes and everything is working smooth.

I think there is only one thing remaining: it seems that encoding.to_unicode is used in some other places to deal with filesystem paths. These are the ones I found with ack:

plugins/history.py
205: filename = encoding.to_unicode(filename)

plugins/editor.py
1299: text = os.linesep.join([encoding.to_unicode(qstr)
1476: fname = osp.abspath(encoding.to_unicode(fname))

widgets/editor.py
1021: if filename == encoding.to_unicode(self.tempfile_path):

Should we change them to use '.decode(FS_ENCODING)' for the sake of correctness (and probably to avoid future bugs)? Or should we leave things as they are? Please analyze those lines in context and tell me what you think.

Comment #53

Posted on Apr 20, 2012 by Helpful Kangaroo

Carlos, I started looking at the locations above, and it became clear that a small change to spyderlib.encoding would make things much more readable in the rest of the code. Please accept the attached patch as a proposal to make things a little better and not as a criticism of where you had already arrived. Your solution was right on.

I've basically created a couple of functions to handle decoding and encoding from the file system, encoding.to_unicode_from_fs() and encoding.to_fs_from_unicode(). You can basically run any string or unicode object through them and they'll do the appropriate conversion only when required. This prevents you from accidentally calling .decode() again on something that is already unicode. That is what encoding.to_unicode() was doing, so I mirrored it.
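
Roughly, the two helpers could look like the sketch below. The function names come from this comment, but the bodies are an assumption (the attached patch is not shown here), and the code uses Python 3 spelling, where bytes and str play the roles of Python 2's str and unicode.

```python
import sys

# Assumed module-level constant; falls back to utf-8 if nothing is reported.
FS_ENCODING = sys.getfilesystemencoding() or "utf-8"


def to_unicode_from_fs(value, encoding=FS_ENCODING):
    """Decode a byte string coming from the file system; leave text alone."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_fs_from_unicode(value, encoding=FS_ENCODING):
    """Encode text for the file system; leave byte strings alone."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value
```

Because each function checks the type first, running a value through it twice is harmless, which is the property described above.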

Anyway, I went through the above locations and applied the new functions as I thought was appropriate. If these test well and look okay to you, would you like to port them ahead to your v22 clone, and I can continue testing there?

Attachments

Comment #54

Posted on Apr 24, 2012 by Helpful Kangaroo

I went ahead and integrated the changes described above, all the way up through the patch in comment 53. I have created two clones that are basically ready to be pulled into the v21 and default repositories, respectively, pending final review.

http://code.google.com/r/jedludlow-v21-issue812/

http://code.google.com/r/jedludlow-default-issue812/

I have tested both of these versions under these OS configurations:
* Windows 7, ascii username
* Windows 7, username with Chinese characters
* Ubuntu 11.20, ascii username under US English utf-8 encoding locale

All were running Python 2.7 using recent PyQt releases. In short, both versions launch without incident on all three configurations. The .spyder2 files are created in the appropriate locations, and there are no signs of mojibake. On Windows 7, it is possible to run a script from a directory with Chinese characters provided that the OS is configured with Chinese character support. Spyder itself cannot yet be run from within a directory with non-ascii characters, so there is still work to do there. But that is outside the scope of this issue. I can also report that the object inspector with full equation rendering works for the default repository version under all three OS configurations! The v21 version does not support equation rendering yet, and it also does not support rich text formatting yet when running under a username with non-ascii characters.

Comment #55

Posted on Apr 24, 2012 by Quick Horse

Jed thanks for going ahead with this issue. I've been quietly working on it because I discovered some more problems I'm trying to solve right now. Please don't merge this until I publish a new bookmark.

I also didn't apply your last patch on top of my last commits but started afresh from my last merge with tip to have a cleaner mercurial history.

Comment #56

Posted on Apr 24, 2012 by Helpful Kangaroo

I only created the clones in comment 54 to bring together all the changes in a clean way so I could test them on the OS configurations I mentioned. It was not trivial to get them created since the v21 and default repos have diverged enough from their once-common ancestry that it's no longer possible to just apply a change to the v21 repo and then simply pull it over to default. Since I understand the changes well I thought I'd try to simplify the pull process by applying all the changes to clean copies of the latest project repos. Furthermore, I wanted to simplify the testing process for anyone who wanted to try the changes.

Comment #57

Posted on Apr 28, 2012 by Quick Horse

Jed I finally had time to publish my new bookmark. It's called new_i812 and I added it only to my 21 repo. I discovered more unicode problems which took me a lot of time to solve (now I remember why I left this bug unfinished: because it was taking me too much time to solve properly!)

I'm also attaching two patches to solve some more problems with the IPython console. The first one lets us change the cwd using the working directory plugin and the second one lets us run a file from the Editor in IPython (and also Python) using F5. I'm not so sure about the coding of those changes (although they are working pretty well), so I would like to hear what you think about them.

The problem with IPython is that it uses a different encoding than the Python console. You can clearly see that if you define ss = 'cárlos' in both, and then print it. I don't know how to change it though :)
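
A minimal sketch of that kind of mismatch, assuming the two consoles differ only in which codec they apply to the same text. The codec names below (utf-8 and cp1252) are illustrative assumptions, not what Spyder or IPython were actually using:

```python
# -*- coding: utf-8 -*-
# Illustrative only: utf-8 and cp1252 are assumed codecs, chosen to show
# how one string yields different bytes and how a mismatch garbles output.
text = u'c\xe1rlos'  # the string from the comment above, written with an escape

utf8_bytes = text.encode('utf-8')      # the accented letter becomes two bytes
cp1252_bytes = text.encode('cp1252')   # the accented letter becomes one byte

# A console that decodes with the wrong codec shows mojibake
# instead of the accented letter.
garbled = utf8_bytes.decode('cp1252')

print(repr(utf8_bytes))
print(repr(cp1252_bytes))
print(garbled == text)
```

Comparing the two `repr` lines in each console is one way to check which encoding a given console is really applying.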

Comment #58

Posted on Apr 28, 2012 by Quick Horse

I'm sorry, it's the other way around: the first patch is needed for IPython and Python, and the second one only for IPython.

Comment #59

Posted on Apr 30, 2012 by Helpful Kangaroo

Just did some testing with your bookmarked changes mentioned in comment 57. I have not applied the additional ipython patch yet.

I was unable to get spyder to launch under a username with Chinese characters with your latest updates, but my earlier consolidated changes at comment 54 do work. I did a diff of clean copies of the whole tree at

http://code.google.com/r/jedludlow-v21-issue812/

and

http://code.google.com/r/ccordoba12-v21/ (updated to the tip of your bookmark new_i812)

and I noticed there were a few additional locations where I was calling to_unicode_from_fs that we had discussed in previous patches that were absent from your repo. Rather than try to cover them here it would be much simpler if you could pull the repositories and diff them.

Just curious, were the QString checks you added here in response to a particular problem you were seeing?

http://code.google.com/r/ccordoba12-v21/source/browse/spyderlib/utils/encoding.py?name=new_i812#55

I guess my intent with to_unicode_from_fs was to confine its use to those situations where you were dealing with a string of raw bytes returned from some direct call to the file system. I didn't see any locations in the code where we were calling this with a QString argument. I might argue for throwing an exception if the argument is actually a QString since that is outside the intended use of the function to begin with.
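
To make the proposed contract concrete, here is a hedged sketch of a function with that narrow scope. Only the name to_unicode_from_fs comes from the discussion; the body is an assumption for illustration and is not the code in either repository:

```python
import sys


def to_unicode_from_fs(raw):
    """Hypothetical sketch, not Spyder's actual implementation.

    Decode a byte string returned by a file system call into unicode,
    and reject anything that is neither bytes nor unicode (a QString,
    for example) instead of guessing how to convert it.
    """
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    if isinstance(raw, text_type):
        return raw  # already decoded, nothing to do
    if not isinstance(raw, bytes):
        raise TypeError('expected a byte string, got %s' % type(raw).__name__)
    # Assumption: the file system encoding is the right codec for these bytes.
    encoding = sys.getfilesystemencoding() or 'utf-8'
    return raw.decode(encoding)
```

Under this design a caller holding a QString would have to convert it explicitly before calling, which keeps the function limited to raw file system bytes.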

Could I suggest that we work toward committing a baseline: the smallest set of changes that solves this issue to the best of our knowledge? We can then begin testing from a common code base, submitting additional minor changes to clean up anything we might have missed. I would suggest we defer the issues related to IPython encoding and executing processes in non-ascii directories to a separate set of changes, targeted at issue 834 perhaps.

Comment #60

Posted on May 1, 2012 by Quick Horse

Hi Jed, I also tested my changes against a Chinese account (but in XP, not in Win7), so I'm a bit surprised to hear that they didn't run for you. What's the traceback you're getting? And how are you running Spyder, in a dir inside the account or outside of it?

How did you diff the repos? I downloaded yours but I don't exactly know how to compare it with mine. As far as I remember, I omitted only one of your changes:

http://code.google.com/r/jedludlow-v21-issue812/source/browse/spyderlib/plugins/editor.py#1299

because TEMPFILE_PATH should already be encoded correctly, as it is defined here:

http://code.google.com/r/jedludlow-v21-issue812/source/browse/spyderlib/plugins/editor.py#312

What are the other locations you found I missed?

About the QString correction: I added it because when a traceback is generated in the internal console, you can click on the files that appear on it to open them in the Editor. Without that change I couldn't do that and instead I got a message about the file name not being a string but a QString.

About your last suggestion: In my mind all these changes are related because they're all bugs on non-ascii accounts. But what we could do is first merge the changes that just make Spyder run on them, and then test and merge the other fixes needed to make all plugins work correctly. How does that sound?

About Issue 834: it was about starting Spyder with 'python bootstrap.py' inside a non-ascii dir, but all of Spyder should work ok when installed and started in an ascii dir. That's the expected behavior because site-packages can't live in a non-ascii dir.

Comment #61

Posted on May 1, 2012 by Helpful Kangaroo

Kdiff3 will allow you to compare two source trees in two directories and show you every difference between the two trees.
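
For anyone without Kdiff3, a rough scripted equivalent can be sketched with Python's standard filecmp module. This is only an illustration: differing_files is a hypothetical helper name, and it lists files that differ or exist on one side only, without showing line-level diffs:

```python
import filecmp
import os


def differing_files(left, right):
    """Return sorted (status, relative_path) pairs for two directory trees.

    Hypothetical helper for illustration; status is 'differs',
    'left only' or 'right only'.
    """
    result = []

    def walk(cmp, prefix):
        # Files present on both sides whose contents differ.
        for name in cmp.diff_files:
            result.append(('differs', os.path.join(prefix, name)))
        # Entries that exist in only one of the two trees.
        for name in cmp.left_only:
            result.append(('left only', os.path.join(prefix, name)))
        for name in cmp.right_only:
            result.append(('right only', os.path.join(prefix, name)))
        # Recurse into directories common to both trees.
        for name, sub in cmp.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right), '')
    return sorted(result)
```

Calling it with the paths of two local clones (placeholders here, for example `differing_files('jedludlow-clone', 'ccordoba12-clone')`) gives a list of locations worth opening in a proper diff tool.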

When I did initial testing this morning I wasn't receiving any traceback. Instead, the launch was hanging at the splash screen during editor load. After trying a few more times tonight, everything launches fine when I use the python -v flag and the spyder --debug option. Without them it still hangs on launch, so it looks like some subtle timing issue during startup. I never experienced this with my clones in all my earlier testing, so I don't know what's happening there.

I'm simply suggesting that the number of change sets currently on the table is getting large enough that reviewing the whole batch together is getting to be a challenge. I'd be in favor of committing to the main repo soon with the potential to get more testing that way. I think we've established that you can at least launch now with non-ascii usernames. That is the essence of this particular issue.

Comment #62

Posted on May 1, 2012 by Helpful Kangaroo

Carlos, I ran a few more tests in a more careful fashion this morning, and I'm more comfortable that we are okay to push the changes.

The hanging at launch I described in the second paragraph of comment 61 only occurs when I seed encoding.py with some print statements. And, thankfully, it actually happens with both your clone and my clone provided I put the same print statements in the same places. Aside from being annoying at least it's repeatable. I don't think it's related to the unicode changes we are proposing here and is more likely related to socket communication timing during startup. Once I remove the print statements everything launches and runs just fine.

Comment #63

Posted on May 2, 2012 by Quick Horse

Ok, really great to know. Then I'm going to merge this in as is and open a new issue for the IPython problems I mentioned in comment 57.

Comment #64

Posted on May 2, 2012 by Quick Horse

This issue was updated by revision 58a922777152.

-. getRootModules was used to generate the modules database at startup but I confirmed this was not working as expected: the database was created the first time an import statement was written.
-. This was also causing problems with non-ascii user accounts on Windows
-. This partially reverts revision fc5092657bc3

Comment #65

Posted on May 2, 2012 by Quick Horse

This issue was updated by revision 8e43fb421809.

All major problems should be fixed now, with the exception of some minor ones with IPython.

Comment #66

Posted on May 2, 2012 by Quick Horse

This issue was updated by revision 58a922777152.

-. getRootModules was used to generate the modules database at startup but I confirmed this was not working as expected: the database was created the first time an import statement was written.
-. This was also causing problems with non-ascii user accounts on Windows
-. This partially reverts revision fc5092657bc3

Comment #67

Posted on May 2, 2012 by Quick Horse

This issue was updated by revision 3bed38623049.

All major problems should be fixed now, with the exception of some minor ones with IPython.

Comment #68

Posted on May 14, 2012 by Massive Elephant

good to know this issue is fixed, thanks for your efforts :)

Looking forward to a new download, since Hg clone doesn't work on my computer ...

Comment #69

Posted on Feb 17, 2015 by Quick Horse

This issue was migrated to https://github.com/spyder-ide/spyder/issues/

The issue number is exactly the same

Status: Fixed

Labels:
Type-Defect Priority-Critical Cat-SpyderGUI MS-v2.1 Restrict-AddIssueComment-Commit