Issue 48: dumps with ensure_ascii=False fails with mix of unicode and non-unicode
What steps will reproduce the problem?
>>> s = {'foo': u'bar', 'quux': 'Arr\xc3\xaat sur images'}
>>> simplejson.dumps(s)
'{"quux": "Arr\\u00eat sur images", "foo": "bar"}'
>>> simplejson.dumps(s, ensure_ascii=False)
Traceback (most recent call last):
  ...
  File ".../lib/python2.6/json/encoder.py", line 368, in encode
    return ''.join(chunks)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
What is the expected output?
u'{"quux": "Arr\xc3\xaat sur images", "foo": "bar"}'
What version of the product are you using?
2.0.9
On what operating system?
Mac OS X
Apr 14, 2009
Sorry, the expected output should have been:
u'{"quux": "Arr\u00eat sur images", "foo": "bar"}'
Apr 14, 2009
Or equivalently, u'{"quux": "Arr\xeat sur images", "foo": "bar"}'
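For readers on a current interpreter, here is a minimal sketch of the expected result. It assumes Python 3 and the standard library json module rather than the simplejson 2.0.9 discussed in this issue, so it illustrates the intended output, not the original bug. The byte string is decoded explicitly before encoding, which avoids mixing text and raw bytes in the first place.

```python
import json

# The UTF-8 byte string from the report.
raw = b'Arr\xc3\xaat sur images'

# Decode explicitly so the encoder only ever sees text, never raw bytes.
s = {'foo': 'bar', 'quux': raw.decode('utf-8')}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(s, sort_keys=True)

# ensure_ascii=False: non-ASCII characters are kept as-is in the result.
literal = json.dumps(s, sort_keys=True, ensure_ascii=False)

print(escaped)
print(ascii(literal))  # ascii() keeps the printout terminal-safe
```

Both strings parse back to the same dictionary; they differ only in how the non-ASCII character is written.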
Apr 14, 2009
Not sure about the choice of name for the ensure_ascii parameter as json is always encoded in utf-something according to http://tools.ietf.org/html/rfc4627. Assuming the parameter is meant to ensure we get back out a unicode value, JSONEncoder.encode should be checking for it in the non-basestring case (see http://code.google.com/p/simplejson/source/browse/trunk/simplejson/encoder.py?r=174#181). Here's a failing test demonstrating one facet of the problem:

Index: tests/test_unicode.py
===================================================================
--- tests/test_unicode.py (revision 183)
+++ tests/test_unicode.py (working copy)
@@ -78,4 +78,7 @@
     def test_unicode_preservation(self):
         self.assertEquals(type(json.loads(u'""')), unicode)
         self.assertEquals(type(json.loads(u'"a"')), unicode)
+
+    def test_empty_list(self):
+        self.assertEquals(type(json.dumps([], ensure_ascii=False)), unicode)

======================================================================
FAIL: test_empty_list (simplejson.tests.test_unicode.TestUnicode)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/tmp/json/src/simplejson/simplejson/tests/test_unicode.py", line 84, in test_empty_list
    self.assertEquals(type(json.dumps([], ensure_ascii=False)), unicode)
AssertionError: <type 'str'> != <type 'unicode'>
----------------------------------------------------------------------
Apr 14, 2009
ensure_ascii is simply an incoherent parameter for a json encoder. The JSON RFC specifies that JSON is always encoded in some form of Unicode. What you might want is a parameter that chooses between (1) returning a Python unicode object, which clients can later encode, or (2) returning a UTF-8 encoded Python str, or (3) returning a Python str in some other Unicode encoding.

I would suggest keeping the current encoding parameter to choose among #2 and #3, deleting the ensure_ascii parameter and raising an error message containing the text of this comment (or some other explanation of why it was removed), and adding a new return_unicode parameter, defaulting to the opposite of whatever ensure_ascii defaults to. If return_unicode is True, and there is an encoding parameter, then raise an exception.

All of the intermediate work of encoding should probably be done in unicode -- that is, when a Python str is encoded, it should be (first) decoded from ascii (that is, call unicode(x)). Also, make sure that all string literals are u'string literals' (see Josh Bronson's comment, above).
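The three return forms listed in this comment can be sketched as follows. This is only an illustration under assumptions: it uses Python 3 and the standard library json module, where str is text and bytes is encoded data, and it does not implement the return_unicode parameter proposed above (no such parameter exists in the library).

```python
import json

data = {'quux': 'Arr\u00eat sur images'}

# (1) A text result that the caller can encode later.
as_text = json.dumps(data, ensure_ascii=False)

# (2) The same document as UTF-8 encoded bytes.
as_utf8 = as_text.encode('utf-8')

# (3) The same document in another Unicode encoding.
as_utf16 = as_text.encode('utf-16-le')

# Each form decodes back to the same value once the encoding is known.
from_utf8 = json.loads(as_utf8.decode('utf-8'))
from_utf16 = json.loads(as_utf16.decode('utf-16-le'))
```

Keeping the encoder's output as text and encoding once at the boundary is the separation the comment argues for.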
Apr 14, 2009
Re comment #4: *READ THE DOCS*. You are thoroughly confused as to what these parameters do.
Apr 14, 2009
The docstring says:
If ``ensure_ascii`` is false, then the return value will be a
``unicode`` instance subject to normal Python ``str`` to ``unicode``
coercion rules instead of being escaped to an ASCII ``str``.
This does not describe what the code actually does -- ensure_ascii=False doesn't
necessarily return unicode. It does appear on reflection that ensure_ascii=True does
something permissible, but it is somewhat bizarre.
Oh, I was wrong about what encoding does. Sorry about that.
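The property at stake here, that the return type should not depend on the input, can be checked with a small sketch. It assumes Python 3 and the standard library json module (where both settings return str), so it shows the consistent behaviour rather than the simplejson 2.0.9 behaviour reported in this issue.

```python
import json

# Inputs with and without non-ASCII text, including the empty list
# from the failing test earlier in the thread.
samples = [[], ['a'], ['Arr\u00eat'], {'k': '\u65e5\u672c'}]

for value in samples:
    escaped = json.dumps(value)                      # ensure_ascii=True
    literal = json.dumps(value, ensure_ascii=False)

    # The return type is the same for every input.
    assert type(escaped) is str and type(literal) is str

    # With ensure_ascii=True the output contains only ASCII characters.
    assert all(ord(ch) < 128 for ch in escaped)

    # Both forms decode to the same value.
    assert json.loads(escaped) == json.loads(literal) == value
```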
Apr 14, 2009
r184
Status: Fixed