
simplejson - issue #40

Decoder returns python str (not unicode) for JSON string (new in 2.0.7)


Posted on Feb 12, 2009 by Grumpy Ox

Expected (and found in 2.0.6):
>>> type(simplejson.loads('"foo"'))
<type 'unicode'>
>>> type(simplejson.loads(u'"foo"'))
<type 'unicode'>
>>> type(simplejson.loads(simplejson.dumps(u'foo')))
<type 'unicode'>
>>> type(simplejson.loads(simplejson.dumps(u'\xfffoo')))
<type 'unicode'>

Actual (2.0.7):
>>> type(simplejson.loads('"foo"'))
<type 'str'>
>>> type(simplejson.loads(u'"foo"'))
<type 'unicode'>
>>> type(simplejson.loads(simplejson.dumps(u'foo')))
<type 'str'>
>>> type(simplejson.loads(simplejson.dumps(u'\xfffoo')))
<type 'unicode'>

Since JSON output is encoded unicode, the parsed version should be a unicode object.

Comment #1

Posted on Feb 12, 2009 by Happy Rabbit

This is an optimization. If given a str object as input, then it will give str strings as output if and only if the string is ASCII-only. ASCII-only strings are interchangeable with unicode. If you give it unicode input then you'll get unicode output strings regardless. This optimization is not new in 2.0.7.

>>> simplejson.loads('"foo"')
'foo'
>>> simplejson.loads(u'"foo"')
u'foo'

dumps always returns an ASCII-only string by default, so that's why loads(dumps(unistr)) can give you ASCII strings. You'd want to do loads(unicode(dumps(unistr))) if you want to get unicode strings back out.
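The claim that dumps output is ASCII-only by default can be checked directly. The following is a minimal sketch that uses the standard-library json module as a stand-in for simplejson (an assumption made so it runs without simplejson installed; both accept ensure_ascii, which defaults to True). It is written to run on Python 3 as well, where there is a single text type and the str/unicode split discussed here does not arise. The helper name is_ascii_only is illustrative only.

```python
import json


def is_ascii_only(text):
    """Return True if every character in text is in the ASCII range."""
    return all(ord(ch) < 128 for ch in text)


# With the default ensure_ascii=True, non-ASCII characters are written
# as \uXXXX escapes, so the encoded document itself is pure ASCII.
encoded = json.dumps(u'\xfffoo')
print(encoded)                 # "\u00fffoo"
print(is_ascii_only(encoded))  # True

# Decoding turns the escape back into the original character.
print(json.loads(encoded) == u'\xfffoo')  # True
```

Because the encoded document is plain ASCII, on Python 2 it is a str, which is why feeding it straight back to loads takes the str-input path described above.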

Comment #2

Posted on May 19, 2009 by Grumpy Camel

Bob, I know you've refused to fix this in several situations now (such as: http://www.nabble.com/simplejson-2.0.0-released,-much-faster.-td19705153.html), and I can actually name you a place where I think it causes issues.

In SQLAlchemy, the "Unicode" type (http://www.sqlalchemy.org/docs/05/reference/sqlalchemy/types.html#sqlalchemy.types.Unicode) warns when you insert str() objects.

My workflow: create some complicated thing, serialize it to JSON, which gets used by many other different workflow processes. When I read it back in, I'd really like every string in the thing to come back in as unicode type, if possible.

Thanks!

Comment #3

Posted on May 19, 2009 by Grumpy Camel

Oh, I see that in issue 28, someone mentioned this exact issue, and you bdfl'd it there too! I guess I'll deal with it on my own then!

Comment #4

Posted on May 19, 2009 by Happy Rabbit

If you want unicode strings, use a unicode input document.

Comment #5

Posted on May 19, 2009 by Happy Rhino

I have personally wasted hours on this. I can't afford to track down subtle bugs that depend on what version of simplejson someone has installed and whether the speedups are present, so nowadays I only use it through the following wrapper module.

Eliminating the need for this wrapper is one of the benefits I have hoped to reap by dropping support for Python 2.5 someday. I just hope the issue doesn't recur in Python 2.x's built-in json module.

try:
    import json  # Python 2.6
except ImportError:
    import simplejson as json  # Python 2.5

dumps = json.dumps

def loads(s, *args, **kwargs):
    # When its argument is of type str, loads() decodes strings as
    # either str or unicode depending on whether simplejson's speedups
    # are installed (at least this is true in simplejson 2.0.7). It
    # always decodes strings as unicode when the argument to loads()
    # is of type unicode.
    return json.loads(unicode(s), *args, **kwargs)
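One caveat with the wrapper above: on Python 2 with default settings, unicode(s) decodes with the ASCII codec, so a byte string containing non-ASCII bytes raises UnicodeDecodeError. A variant that decodes explicitly might look like the sketch below. The name loads_text and the UTF-8 default are assumptions for illustration, not part of simplejson or the json module, and the standard-library json module is used so the example runs without simplejson installed.

```python
import json


def loads_text(s, encoding='utf-8', **kwargs):
    """Decode byte-string input to text first, then parse it.

    loads_text and its encoding parameter are hypothetical names
    chosen for this sketch.
    """
    if isinstance(s, bytes):  # bytes is an alias for str on Python 2.6+
        s = s.decode(encoding)
    return json.loads(s, **kwargs)


value = loads_text(b'"foo"')
print(type(value) is type(u''))  # True
```

Since the parser always receives text, the result should not depend on the str-input optimization, whichever decoder implementation is in use.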

Comment #6

Posted on May 19, 2009 by Happy Rabbit

It is the same in Python 2.7 trunk. If you want unicode even for ASCII strings, use unicode input.

Comment #7

Posted on May 24, 2009 by Swift Cat

This cost me several hours as well. Decoding external input into unicode seems like something that should happen at a program's data boundaries - which is where I suspect the simplejson/json module is frequently used. As such, the principle of least astonishment suggests to me that I should be getting unicode back. I don't know about other users, but the speed optimization isn't that valuable to me at the moment - maybe some kind of 'output_ascii' keyword for loads, for people who need the speed enhancement, would be a better solution?

Comment #8

Posted on Mar 28, 2012 by Swift Dog

Believe it or not, some applications still require ASCII and don't play well with unicode. For an application I have to work with every day, this is a feature, not a bug. I'm voting in order to be notified if this ever gets "fixed"...

Comment #9

Posted on Mar 28, 2012 by Swift Cat

The issue tracker for simplejson is here: https://github.com/simplejson/simplejson/issues

Comment #10

Posted on Apr 21, 2013 by Swift Monkey

This is crazy - a full day of 2 developers down the drain!

>>> import simplejson as json
>>> dump = json.dumps((u"$123", u"₪123"))
>>> [type(object) for object in json.loads(dump)]
[<type 'str'>, <type 'unicode'>]  # This is bad!

vs.

>>> import json
>>> dump = json.dumps((u"$123", u"₪123"))
>>> [type(object) for object in json.loads(dump)]
[<type 'unicode'>, <type 'unicode'>]  # This is good!

Comment #11

Posted on Oct 18, 2013 by Happy Elephant

The pure-Python version of simplejson returns a different type than the C speedups version. I ran into this when installing in a virtualenv without python-dev. You can demonstrate the problem on a version installed with speedups by using _toggle_speedups to switch back to the pure-Python version.

>>> import simplejson as json
>>> json.loads('"foo"')
'foo'
>>> json._toggle_speedups(False)
>>> json.loads('"foo"')
u'foo'

This needs to be fixed one way or the other.
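Until then, one way to catch this kind of difference between installations is a small check, run in a test suite, that walks a decoded value and reports any byte strings it finds. The sketch below is only an illustration: find_byte_strings is a hypothetical helper name, and the standard-library json module is used so that it runs without simplejson installed.

```python
import json


def find_byte_strings(obj, path='$'):
    """Return the paths of all byte strings (Python 2 str) inside obj."""
    if isinstance(obj, bytes):
        return [path]
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            # Check the key itself, then the value stored under it.
            found.extend(find_byte_strings(key, '%s{key %r}' % (path, key)))
            found.extend(find_byte_strings(value, '%s[%r]' % (path, key)))
    elif isinstance(obj, (list, tuple)):
        for index, item in enumerate(obj):
            found.extend(find_byte_strings(item, '%s[%d]' % (path, index)))
    return found


# Text input, so every decoded string is expected to be text.
decoded = json.loads(u'{"name": "foo", "tags": ["a", "b"]}')
print(find_byte_strings(decoded))  # []
```

An empty list means every string in the decoded structure is text; any path in the result points at a value that came back as a byte string.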

Comment #12

Posted on Sep 8, 2014 by Swift Camel

Hm, for me, both libraries do it 'wrong'-ish: json returns <type 'unicode'> even for "$123", withOUT the 'u' that renders it unicode. simplejson returns <type 'str'> when the input is u"$123"? What's the reason for this inconsistency?

Status: WontFix

Labels:
Type-Defect Priority-Medium