
couchdb-python - issue #232

Non-ascii content doesn't work


Posted on Jan 5, 2014 by Happy Lion

I can hardly believe that I'm not doing something wrong, because anyone would hit this kind of error after using the library for two hours, but whatever...

What steps will reproduce the problem?
1. Save a document containing Unicode, like '{"a": "ąąąąą"}'.
2. Watch couchdb fail.

What is the expected output? What do you see instead?

I see a UnicodeDecodeError. And that's not surprising, because json.encode returns a str, which is later encoded again with .encode('utf-8').

Normally this error wouldn't occur, but ensure_ascii=False is passed to json.dumps.
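The failure described above can be reproduced in isolation. The sketch below is an illustration, not code from couchdb-python. It assumes that json.encode handed back a UTF-8 byte string, and it spells out the ASCII decode that Python 2 performs implicitly when .encode('utf-8') is called on a byte string. The names raw and failed are made up for the example.

```python
# UTF-8 bytes for five "ą" characters, standing in for a byte-string
# request body (an assumption made for this illustration).
raw = u"\u0105\u0105\u0105\u0105\u0105".encode("utf-8")

# On Python 2, raw.encode('utf-8') first decodes raw with the default
# ASCII codec. Performing that decode explicitly raises the same error.
try:
    raw.decode("ascii")
    failed = False
except UnicodeDecodeError:
    failed = True

print(failed)
```

The explicit decode behaves the same way on Python 2 and Python 3, so the snippet shows the error without needing a Python 2 interpreter.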

What version of the product are you using? On what operating system?

0.9 from pip on Ubuntu 12.10.

Comment #1

Posted on Jan 6, 2014 by Quick Rhino

What version of Python are you on? How does "couchdb fail"?

Comment #2

Posted on Jan 6, 2014 by Happy Lion

Python 2.7. It fails with a UnicodeDecodeError in couchdb/http.py. json.dumps with ensure_ascii=False returns unicode or str depending on the data it receives (if the data contains unicode it returns unicode, if it contains str it returns str). This is really a json.dumps bug, because json.dumps itself fails with a UnicodeDecodeError if it receives both types of strings.

The following patch fixes part of the problem (documents mixing str and unicode still won't work, because of the json.dumps bug):

diff -r 961ac99baa29 couchdb/http.py
--- a/couchdb/http.py Sun Aug 18 18:41:46 2013 +0200
+++ b/couchdb/http.py Mon Jan 06 11:01:20 2014 +0100
@@ -262,7 +262,9 @@
     if (body is not None and not isinstance(body, basestring) and
             not hasattr(body, 'read')):
-        body = json.encode(body).encode('utf-8')
+        body = json.encode(body)
+        if isinstance(body, unicode):
+            body = body.encode('utf-8')
         headers.setdefault('Content-Type', 'application/json')

     if body is None:
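The same guard can be written so that it runs on both Python 2 and Python 3. This is only a sketch of the idea in the patch, not the code that ended up in couchdb-python: encode_body and text_type are hypothetical names, and it calls the standard library's json.dumps directly instead of the library's own json.encode wrapper.

```python
import json

# type(u"") is `unicode` on Python 2 and `str` on Python 3.
text_type = type(u"")


def encode_body(doc):
    """Serialize doc to JSON and return it as bytes (hypothetical helper)."""
    body = json.dumps(doc, ensure_ascii=False)
    # Only text needs encoding; a byte string is passed through unchanged,
    # which avoids the implicit ASCII decode on Python 2.
    if isinstance(body, text_type):
        body = body.encode("utf-8")
    return body


payload = encode_body({u"a": u"\u0105\u0105\u0105\u0105\u0105"})
print(repr(payload))
```

Like the patch, this does nothing for documents that mix str and unicode on Python 2, because json.dumps fails before the guard is reached.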

Removing ensure_ascii=False from the json.dumps call would be another, possibly better, solution: without this option json.dumps correctly handles documents mixing str and unicode.
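To illustrate the difference between the two settings, here is a small comparison using only the standard library json module. The variable names are made up for the example. It only shows what the output looks like and is not a claim about how couchdb-python was eventually changed.

```python
import json

doc = {u"a": u"\u0105\u0105\u0105\u0105\u0105"}

escaped = json.dumps(doc)                      # default: ensure_ascii=True
literal = json.dumps(doc, ensure_ascii=False)  # keeps non-ASCII characters

# With the default setting every non-ASCII character becomes a \uXXXX
# escape, so the result holds only ASCII characters and encodes safely.
escaped_bytes = escaped.encode("ascii")

# Both forms parse back to the same document.
same = json.loads(escaped) == json.loads(literal) == doc

print(escaped)
print(same)
```

The escaped form is longer on the wire, but it sidesteps the str/unicode question because there is nothing left that an ASCII codec could choke on.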

Comment #3

Posted on Jan 6, 2014 by Quick Rhino

I think the problem here is with stdlib json vs simplejson. Any patch you do should work with both.

Comment #4

Posted on Jul 6, 2014 by Quick Rhino

Given the similarity to #235 and the changes made recently to support Python 3, I'm going to assume this has been fixed on the current default branch. Feel free to reopen if you can still reproduce this issue.

Status: WorksForMe

Labels:
Type-Defect Priority-Medium