Issue 247: pyodbc 3.0.6-beta01 + osx 64 bit + freetds 0.91 returns blank string for multibyte unicode
Status:  Hold
Owner: ----
Reported by zzz...@gmail.com, Mar 13, 2012
Still can't really use FreeTDS 0.91 with pyodbc + OSX. The latest pyodbc also doesn't work anymore with FreeTDS 0.82, but I know we'd like to get off that someday anyway, so we'll skip that for now.

With 0.91, I at least can pass a u'' string as a bound value to execute without getting "Invalid data type".   However, if the value contains non-ascii characters, now we get a blank u'' string back.

# coding: utf-8

import pyodbc
print pyodbc.version

unicodedata = u"Alors vous imaginez ma surprise, au lever du jour, "\
            u"quand une drôle de petite voix m’a réveillé. Elle "\
            u"disait: « S’il vous plaît… dessine-moi un mouton! »"

conn = pyodbc.connect(dsn="ms_2005", user="scott", password="tiger")

cursor = conn.cursor()

cursor.execute("""
create table uni_round (
    data nvarchar(500)
)
""")

cursor.execute("""
    insert into uni_round (data) values (?)
""", (unicodedata.encode('utf-8'),))

cursor.execute("select data from uni_round")
result = cursor.fetchone()[0]
# here, result is u''
assert result == unicodedata, result


classics-MacBook-Pro:sqlalchemy classic$ python test.py
3.0.6-beta01
Traceback (most recent call last):
  File "test.py", line 26, in <module>
    assert result == unicodedata, result
AssertionError


The freetds.conf file of course has "client charset = UTF-8" as always.


Mar 13, 2012
#1 zzz...@gmail.com
Sorry, that .encode() wasn't intended, though the result is the same.  Take out the encode(), same result:

# coding: utf-8

import pyodbc
print pyodbc.version

unicodedata = u"Alors vous imaginez ma surprise, au lever du jour, "\
            u"quand une drôle de petite voix m’a réveillé. Elle "\
            u"disait: « S’il vous plaît… dessine-moi un mouton! »"

conn = pyodbc.connect(dsn="ms_2005", user="scott", password="tiger")

cursor = conn.cursor()

cursor.execute("""
create table uni_round (
    data nvarchar(500)
)
""")

cursor.execute("""
    insert into uni_round (data) values (?)
""", (unicodedata,))

cursor.execute("select data from uni_round")
result = cursor.fetchone()[0]
assert result == unicodedata, result



Sep 27, 2012
Project Member #2 mkleehammer
The freetdstests.py unit tests pass using the following:

* OS/X 10.8 (Mountain Lion)
* SQL Server 2012 Express on Windows 7
* Default Apple Python
* FreeTDS 0.91, compiled from source
* pyodbc 3.0.7-beta08

I don't believe there are any changes since 3.0.6 that would have fixed anything related.

I also added the following test and it passed:

    def test_unicode2(self):
        """
        From Google Code Issue 247.  (Replaced the smart quotes and ellipsis)
        """
        value = u"""Alors vous imaginez ma surprise, au lever du jour,
                    quand une drôle de petite voix m'a réveillé. Elle
                    disait: « S'il vous plaît... dessine-moi un mouton! »"""
        self.cursor.execute("create table t1(s nvarchar(500))")
        self.cursor.execute("insert into t1 values(?)", value)
        v = self.cursor.execute("select * from t1").fetchone()[0]
        self.assertEqual(type(v), unicode)
        self.assertEqual(v, value)

Are you still having problems?


Status: Hold
Labels: FreeTDS
Sep 29, 2012
Project Member #3 mkleehammer
(No comment was entered for this change.)
Labels: -FreeTDS Driver-freetds
Sep 29, 2012
#4 zzz...@gmail.com
thanks.  I'll have to get the time to install 0.91 again and get everything going, but if you are not seeing the issue on your end, that's encouraging.   is your test using "nvarchar" as the type for the column ?
Apr 2, 2013
#5 zzz...@gmail.com
still having issues, I get back a string, but the encoding is wrong:

- Python 2.7.3  built from source, as well as Python 3.3.0 built from source
- OSX mountain lion
- FreeTDS 0.91
- Pyodbc 3.0.7-beta10
- Freetds.conf has:

        [ms_2005]
        host = 172.16.248.128
        port = 1213
        tds version = 8.0
        client charset = UTF8
        text size = 50000000

Looking at PDB this is what I'm currently seeing for 2.7 (the assertion doesn't print anything for some reason):

(Pdb) !result
u'Alors vous imaginez ma surprise, au lever du jour, quand une dr\xc3\xb4le de petite voix m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9. Elle disait: \xc2\xab S\xe2\x80\x99il vous pla\xc3\xaet\xe2\x80\xa6 dessine-moi un mouton! \xc2\xbb'
(Pdb) !unicodedata
u'Alors vous imaginez ma surprise, au lever du jour, quand une dr\xf4le de petite voix m\u2019a r\xe9veill\xe9. Elle disait: \xab S\u2019il vous pla\xeet\u2026 dessine-moi un mouton! \xbb'


I get a similar result for 3.3 (the assertion error prints):

AssertionError: Alors vous imaginez ma surprise, au lever du jour, quand une dr\xc3\xb4le de petite voix m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9. Elle disait: \xc2\xab S\xe2\x80\x99il vous pla\xc3\xaet\xe2\x80\xa6 dessine-moi un mouton! \xc2\xbb 

!= 

Alors vous imaginez ma surprise, au lever du jour, quand une dr\xf4le de petite voix m\u2019a r\xe9veill\xe9. Elle disait: \xab S\u2019il vous pla\xeet\u2026 dessine-moi un mouton! \xbb
Apr 5, 2013
#6 zzz...@gmail.com
yeah I'm trying every flag there is, here's some other detail:

- the Python builds are 64 bit
- I'm using iODBC, not unixodbc, version 3.52.7


the value coming back from FreeTDS is clearly already utf-8 encoded.  If I try to force "UCS2" or "UCS4" in the freetds.conf file, the whole program just crashes:

Assertion failed: (0), function tds7_send_login, file login.c, line 905.
Abort trap: 6

if you leave "client charset" out, then freetds defaults to iso-8859-1, and as expected I get an iso-8859-1 encoded string inside the u'' instead of a utf-8 one.
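
The shape of the bad value can be reproduced without a database. The following is only a sketch, under the assumption that UTF-8 bytes are being widened one byte per code point on the way into the u'' string; it does not show what pyodbc or FreeTDS actually do internally. It runs on Python 2.7 and 3.3+.

```python
# Sketch only: models the mojibake seen in the pdb output, no database needed.
original = u"dr\u00f4le m\u2019a r\u00e9veill\u00e9."

# Assumption: the driver returns UTF-8 bytes, and each byte ends up as one
# code point in the unicode object. Decoding as latin-1 has the same effect.
utf8_bytes = original.encode("utf-8")
mojibake = utf8_bytes.decode("latin-1")

# The value reported in this thread for the short test string.
reported = u"dr\xc3\xb4le m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9."

# Reversing the widening recovers the original text.
repaired = mojibake.encode("latin-1").decode("utf-8")
```

Under that assumption `mojibake` equals the reported value, and the encode/decode round trip could serve as a stopgap on the client side until the real cause is found.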
Apr 5, 2013
#7 zzz...@gmail.com
just tried the built-in Apple Python, getting the same result.
Apr 5, 2013
#8 zzz...@gmail.com
OK researching my iodbc setup, I think I have 3.52.6 and 3.52.7 both installed, will try to reconcile which is in use.
Apr 5, 2013
#9 zzz...@gmail.com
3.52.6
Apr 5, 2013
#10 zzz...@gmail.com
I'm just beginning to understand the source here, and I believe you've mentioned earlier, pyodbc assumes that data being returned is in UCS-2 format.  And interestingly, when I run this script on a Fedora platform with unixodbc and freetds 0.91, I get the correct result.  Looking in the source, I don't see pyodbc doing anything at all with encodings - it is moving the data straight from what SQLGetData() gives it into a Python Unicode object, though I don't yet understand the buffering logic going on.

The strange thing here is that, per FreeTDS's documentation here: http://freetds.schemamania.org/userguide/localization.htm, this shouldn't work at all - you will always be getting the data either as UTF-8, or ISO-8859-1 (the default), unless you set UCS-2 in freetds.conf.  That does not work on either OSX or Linux; you get a core dump.

Admitting that I'm still totally in the dark here, it seems like FreeTDS + UnixODBC on linux is not actually honoring "client charset" whereas FreeTDS + iODBC on OSX is, hence on OSX I get UTF-8 shoved into a u'' string.
Apr 6, 2013
#11 zzz...@gmail.com
also supporting this, if I use an inadequate encoding, like WINDOWS-1251, on OSX I get: u'dr?le m\x92a r?veill?', on Linux I still get the full string - "client charset" is somehow having no effect on linux (unless I change it to a "broken" encoding, like UCS-2 or UTF-16 - then it core dumps).
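
The WINDOWS-1251 value above is consistent with the same byte-widening picture. A minimal sketch, assuming unmappable characters are replaced with "?" (modelled here with Python's "replace" error handler, which is not necessarily what FreeTDS uses) and each resulting byte becomes one code point:

```python
# Sketch only: models the OSX result for client charset = WINDOWS-1251.
original = u"dr\u00f4le m\u2019a r\u00e9veill\u00e9"

# cp1251 has no o-circumflex or e-acute, so those become "?".
# It does have the right single quote, at byte 0x92.
lossy_bytes = original.encode("cp1251", "replace")

# Assumption: each byte is widened to one code point (latin-1 decoding).
as_seen = lossy_bytes.decode("latin-1")
```

Here `as_seen` comes out as u'dr?le m\x92a r?veill?', matching the OSX value quoted above.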
Apr 6, 2013
#12 zzz...@gmail.com
OK I've now tested this Pyodbc against the following test:

# coding: utf-8

import imp
pyodbc = imp.load_dynamic("pyodbc", "build/lib.macosx-10.4-x86_64-2.7/pyodbc.so")

unicodedata = u"drôle m’a réveillé."

conn = pyodbc.connect(u"DSN=ms_2005;UID=scott;PWD=tiger")

cursor = conn.cursor()

cursor.execute("select ?", (unicodedata, ))
result = cursor.fetchone()[0]
print "original data:        %r" % unicodedata
print "received from pyodbc: %r" % result

All on OSX, FreeTDS 0.91:

Result on iODBC 3.52.6:

classics-MacBook-Pro:pyodbc classic$ python test.py
original data:        u'dr\xf4le m\u2019a r\xe9veill\xe9.'
received from pyodbc: u'dr\xc3\xb4le m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9.'

Result on iODBC 3.52.7, 3.52.8 on master (these are via various tags at https://github.com/openlink/iODBC/tree/develop/iodbc), as well as unixODBC 2.3.1 (for each build, I tested pyodbc.so with otool -L to ensure it built to the correct library):

original data:        u'dr\xf4le m\u2019a r\xe9veill\xe9.'
received from pyodbc: u''

What's going on in all those others is that the driver isn't handling the u'' string at all, if I change it to u'hi' I get this:

classics-MacBook-Pro:pyodbc classic$ python test.py
original data:        u'hi'
received from pyodbc: u'\ufffd\x00'


What freetds.log shows in all the non-working cases that isn't in the 3.52.6 log is this, right before it attempts to send the statement along with the bound parameter:


17:54:26.627963 34615 (util.c:331):tdserror(0x1003a3480, 0x1003c37f0, 2402, 0)
17:54:26.627968 34615 (odbc.c:2270):msgno 2402 20003
17:54:26.627973 34615 (util.c:361):tdserror: client library returned TDS_INT_CANCEL(2)
17:54:26.627978 34615 (util.c:384):tdserror: returning TDS_INT_CANCEL(2)


This test seems to illustrate an issue at least with sending the string, and possibly receiving it as well.


Apr 6, 2013
#14 zzz...@gmail.com
Running the tests2/freetdstests.py causes a core dump for me if I keep the encoding on UTF-8 in freetds.conf, one of the tests is doing something it doesn't like.  For the test_unicode2 you have above, it fails:

classics-MacBook-Pro:pyodbc classic$ python tests2/freetdstests.py "DSN=ms_2005;UID=scott;PWD=tiger" -t test_unicode2
python:  2.7.3 (default, Feb 14 2013, 14:25:59) 
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)]
pyodbc:  3.0.7-beta10 /usr/local/src/pyodbc/build/lib.macosx-10.4-x86_64-2.7/pyodbc.so
odbc:    03.52.0000
driver:  libtdsodbc.so 0.91
         supports ODBC version 03.50
os:      Darwin
unicode: Py_Unicode=2 SQLWCHAR=4
======================================================================
FAIL: test_unicode2 (__main__.FreeTDSTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests2/freetdstests.py", line 1166, in test_unicode2
    self.assertEqual(v, value)
AssertionError: u'' != u"Alors vous imaginez ma surprise, au lever du jour,\n                    quand  [truncated]...
+ Alors vous imaginez ma surprise, au lever du jour,
+                     quand une dr\xf4le de petite voix m'a r\xe9veill\xe9. Elle
+                     disait: \xab S'il vous pla\xeet... dessine-moi un mouton! \xbb

----------------------------------------------------------------------
Ran 1 test in 0.021s

FAILED (failures=1)
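
The "unicode: Py_Unicode=2 SQLWCHAR=4" line in the header above appears to report a 2-byte Python unicode unit alongside a 4-byte ODBC wide character. The sketch below only illustrates what that width difference means for the same text and how the local values could be inspected. Treating the platform `wchar_t` as a stand-in for SQLWCHAR is an assumption, since the real SQLWCHAR width depends on the driver manager headers pyodbc was built against.

```python
import ctypes
import sys

text = u"hi"

# The same two characters at the two widths reported in the test header.
two_byte_units = text.encode("utf-16-le")   # 2 bytes per character here
four_byte_units = text.encode("utf-32-le")  # 4 bytes per character

# Local values to compare against the header line. wchar_t is only a
# stand-in for SQLWCHAR (assumption, see above).
wchar_size = ctypes.sizeof(ctypes.c_wchar)
narrow_python = sys.maxunicode == 0xFFFF    # True only on narrow Python 2 builds
```

If one side writes a buffer at one width and the other reads it at the other width, the text will not survive, which may be related to the garbage result for u'hi' seen earlier.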

Aug 5, 2013
#15 zzz...@gmail.com
here's one way I *can* make it work:

1. use tds version = 8.0, not 7.0

2. cast the data to non-unicode first (and include a length, for some reason), you can get it back as bytes:

cursor.execute("select cast(data as varchar(200)) from uni_round")
result = cursor.fetchone()[0]
assert result.decode('utf-8') == unicodedata, result
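
If this workaround is used in more than one place, the decode step could be wrapped in a small helper. This is a sketch only; `to_text` is a hypothetical name, not part of pyodbc, and it assumes "client charset" is UTF-8 as in the freetds.conf above.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it first if the driver returned bytes.

    Hypothetical helper, not part of pyodbc. The encoding should match
    the "client charset" setting in freetds.conf.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text (or None for SQL NULL): pass through unchanged.
    return value
```

With it, the last two lines of the workaround would become `result = to_text(cursor.fetchone()[0])` followed by a plain comparison against `unicodedata`.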