Issue 247: pyodbc 3.0.6-beta01 + osx 64 bit + freetds 0.91 returns blank string for multibyte unicode
Still can't really use Freetds 0.91 with pyodbc + OSX. The latest pyodbc also doesn't work anymore with FreeTDS 0.82, but I know we'd like to get off that someday anyway so we'll skip that for now.
With 0.91, I at least can pass a u'' string as a bound value to execute without getting "Invalid data type". However, if the value contains non-ascii characters, now we get a blank u'' string back.
# coding: utf-8
import pyodbc
print pyodbc.version
unicodedata = u"Alors vous imaginez ma surprise, au lever du jour, "\
u"quand une drôle de petite voix m’a réveillé. Elle "\
u"disait: « S’il vous plaît… dessine-moi un mouton! »"
conn = pyodbc.connect(dsn="ms_2005", user="scott", password="tiger")
cursor = conn.cursor()
cursor.execute("""
create table uni_round (
data nvarchar(500)
)
""")
cursor.execute("""
insert into uni_round (data) values (?)
""", (unicodedata.encode('utf-8'),))
cursor.execute("select data from uni_round")
result = cursor.fetchone()[0]
# here, result is u''
assert result == unicodedata, result
classics-MacBook-Pro:sqlalchemy classic$ python test.py
3.0.6-beta01
Traceback (most recent call last):
File "test.py", line 26, in <module>
assert result == unicodedata, result
AssertionError
The freetds.conf file of course has "client charset = UTF-8" as always.
Sep 27, 2012
The freetdstests.py unit tests pass using the following:
* OS/X 10.8 (Mountain Lion)
* SQL Server 2012 Express on Windows 7
* Default Apple Python
* FreeTDS 0.91, compiled from source
* pyodbc 3.0.7-beta08
I don't believe there are any changes since 3.0.6 that would have fixed anything related.
I also added the following test and it passed:
def test_unicode2(self):
    """
    From Google Code Issue 247. (Replaced the smart quotes and ellipsis)
    """
    value = u"""Alors vous imaginez ma surprise, au lever du jour,
    quand une drôle de petite voix m'a réveillé. Elle
    disait: « S'il vous plaît... dessine-moi un mouton! »"""
    self.cursor.execute("create table t1(s nvarchar(500))")
    self.cursor.execute("insert into t1 values(?)", value)
    v = self.cursor.execute("select * from t1").fetchone()[0]
    self.assertEqual(type(v), unicode)
    self.assertEqual(v, value)
Are you still having problems?
Status:
Hold
Labels: FreeTDS
Sep 29, 2012
(No comment was entered for this change.)
Labels:
-FreeTDS Driver-freetds
Sep 29, 2012
thanks. I'll have to get the time to install 0.91 again and get everything going, but if you are not seeing the issue on your end, that's encouraging. is your test using "nvarchar" as the type for the column ?
Apr 2, 2013
still having issues, I get back a string, but the encoding is wrong:
- Python 2.7.3 built from source, as well as Python 3.3.0 built from source
- OSX mountain lion
- FreeTDS 0.91
- Pyodbc 3.0.7-beta10
- Freetds.conf has:
[ms_2005]
host = 172.16.248.128
port = 1213
tds version = 8.0
client charset = UTF8
text size = 50000000
Looking at PDB this is what I'm currently seeing for 2.7 (the assertion doesn't print anything for some reason):
(Pdb) !result
u'Alors vous imaginez ma surprise, au lever du jour, quand une dr\xc3\xb4le de petite voix m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9. Elle disait: \xc2\xab S\xe2\x80\x99il vous pla\xc3\xaet\xe2\x80\xa6 dessine-moi un mouton! \xc2\xbb'
(Pdb) !unicodedata
u'Alors vous imaginez ma surprise, au lever du jour, quand une dr\xf4le de petite voix m\u2019a r\xe9veill\xe9. Elle disait: \xab S\u2019il vous pla\xeet\u2026 dessine-moi un mouton! \xbb'
I get a similar result for 3.3 (the assertion error prints):
AssertionError: Alors vous imaginez ma surprise, au lever du jour, quand une dr\xc3\xb4le de petite voix m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9. Elle disait: \xc2\xab S\xe2\x80\x99il vous pla\xc3\xaet\xe2\x80\xa6 dessine-moi un mouton! \xc2\xbb
!=
Alors vous imaginez ma surprise, au lever du jour, quand une dr\xf4le de petite voix m\u2019a r\xe9veill\xe9. Elle disait: \xab S\u2019il vous pla\xeet\u2026 dessine-moi un mouton! \xbb
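The mismatch above is consistent with each UTF-8 byte having been widened into its own code point instead of being decoded. Here is a minimal sketch of that hypothesis in plain Python, with no database involved. The variable names are illustrative, and the byte-widening step is an assumption about what the driver stack does, not something confirmed from the pyodbc or FreeTDS source.

```python
# -*- coding: utf-8 -*-
# A fragment of the test sentence, written with escapes so the file
# encoding does not matter.
original = u'dr\xf4le de petite voix m\u2019a r\xe9veill\xe9'

# Assumption: the client receives UTF-8 bytes and each byte becomes one
# code point (equivalent to a latin-1 decode), rather than a UTF-8 decode.
mojibake = original.encode('utf-8').decode('latin-1')
print(repr(mojibake))

# If that is what happened, the damage is reversible on the client side.
repaired = mojibake.encode('latin-1').decode('utf-8')
print(repaired == original)
```

The `mojibake` value has the same escapes as the corresponding part of the `result` shown in PDB, which is why the returned value looks like UTF-8 stored inside a u'' string.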
Apr 5, 2013
yeah I'm trying every flag there is, here's some other detail:
- the Python builds are 64 bit
- I'm using iODBC, not unixodbc, version 3.52.7
the value coming back from FreeTDS is clearly already utf-8 encoded. If I try to force "UCS2" or "UCS4" in the freetds.conf file, the whole program just crashes:
Assertion failed: (0), function tds7_send_login, file login.c, line 905.
Abort trap: 6
if you leave client encoding out, then freetds defaults to iso-8859-1, and as expected I get an encoded iso-8859-1 string inside the u'' instead of a utf-8.
Apr 5, 2013
just tried the built-in Apple Python, getting the same result.
Apr 5, 2013
OK researching my iodbc setup, I think I have 3.52.6 and 3.52.7 both installed, will try to reconcile which is in use.
Apr 5, 2013
3.52.6
Apr 5, 2013
I'm just beginning to understand the source here, and I believe you've mentioned earlier, pyodbc assumes that data being returned is in UCS-2 format. And interestingly, when I run this script on a Fedora platform with unixodbc and freetds 0.91, I get the correct result.
Looking in the source, I don't see pyodbc doing anything at all with encodings - it is moving the data straight from what SQLGetData() gives it into a Python Unicode object, though I don't yet understand the buffering logic going on.
The strange thing here is that, per FreeTDS's documentation here: http://freetds.schemamania.org/userguide/localization.htm, this shouldn't work at all - you will always be getting the data either as UTF-8, or ISO-8859-1 (the default), unless you set UCS-2 in freetds.conf. That does not work on either OSX or Linux; you get a core dump.
Admitting that I'm still totally in the dark here, it seems like FreeTDS + UnixODBC on linux is not actually honoring "client charset" whereas FreeTDS + iODBC on OSX is, hence on OSX I get UTF-8 shoved into a u'' string.
Apr 6, 2013
also supporting this, if I use an inadequate encoding, like WINDOWS-1251, on OSX I get: u'dr?le m\x92a r?veill?', on Linux I still get the full string - "client charset" is somehow having no effect on linux (unless I change it to a "broken" encoding, like UCS-2 or UTF-16 - then it core dumps).
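The OSX output with WINDOWS-1251 can be imitated in plain Python, which supports the idea that "client charset" really is being applied there. This is only a sketch: it assumes the conversion replaces unrepresentable characters with "?" and that the resulting bytes are then widened one byte per code point. Neither step is taken from the FreeTDS or iODBC source.

```python
# -*- coding: utf-8 -*-
original = u'dr\xf4le m\u2019a r\xe9veill\xe9.'

# cp1251 has no o-circumflex or e-acute, so 'replace' substitutes "?";
# the right single quote (U+2019) does exist in cp1251, as byte 0x92.
lossy_bytes = original.encode('cp1251', 'replace')

# Assumption: the bytes are widened to code points without decoding.
widened = lossy_bytes.decode('latin-1')
print(repr(widened))
```

The `widened` value has the same "?" substitutions and the same \x92 as the string reported on OSX.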
Apr 6, 2013
OK I've now tested this Pyodbc against the following test:
# coding: utf-8
import imp
pyodbc = imp.load_dynamic("pyodbc", "build/lib.macosx-10.4-x86_64-2.7/pyodbc.so")
unicodedata = u"drôle m’a réveillé."
conn = pyodbc.connect(u"DSN=ms_2005;UID=scott;PWD=tiger")
cursor = conn.cursor()
cursor.execute("select ?", (unicodedata, ))
result = cursor.fetchone()[0]
print "original data: %r" % unicodedata
print "received from pyodbc: %r" % result
All on OSX, FreeTDS 0.91:
Result on iODBC 3.52.6:
classics-MacBook-Pro:pyodbc classic$ python test.py
original data: u'dr\xf4le m\u2019a r\xe9veill\xe9.'
received from pyodbc: u'dr\xc3\xb4le m\xe2\x80\x99a r\xc3\xa9veill\xc3\xa9.'
Result on iODBC 3.52.7, 3.52.8 on master (these are via various tags at https://github.com/openlink/iODBC/tree/develop/iodbc), as well as unixODBC 2.3.1 (for each build, I tested pyodbc.so with otool -L to ensure it built to the correct library):
original data: u'dr\xf4le m\u2019a r\xe9veill\xe9.'
received from pyodbc: u''
What's going on in all those others is that the driver isn't handling the u'' string at all, if I change it to u'hi' I get this:
classics-MacBook-Pro:pyodbc classic$ python test.py
original data: u'hi'
received from pyodbc: u'\ufffd\x00'
What freetds.log shows in all the non-working cases that isn't in the 3.52.6 log is this, right before it attempts to send the statement along with the bound parameter:
17:54:26.627963 34615 (util.c:331):tdserror(0x1003a3480, 0x1003c37f0, 2402, 0)
17:54:26.627968 34615 (odbc.c:2270):msgno 2402 20003
17:54:26.627973 34615 (util.c:361):tdserror: client library returned TDS_INT_CANCEL(2)
17:54:26.627978 34615 (util.c:384):tdserror: returning TDS_INT_CANCEL(2)
This test seems to illustrate an issue at least with sending the string, and possibly receiving it as well.
Apr 6, 2013
Running the tests2/freetdstests.py causes a core dump for me if I keep the encoding on UTF-8 in freetds.conf, one of the tests is doing something it doesn't like. For the test_unicode2 you have above, it fails:
classics-MacBook-Pro:pyodbc classic$ python tests2/freetdstests.py "DSN=ms_2005;UID=scott;PWD=tiger" -t test_unicode2
python: 2.7.3 (default, Feb 14 2013, 14:25:59)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)]
pyodbc: 3.0.7-beta10 /usr/local/src/pyodbc/build/lib.macosx-10.4-x86_64-2.7/pyodbc.so
odbc: 03.52.0000
driver: libtdsodbc.so 0.91
supports ODBC version 03.50
os: Darwin
unicode: Py_Unicode=2 SQLWCHAR=4
======================================================================
FAIL: test_unicode2 (__main__.FreeTDSTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests2/freetdstests.py", line 1166, in test_unicode2
self.assertEqual(v, value)
AssertionError: u'' != u"Alors vous imaginez ma surprise, au lever du jour,\n quand [truncated]...
+ Alors vous imaginez ma surprise, au lever du jour,
+ quand une dr\xf4le de petite voix m'a r\xe9veill\xe9. Elle
+ disait: \xab S'il vous pla\xeet... dessine-moi un mouton! \xbb
----------------------------------------------------------------------
Ran 1 test in 0.021s
FAILED (failures=1)
Aug 5, 2013
here's one way I *can* make it work:
1. use tds version = 8.0, not 7.0
2. cast the data to non-unicode first (and include a length, for some reason), you can get it back as bytes:
cursor.execute("select cast(data as varchar(200)) from uni_round")
result = cursor.fetchone()[0]
assert result.decode('utf-8') == unicodedata, result
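If this workaround is used in more than one place, the decode step could be wrapped in a small helper. The sketch below is hypothetical (`as_text` is not part of pyodbc) and assumes that the cast column comes back as a UTF-8 byte string, as it does in the snippet above.

```python
# -*- coding: utf-8 -*-
def as_text(value, encoding='utf-8'):
    """Return value as a unicode string, decoding byte strings if needed."""
    if value is None:
        return None
    if isinstance(value, bytes):  # on Python 2, bytes is an alias for str
        return value.decode(encoding)
    return value

# Simulated column value; with a live connection this would be
# cursor.fetchone()[0] after the "select cast(data as varchar(200)) ..." query.
raw = u'dr\xf4le m\u2019a r\xe9veill\xe9.'.encode('utf-8')
print(as_text(raw) == u'dr\xf4le m\u2019a r\xe9veill\xe9.')
```

One caveat: a cast to varchar goes through the database's code page on the server, so characters outside that code page would presumably not survive. The sample sentence only uses characters that cp1252 can represent, and a cp1252 collation is itself an assumption about this server.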
Sorry, that .encode() wasn't intended, though the result is the same. Take out the encode(), same result:
# coding: utf-8
import pyodbc
print pyodbc.version
unicodedata = u"Alors vous imaginez ma surprise, au lever du jour, "\
u"quand une drôle de petite voix m’a réveillé. Elle "\
u"disait: « S’il vous plaît… dessine-moi un mouton! »"
conn = pyodbc.connect(dsn="ms_2005", user="scott", password="tiger")
cursor = conn.cursor()
cursor.execute("""
create table uni_round (
data nvarchar(500)
)
""")
cursor.execute("""
insert into uni_round (data) values (?)
""", (unicodedata,))
cursor.execute("select data from uni_round")
result = cursor.fetchone()[0]
assert result == unicodedata, result