| Issue 25: | inconsistent comparison of unicode/non-unicode strings with MySQL / ODBC | |
| 1 person starred this issue and may be notified of changes. | Back to list |
using MySQL with pyodbc, result values from table selects appear to come
back as Python unicode objects (though oddly, not from literal string
selects). But then it also will fail to compare a unicode value to a
non-unicode value on the server, leading to this contradictory behavior:
import pyodbc
c= pyodbc.connect("dsn=mysql_test;TrustedConnection=Yes")
cursor = c.cursor()
# passes
cursor.execute("drop table foo")
cursor.execute("create table foo(data varchar(30))")
cursor.execute("insert into foo values (?)", u'test')
cursor.execute("select * from foo")
x = cursor.fetchone()[0]
cursor.execute("select count(1) from foo where data=?", x)
assert cursor.fetchone()[0] == 1
# fails
cursor.execute("drop table foo")
cursor.execute("create table foo(data varchar(30))")
cursor.execute("insert into foo values (?)", 'test')
cursor.execute("select * from foo")
x = cursor.fetchone()[0]
cursor.execute("select count(1) from foo where data=?", x)
assert cursor.fetchone()[0] == 1
Where in the second test, the value of 'test' in the database does not
compare to that of u'test'. not sure if theres an ODBC setting I should be
using here.
Jan 15, 2009
I'll have to install MySQL this weekend and try it. It definitely does not behave that way on MSSQL; the returned varchar values would be str, not Unicode. In fact, that is the first thing I would look at. pyodbc doesn't convert ANSI to Unicode on its own, so the database driver must be returning them as Unicode. I believe the MySQL driver has some keywords to control this. In general, pyodbc never converts parameters -- it passes what you give directly to the driver and tells the driver whether the value is ANSI or Unicode. *However*, on OS/X, the Python Unicode data type is 4 bytes while wchar_t is only 2 bytes, so there is some conversion going on. If I remember correctly, I call a Python API to perform the conversion, perhaps it is incorrect. To find out for sure, generate a small ODBC trace for the test above and post it. That will tell me everything I need to know.
Status:
Investigating
Jan 16, 2009
sure thing....
Jan 16, 2009
Thanks for the trace. Here's what I see: The first execute definitely binds the parameter as a Unicode string, which we can see by looking at the SQLBindParameter call and noting the data type: SQL_C_WCHAR. (For those that don't know, 'WCHAR' stands for 'wide character', meaning Unicode.) The second execute binds using SQL_C_CHAR, so it is ANSI. These are as I would expect. What is unexpected is the fetch for both returns ANSI strings, even though the first execute should simply return the Unicode parameter. One of your previous emails seemed to say it always returns Unicode, but it looks like it should always return a string instead. Can you confirm? (Perhaps print cursor.description and show the results of the print cursor.fetchone()) Perhaps we shouldn't be surprised, however, since this is a strange situation. At the point that the query is compiled, there is no way for the MySQL driver to know the datatype of the parameter you are going to pass in the future. A better test might be to create a table with a Unicode column and ANSI column and print a row from that. Unfortunately, both of these issues may be different than the one you opened the issue for, and that is the parameter in a where clause. That issue seems to be database/driver dependent. The trace you've provided confirms that pyodbc passes the parameters as you provide them: unicode objects are passed as WCHAR and string objects are passed as CHAR. My suggestion would be to always pass the datatype that matches the target column. How is it that they do not match? Are you using ANSI MySQL tables but Unicode in your Python program?
Jan 16, 2009
the issue is, if I use a regular old MySQL table, put a regular old string in it, I
get back a Python unicode object in the result set. Using that result as a bind
parameter then fails since unicodes seem to get blown away on the way back in.
Heres a test/trace:
import pyodbc
c= pyodbc.connect("dsn=mysql_test;TrustedConnection=Yes")
cursor = c.cursor()
cursor.execute("drop table foo")
cursor.execute("create table foo(data varchar(30))")
cursor.execute("insert into foo values (?)", 'test')
cursor.execute("select * from foo")
result = cursor.fetchone()[0]
assert isinstance(result, unicode)
cursor.execute("select ?", result)
r2 = cursor.fetchone()[0]
assert result == r2
Jan 23, 2009
Your trace confirms that this is out of my control. The data *really* does come back as Unicode data: [000000.031062] Python A0313720 EXIT SQLDescribeCol with return code 0 (SQL_SUCCESS) ... SQLSMALLINT * 0xbfffedea (SQL_WVARCHAR) There really isn't anything I can do in pyodbc -- there must be some kind of server or driver setting for this. What version of MySQL are you running and what driver are you using? You can determine the driver using: cnxn.getinfo(pyodbc.SQL_DRIVER_NAME) cnxn.getinfo(pyodbc.SQL_DRIVER_VER) I'm going to leave this issue open for a bit while we investigate, but I don't think there are any changes I can make -- pyodbc takes the data as it is sent from the database/driver.
Mar 18, 2009
I'm going to have to close this since there really isn't anything I can do. If you found any more information, feel free to post here or at the pyodbc news group since it may help others. Sorry I couldn't do more.
Status:
WontFix
|
on further investigation, the problem is not the comparison. It's just unicode objects as bind parameters that are busted: import pyodbc c= pyodbc.connect("dsn=mysql_test;TrustedConnection=Yes") cursor = c.cursor() cursor.execute("select ?", u'test') print cursor.fetchone() cursor.execute("select ?", 'test') print cursor.fetchone() It's just a little rough since result sets return strings as unicode objects, so round trips have to take an asymmetric approach to bind/result handling. Michael Trier tells me pyodbc with MSSQL doesn't have this issue. I'm on OSX if that matters.