My favorites | Sign in
Project Home Downloads Wiki
READ-ONLY: This project has been archived. For more information see this post.
Search
for
  Advanced search   Search tips   Subscriptions
Issue 25: inconsistent comparison of unicode/non-unicode strings with MySQL / ODBC
1 person starred this issue and may be notified of changes. Back to list
Status:  WontFix
Owner:  ----
Closed:  Mar 2009


 
Reported by zzz...@gmail.com, Jan 14, 2009
using MySQL with pyodbc, result values from table selects appear to come
back as Python unicode objects (though oddly, not from literal string
selects).   But then it also will fail to compare a unicode value to a
non-unicode value on the server, leading to this contradictory behavior:

    import pyodbc
    c= pyodbc.connect("dsn=mysql_test;TrustedConnection=Yes")

    cursor = c.cursor()

    # passes
    cursor.execute("drop table foo")
    cursor.execute("create table foo(data varchar(30))")
    cursor.execute("insert into foo values (?)", u'test')
    cursor.execute("select * from foo")
    x = cursor.fetchone()[0]
    cursor.execute("select count(1) from foo where data=?", x)
    assert cursor.fetchone()[0] == 1

    # fails
    cursor.execute("drop table foo")
    cursor.execute("create table foo(data varchar(30))")
    cursor.execute("insert into foo values (?)", 'test')
    cursor.execute("select * from foo")
    x = cursor.fetchone()[0]
    cursor.execute("select count(1) from foo where data=?", x)
    assert cursor.fetchone()[0] == 1


Where in the second test, the value of 'test' in the database does not
compare to that of u'test'.  not sure if theres an ODBC setting I should be
using here.  
Jan 15, 2009
#1 zzz...@gmail.com
on further investigation, the problem is not the comparison.  It's just unicode
objects as bind parameters that are busted:

    import pyodbc
    c= pyodbc.connect("dsn=mysql_test;TrustedConnection=Yes")

    cursor = c.cursor()

    cursor.execute("select ?", u'test')
    print cursor.fetchone()

    cursor.execute("select ?", 'test')
    print cursor.fetchone()

It's just a little rough since result sets return strings as unicode objects, so
round trips have to take an asymmetric approach to bind/result handling.  Michael
Trier tells me pyodbc with MSSQL doesn't have this issue.  I'm on OSX if that matters.
Jan 15, 2009
Project Member #2 mkleehammer
I'll have to install MySQL this weekend and try it.  It definitely does not behave
that way on MSSQL; the returned varchar values would be str, not Unicode.

In fact, that is the first thing I would look at.  pyodbc doesn't convert ANSI to
Unicode on its own, so the database driver must be returning them as Unicode.  I
believe the MySQL driver has some keywords to control this.

In general, pyodbc never converts parameters -- it passes what you give directly to
the driver and tells the driver whether the value is ANSI or Unicode.  

*However*, on OS/X, the Python Unicode data type is 4 bytes while wchar_t is only 2
bytes, so there is some conversion going on.  If I remember correctly, I call a
Python API to perform the conversion, perhaps it is incorrect.

To find out for sure, generate a small ODBC trace for the test above and post it. 
That will tell me everything I need to know.
Status: Investigating
Jan 16, 2009
#3 zzz...@gmail.com
sure thing....
odbc_trace.txt
15.9 KB   View   Download
Jan 16, 2009
Project Member #4 mkleehammer
Thanks for the trace.  Here's what I see:

The first execute definitely binds the parameter as a Unicode string, which we can
see by looking at the SQLBindParameter call and noting the data type: SQL_C_WCHAR. 
(For those that don't know, 'WCHAR' stands for 'wide character', meaning Unicode.) 
The second execute binds using SQL_C_CHAR, so it is ANSI.

These are as I would expect.

What is unexpected is the fetch for both returns ANSI strings, even though the first
execute should simply return the Unicode parameter.  One of your previous emails
seemed to say it always returns Unicode, but it looks like it should always return a
string instead.  Can you confirm?  (Perhaps print cursor.description and show the
results of the print cursor.fetchone())

Perhaps we shouldn't be surprised, however, since this is a strange situation.  At
the point that the query is compiled, there is no way for the MySQL driver to know
the datatype of the parameter you are going to pass in the future.  A better test
might be to create a table with a Unicode column and ANSI column and print a row from
that.

Unfortunately, both of these issues may be different than the one you opened the
issue for, and that is the parameter in a where clause.  That issue seems to be
database/driver dependent.  The trace you've provided confirms that pyodbc passes the
parameters as you provide them: unicode objects are passed as WCHAR and string
objects are passed as CHAR.  My suggestion would be to always pass the datatype that
matches the target column.

How is it that they do not match?  Are you using ANSI MySQL tables but Unicode in
your Python program?


Jan 16, 2009
#5 zzz...@gmail.com
the issue is, if I use a regular old MySQL table, put a regular old string in it, I
get back a Python unicode object in the result set.  Using that result as a bind
parameter then fails since unicodes seem to get blown away on the way back in.

Heres a test/trace:

    import pyodbc
    c= pyodbc.connect("dsn=mysql_test;TrustedConnection=Yes")

    cursor = c.cursor()

    cursor.execute("drop table foo")
    cursor.execute("create table foo(data varchar(30))")
    cursor.execute("insert into foo values (?)", 'test')
    cursor.execute("select * from foo")
    result = cursor.fetchone()[0]
    assert isinstance(result, unicode)

    cursor.execute("select ?", result)
    r2 = cursor.fetchone()[0]
    assert result == r2

odbc_trace.txt
20.2 KB   View   Download
Jan 23, 2009
Project Member #6 mkleehammer
Your trace confirms that this is out of my control.  The data *really* does come back
as Unicode data:

  [000000.031062]
  Python A0313720 EXIT  SQLDescribeCol with return code 0 (SQL_SUCCESS)
  ...
  SQLSMALLINT     * 0xbfffedea (SQL_WVARCHAR)

There really isn't anything I can do in pyodbc -- there must be some kind of server
or driver setting for this.

What version of MySQL are you running and what driver are you using?  You can
determine the driver using:

  cnxn.getinfo(pyodbc.SQL_DRIVER_NAME)
  cnxn.getinfo(pyodbc.SQL_DRIVER_VER)

I'm going to leave this issue open for a bit while we investigate, but I don't think
there are any changes I can make -- pyodbc takes the data as it is sent from the
database/driver.

Mar 18, 2009
Project Member #7 mkleehammer
I'm going to have to close this since there really isn't anything I can do.  If you
found any more information, feel free to post here or at the pyodbc news group since
it may help others.

Sorry I couldn't do more.

Status: WontFix

Powered by Google Project Hosting