pyodbc - issue #330

Loosening up encodings


Posted on Jul 10, 2013 by Happy Bird

In Python 2, the user was responsible for encoding and for making sure that what he sent over the ODBC connection was correct.

On Python 3, 99% of cases are much clearer. However, it is not possible to query a database when a field has mixed encodings across records, or contains other 'garbage' such as 'binary' encrypted data that matches no encoding, zipped data, or other trash that shouldn't be there. The reality is that sometimes it is there.

With Windows ODBC this can be accomplished by unchecking 'perform character translation' in the ODBC settings and using Python 2.7, so such things were possible, albeit ugly.

Knowing this is all bad practice, why not make unicode_results=True the default and, when it is set to False, just return a bytes() object as before (PY_MAJOR_VERSION<3)? Then the Python programmer can decode() all he wants, but only when he has explicitly chosen to get himself into such a mess.
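
To illustrate what "decode() all he wants" might look like on the Python side, here is a minimal sketch. It assumes the driver hands back plain bytes objects for the column; the helper name decode_field, the candidate encodings, and the sample rows are made up for the example and are not part of pyodbc.

```python
def decode_field(raw, encodings=("utf-8", "cp1252")):
    """Try each candidate encoding in turn; return the raw bytes if none fits.

    Hypothetical helper, not a pyodbc function.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Nothing matched (encrypted, zipped, or otherwise broken data):
    # leave the value as bytes and let the caller decide what to do.
    return raw


# Hypothetical column values with mixed encodings across records.
rows = [
    "caf\u00e9".encode("utf-8"),    # valid UTF-8
    "caf\u00e9".encode("cp1252"),   # invalid UTF-8, valid cp1252
    b"\x81\x8d\x8f",                # valid in neither encoding
]

decoded = [decode_field(r) for r in rows]
```

The first two values come back as str and the third stays as bytes, which is the kind of per-record handling that is only possible if the raw bytes reach Python in the first place.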

Checking the source, only getdata.cpp and params.cpp would need a minor adjustment (as in: a two-line change) to enable this.

The modification would not lead to new untested behavior, just behavior more compatible with Python 2.7, so I think it's relatively safe.

That way the Python programmer can 'have it his way' and stay more compatible with any existing Python 2.7 project, in case the standard Unicode way really would not do for his situation.

I'd be happy to commit such a modification on Git if it were clear that it would be accepted into the main branch; otherwise I'll just keep it in my private version.

For the rest: my compliments for pyodbc being an awesome extension.

Status: New

Labels:
Type-Defect Priority-Medium