| Issue 296: | Missing rows in select when fetching a long string (XML field) | |
| 4 people starred this issue and may be notified of changes. | Back to list |
What steps will reproduce the problem?
What is the expected output? What do you see instead?
There's a select from an MS SQL Server database that is returning no rows when I include an XML field.
For example, if I do:
curs.execute("select Starttidspunkt from Koersel where SerieNr='12485001'")
then I get 1 row with a timestamp value.
But if I do:
curs.execute("select CAST(Pdata AS VARBINARY(max)) from Koersel where SerieNr='12485001'")
then I get 0 rows, despite using the exact same where-criteria. The Pdata field is an XML field and the cast turns it into some 30KB of text.
What version of the product are you using? On what operating system?
3.0.6 on XP,Win7.
Please provide any additional information below.
Studying the 3.0.6 sources, I think I've found the cause: The "return 0" at the end of the GetDataString function in getdata.cpp.
The issue is with the handling of indeterminate length text. It looks like GetDataString starts out with a 1024 bytes buffer, and then loops up to 10 times, adding 2048 bytes each time. The last allocation isn't used, so that makes the effective maximum size 19KiB. The for loop exhausts, and the "return 0" at the end convinces Cursor_fetchlist to break, but since there is no exception raised, this results in rows missing instead of an error message.
Attached a patch to use exponential instead of linear growth of the buffer, allowing for strings of up to 2MiB, and an explanatory MemoryError for when even that's not enough.
It's untested, alas, since I can't build pyodbc (seems to be MSVC++-only on MSWin). I'd be very grateful for a build with this patch included ASAP.
Apr 24, 2013
I've attached an updated/tested patch which adds a few TRACE statements and renames a local variable. I tested this patch on Windows XP/SP3 with Python 2.74 and Python 3.3.1. Here is an updated summary of this patch: This patch increases the GetDataString default buffer size/increment to 4kB and changes its reallocation algorithm. Optionally, the GetDataString default buffer size/increment can be changed via the new pyodbc connect keyword getdata_str_size. The supported range for the getdata_str_size keyword is 1kB - 128kB and the patch enforces this range. If needed, the GetDataString buffer will be reallocated by doubling (up to 10 times) the previously used buffer size increment. In summary, this patch allows the pyodbc user to control the maximum GetDataString buffer size needed for their applications within the range of 2047kB - 262016kB. I've also attached updated Windows binaries for Python 2.7 and Python 3.3, if anyone wants to further test this patch.
Jul 17, 2013
I've updated my patch for the pyodbc 3.0.7 release. I've also included a few binaries for anyone who might need them. The binaries were built using Python 2.7.5 and Python 3.3.2.
Jan 24, 2015
I was having the same issue. Specifically this was a problem using Native Client 11 and not Native Client 10. I've gone back to using Native Client 10 as a work around but this patch also fixes the problem. |
2.9 KB View Download
226 KB Download
231 KB Download