READ-ONLY: This project has been archived.
Issue 250: slow handling of executemany
22 people starred this issue.
Status:  New
Owner:  ----


 
Reported by sok...@gmail.com, Mar 29, 2012
Please refer to this stackoverflow link (I am the asker):

http://stackoverflow.com/a/9932361

Apparently ceODBC doesn't have this issue, though I haven't tested it yet.
Apr 5, 2012
#2 jaycgree...@gmail.com
Tried both ceODBC and mxODBC, and both were also painfully slow. Ended up going with an ADODB connection, with help from http://www.ecp.cc/pyado.html:

    import win32com.client

    # dbDIR, dbOut, tblName, LVI_LST and values are defined elsewhere in the script
    comConn = win32com.client.Dispatch(r'ADODB.Connection')
    DSN = 'PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=%s%s' % (dbDIR, dbOut)
    comConn.Open(DSN)

    # 1 = adOpenKeyset, 3 = adLockOptimistic
    rs = win32com.client.Dispatch(r'ADODB.Recordset')
    rs.Open('[' + tblName + ']', comConn, 1, 3)

    # AddNew with a field list and a value list appends each row directly
    for f in values:
        rs.AddNew(LVI_LST, f)


Dec 30, 2012
#3 sok...@gmail.com
The behavior I encountered last time was as described by ghoerz in the aforementioned stackoverflow link, i.e.

    for each set of parameters
        prepare and execute the insert statement
        unprepare the statement
        commit

However, now that I have tried again, I can't reproduce that behavior with any version of pyodbc from 2.1.8 to 3.0.6. Instead, it is now:

    for each set of parameters
        prepare and execute the insert statement
    unprepare the statement
    commit

This gives good enough performance.

Does anyone have an environment that can reproduce the slow behavior described by ghoerz? If not, I think we can close the issue. I'm sorry I wasn't diligent enough to note down the details when reporting the issue.
Jan 2, 2013
#4 bryan.ea...@gmail.com
Verified resolved with pyodbc-2.1.9.win32-py2.7
Attachment: profiler.png (51.6 KB)
Aug 5, 2013
#5 erski...@dnc.org
I'm still finding a very large difference in the performance of the 'executemany' command between the pyodbc module and the ceODBC module. In a nutshell, ceODBC is 35 times faster than pyodbc at bulk inserts.

In my case, I'm inserting 250,000 records into a SQL Server 2008 R2 database (on its own server), using Python 2.7 on a Windows 7 (64-bit) PC, using the '{SQL Server Native Client 10.0}' driver.  I'm batching the records up into sets of 10,000 records before running 'executemany'.  This takes 12.5 minutes with pyodbc, but just 21 seconds with ceODBC.  A huge difference.

For reference, with this test, pyodbc is about 30% slower than the pymssql module.

Can you think of a reason why the pyodbc module should be so slow? Is there something I should be doing differently in my Python code?
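
The batching itself is nothing elaborate; a sketch of the idea (server, table and column names are placeholders):

    import pyodbc

    # Placeholder connection string, table and column names; 'records' stands
    # in for the 250,000 rows built elsewhere in the script.
    records = [(i, 'a', 'b') for i in range(250000)]

    cnxn = pyodbc.connect('DRIVER={SQL Server Native Client 10.0};'
                          'SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes')
    cursor = cnxn.cursor()

    sql = 'INSERT INTO mytable (col1, col2, col3) VALUES (?, ?, ?)'
    BATCH_SIZE = 10000

    # One executemany call per batch of 10,000 records
    for start in range(0, len(records), BATCH_SIZE):
        cursor.executemany(sql, records[start:start + BATCH_SIZE])
    cnxn.commit()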
Oct 29, 2013
#6 rlindne...@gmail.com
I don't think this issue is resolved. 

I'm using the most recent pyodbc-3.0.7-py2.7 and have found the same phenomenon. The same Python code takes minutes with pyodbc but only seconds with ceODBC. The version is ceODBC-2.0.1-py2.7. The critical code is:

    conn = ceODBC.connect(conninfo, autocommit=True)
    cur = conn.cursor()
    cur.executemany(UPSERT_SQL, DATA)
    conn.close()
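
For comparison, the pyodbc version is essentially the same at the API level (conninfo, UPSERT_SQL and DATA as above), so the difference has to come from the modules themselves:

    import pyodbc

    # conninfo, UPSERT_SQL and DATA as in the ceODBC snippet above
    conn = pyodbc.connect(conninfo, autocommit=True)
    cur = conn.cursor()
    cur.executemany(UPSERT_SQL, DATA)
    conn.close()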


The executemany implementation of ceODBC must be many times more efficient. If someone tells me how to do the profiling, I'm happy to help. The database I use is SAP HANA, and for convenience I use a 32-bit ODBC driver.

--Richard
Sep 17, 2014
#8 ahz...@gmail.com
I'm also having trouble with pyODBC performance.

I tested inserting 1,000 small rows into Netezza.

ceODBC executemany: 0.3 seconds
pyodbc executemany: 180-195 seconds
pyodbc execute: 78-130 seconds

Python 2.7 32-bit, pyodbc 3.0.7, ceodbc 2.0.1, Windows 7 64-bit, Netezza Release 7.0 (P-1) [Build 26407]
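
A comparison like this can be reproduced by timing the executemany call directly; a rough sketch (the connection string and table name are placeholders):

    import time

    import ceODBC
    import pyodbc

    CONNINFO = 'DSN=mydsn'    # placeholder ODBC connection string
    SQL = 'INSERT INTO mytable (id, val) VALUES (?, ?)'    # placeholder table
    ROWS = [(i, 'value %d' % i) for i in range(1000)]

    def timed_insert(module, conninfo):
        """Time a single executemany of ROWS through the given DB API module."""
        conn = module.connect(conninfo)
        cur = conn.cursor()
        start = time.time()
        cur.executemany(SQL, ROWS)
        conn.commit()
        conn.close()
        return time.time() - start

    print('ceODBC:', timed_insert(ceODBC, CONNINFO))
    print('pyodbc:', timed_insert(pyodbc, CONNINFO))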
Nov 30, 2014
#9 Atoc....@gmail.com
If I need to update 300,000 records, it never finishes. This bug really needs to be fixed.
Jan 14, 2015
#10 javier.r...@gmail.com
AFAICS, Cursor_executemany is binding a single set of parameters at a time and calling execute (which calls SQLExecute).

In ceODBC, it binds all parameters and then calls SQLExecute.

Am I reading this correctly?
Feb 10, 2015
#11 rich...@gmail.com
I tested by running an insert statement for multiple rows of data using executemany. In the SQL Server Profiler trace of the run, every row was inserted using the sp_prepexec stored procedure. This causes implicit unprepares and explicit prepares of the query over and over again. What I believe it should do is call sp_prepexec only for the first row (which prepares the query and executes the first row), and then call sp_execute for all subsequent rows in the batch (which will use the existing prepared query).

At the end of the batch, executemany does call an explicit sp_unprepare, which is fine, but I believe these implicit unprepares and explicit prepares are what is killing performance.  (You can see these in Profiler if you turn on the "Prepare SQL" and "Unprepare SQL" events in the "TSQL" events heading.)


Attachment: pyodbc__executemany__sp_prepexec.png (19.6 KB)
Mar 27, 2015
#12 mewa...@gmail.com
It looks to me like one major problem is the function PrepareAndBind() (and how it gets used), which should be two separate functions in order to be able to prepare once and bind many times. This might also address the comment that says "REVIEW: Why don't we always prepare?", which is in a code section that would seem to have a fair amount of room for consolidation and simplification. So, to fix:
- separate PrepareAndBind() into two functions
- consolidate the code that calls the prepare() and bind() functions, and don't have separate cases for with-params and without-params until you get to the bind call
- only call prepare once
- maybe consolidate the code for executemany() vs. a single execute(), etc.

or maybe just ditch pyodbc and get ceodbc instead...
Mar 27, 2015
#13 mewa...@gmail.com
Looks like there is also a malloc and free call for every bind, where it may be better (and faster and cleaner) to use automatic memory in the function that does the initial prepare, re-using that memory for each bind (no explicit free required at the end).
Mar 28, 2015
#14 toastie...@gmail.com
The problem is that pyodbc does not use "parameterized arrays" (whereas ceODBC does). For a description of parameterized arrays, see the section "Using Arrays of Parameters" on this webpage:
https://www.progress.com/products/datadirect-connect/odbc-drivers/odbc-developer-center/odbc-tutorials/designing-performance-optimized-odbc-applications/select-functions-that-optimize-performance

So, for example, if you call pyodbc's executemany() function with an INSERT statement and a list of 10,000 records (each record consisting of another list of data fields), pyodbc will make 10,000 calls to the database where each call includes just one record.  This is very slow.  On the other hand, ceodbc will make just one call to the database with all 10,000 records in the payload.  This is much, much quicker.
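
To make that concrete: the application-level call is identical for both modules; the difference is what it turns into at the ODBC layer. A sketch, with a placeholder connection and table, annotated with the counts from the 10,000-record example:

    import pyodbc   # the application code looks the same with ceODBC

    conn = pyodbc.connect('DSN=mydsn')    # placeholder connection
    cursor = conn.cursor()

    records = [(i, 'row %d' % i) for i in range(10000)]
    cursor.executemany('INSERT INTO mytable (a, b) VALUES (?, ?)', records)
    # pyodbc 3.x: binds and executes one record at a time, i.e. ~10,000
    #             SQLExecute calls and round trips to the server
    # ceODBC:     binds the whole list as an array of parameters and issues
    #             a single SQLExecute carrying all 10,000 records
    conn.commit()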

Having said that, not all databases accept parameterized arrays.  Microsoft SQL Server does:
https://msdn.microsoft.com/en-us/library/ms713841(v=vs.85).aspx
I don't know about other databases.

I suspect that the demand for parameterized arrays will increase now that sqlalchemy has added bulk INSERT operations to version 1.0:
http://docs.sqlalchemy.org/en/latest/orm/persistence_techniques.html#bulk-operations
I'm sure there will soon be a lot of people out there using sqlalchemy wondering why the bulk operations are no quicker than non-bulk operations (especially with SQL Server).
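
For reference, the sqlalchemy 1.0 bulk API boils down to calls like the sketch below (mapped class, DSN and data are placeholders); each such call is handed to the DBAPI via executemany(), so it runs straight into this same behaviour:

    from sqlalchemy import Column, Integer, String, create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import Session

    Base = declarative_base()

    class User(Base):                     # placeholder mapped class
        __tablename__ = 'users'
        id = Column(Integer, primary_key=True)
        name = Column(String(50))

    engine = create_engine('mssql+pyodbc://user:pass@mydsn')   # placeholder DSN URL
    Base.metadata.create_all(engine)
    session = Session(bind=engine)

    # The whole list of mappings is passed down to the DBAPI via
    # executemany(), so its speed depends directly on this issue.
    session.bulk_insert_mappings(User, [
        {'id': i, 'name': 'user %d' % i} for i in range(10000)
    ])
    session.commit()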

Hopefully, Michael Kleehammer (or somebody else) will have some time to make this improvement.  It's really needed.

Jun 13, 2015
#15 shaun.currier@gmail.com
I am also suffering from this issue trying to submit to a Teradata DB.
