couchdb-python - issue #226

Provide ability to do bulk dump and load


Posted on Jun 14, 2013 by Grumpy Cat

Currently the load.py and dump.py utilities load and dump documents one by one, which is tremendously slow.

Introducing bulk loading and dumping would speed things up considerably.

Maybe we can add an option like "--bulk-size", with a default value of 1 (load/dump documents one by one, just as happens now), to give users some additional control over tuning.
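A rough sketch of the idea, not a patch: batched() and bulk_load() are hypothetical names that do not exist in load.py, and the only library call assumed is couchdb-python's existing Database.update(documents), which sends a list of documents in one bulk request. With bulk_size=1 every request carries a single document, which is close to the current behaviour.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable` (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(iterable)
    # iter(callable, sentinel) stops as soon as an empty batch comes back
    for batch in iter(lambda: list(islice(it, size)), []):
        yield batch


def bulk_load(db, docs, bulk_size=1):
    """Send `docs` to `db` in batches; return the number of requests made.

    `db` is assumed to expose couchdb-python's Database.update(documents).
    Per-document results returned by update() are ignored in this sketch.
    """
    requests = 0
    for batch in batched(docs, bulk_size):
        db.update(batch)
        requests += 1
    return requests
```

A larger bulk size trades memory for fewer round trips, so the best value probably depends on document size.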

Comment #1

Posted on Jun 14, 2013 by Quick Cat

I'm working on an initial implementation and will provide some patches later.

Comment #2

Posted on Jun 17, 2013 by Quick Cat

I've finished bulk dumping of documents. You can see it here: https://code.google.com/r/paveltsipinio-bulk-dumping/source/detail?r=e0f1bda24cc0bc487bf782ebdabc9d817bf7d4f6&name=bulk_dumping

Comment #3

Posted on Jun 17, 2013 by Quick Rhino

Good stuff! For inclusion into CouchDB-Python, I have a number of requests:

  • Please remove the change in .hgignore, as it isn't needed anymore
  • Please see if you can add a test for the new behavior
  • It would be great if you can split this into two patches: one that abstracts writing into a separate function, and another one that actually does the bulk requests/writes -- this makes it easier to review the changes now and in the future

Comment #4

Posted on Jun 17, 2013 by Quick Cat

I've addressed your requests and added a bulk load method: https://code.google.com/r/paveltsipinio-bulk-dumping/source/detail?r=de81adea330909f13d9bf37f98e25d4b7c657a92&name=bulk_dumping

Comment #5

Posted on Jun 18, 2013 by Quick Rhino

I've pushed modified versions; for r6f91fa675423, I:

  • Renamed function from write_dump() to dump_doc()
  • Moved dump_doc() outside dump_db(), added envelope argument
  • Rewrote commit message to clarify

In re8cafe210d91, I:

  • Made sure lines didn't get longer than 80 chars
  • Tightened up the loop code (while True, if condition: break is a little silly)
  • Rewrote commit message to clarify

Could you redo your bulk loading along these lines? Your patch also introduces a bug with respect to error handling: db.update() doesn't raise exceptions the way db.setattr() does. Also, your test case references a test data file that isn't included in the patch.
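To illustrate the error-handling point: in couchdb-python, Database.update() returns a list of (success, docid, rev_or_exc) tuples rather than raising for each failed document, so a try/except around the call will not see per-document failures. A minimal sketch of checking those results follows; failed_updates() is a hypothetical helper, and the sample data is hand-made in the shape update() returns, not real output.

```python
def failed_updates(results):
    """Return (docid, error) pairs for documents a bulk update rejected.

    `results` is assumed to be the list of (success, docid, rev_or_exc)
    tuples returned by couchdb-python's Database.update().
    """
    return [(docid, rev_or_exc)
            for success, docid, rev_or_exc in results
            if not success]


# Hand-made sample; a real call would be: results = db.update(batch)
sample = [
    (True, "doc-a", "1-abc"),
    (False, "doc-b", Exception("conflict")),
]
for docid, exc in failed_updates(sample):
    print("Error loading %s: %s" % (docid, exc))
```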

Comment #6

Posted on Jun 18, 2013 by Quick Cat

I hope I understood your recommendations about code design correctly. I pushed the changes to https://code.google.com/r/paveltsipinio-bulk-dumping/source/detail?r=46b5043fe465274850c4a821e468ca9ca90b70e0&name=bulk_dumping

I did not understand what you meant about the test data file. I don't have any test data files.

Comment #7

Posted on Jul 15, 2014 by Quick Rhino

This issue has been migrated to GitHub. Please continue discussion here:

https://github.com/djc/couchdb-python/issues/226

Status: New

Labels:
Type-Defect Priority-Medium