Currently the load.py and dump.py utilities load and dump documents one at a time, which is tremendously slow.
Introducing bulk loading and dumping would speed things up considerably.
Maybe we can add an option like "--bulk-size", with a default value of 1 (load/dump documents one by one, just as happens now), to give users some room for tuning.
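A rough sketch of what the batching behind such an option could look like. The helper names `batches()` and `load_docs()` are made up for illustration and are not part of load.py; the only thing assumed about `db` is that it has an `update()` method accepting a list of documents, as couchdb-python's `Database.update()` does.

```python
from itertools import islice


def batches(iterable, size):
    """Yield lists of at most `size` items from `iterable` (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    # iter(callable, sentinel) stops as soon as an empty list comes back.
    for batch in iter(lambda: list(islice(iterator, size)), []):
        yield batch


def load_docs(db, docs, bulk_size=1):
    """Write `docs` to `db` in batches and return the number of requests made.

    Assumption: db.update(list_of_docs) sends one bulk request per call.
    """
    requests = 0
    for batch in batches(docs, bulk_size):
        db.update(batch)
        requests += 1
    return requests
```

With bulk_size=1 this still issues one request per document, so the default would keep the request count the same as now, although each request would go through the bulk call rather than a single-document write.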
Comment #1
Posted on Jun 14, 2013 by Quick Cat
I'm working on an initial implementation and will provide some patches later.
Comment #2
Posted on Jun 17, 2013 by Quick Cat
I finished bulk dumping of documents. You can see it here: https://code.google.com/r/paveltsipinio-bulk-dumping/source/detail?r=e0f1bda24cc0bc487bf782ebdabc9d817bf7d4f6&name=bulk_dumping
Comment #3
Posted on Jun 17, 2013 by Quick Rhino
Good stuff! For inclusion into CouchDB-Python, I have a number of requests:
- Please remove the change in .hgignore, as it isn't needed anymore
- Please see if you can add a test for the new behavior
- It would be great if you can split this into two patches: one that abstracts writing into a separate function, and another one that actually does the bulk requests/writes -- this makes it easier to review the changes now and in the future
Comment #4
Posted on Jun 17, 2013 by Quick Cat
I addressed your requests and added a bulk load method: https://code.google.com/r/paveltsipinio-bulk-dumping/source/detail?r=de81adea330909f13d9bf37f98e25d4b7c657a92&name=bulk_dumping
Comment #5
Posted on Jun 18, 2013 by Quick Rhino
I've pushed modified versions; for r6f91fa675423, I:
- Renamed function from write_dump() to dump_doc()
- Moved dump_doc() outside dump_db(), added envelope argument
- Rewrote commit message to clarify
In re8cafe210d91, I:
- Made sure lines didn't get longer than 80 chars
- Tightened up the loop code (while True, if condition: break is a little silly)
- Rewrote commit message to clarify
Could you redo your bulk loading along these lines? You also introduce a bug wrt error handling; db.update() doesn't throw Exceptions like db.setattr(). Also, your test case references a test data file that isn't included in the patch.
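To illustrate the error-handling point: as far as I know, couchdb-python's `Database.update()` reports per-document failures in its return value, a list of `(success, docid, rev_or_exc)` tuples, rather than raising. A minimal sketch of checking that list follows; `failed_updates()` is a made-up helper name, and the sample results are hand-written, not real server output.

```python
def failed_updates(results):
    """Return (docid, error) pairs for every document that was not saved.

    `results` is assumed to be shaped like the return value of
    couchdb-python's Database.update(): (success, docid, rev_or_exc).
    """
    return [(docid, rev_or_exc)
            for success, docid, rev_or_exc in results
            if not success]


# Hand-written results for illustration only.
sample = [
    (True, "doc-a", "1-abc"),
    (False, "doc-b", Exception("conflict")),
]
for docid, error in failed_updates(sample):
    print("Error: %s: %s" % (docid, error))
```

A bulk loader that wraps the call in try/except alone would silently skip such failures, so the returned list needs to be inspected after each batch.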
Comment #6
Posted on Jun 18, 2013 by Quick Cat
I hope I have understood your recommendations about code design correctly. I pushed the changes to https://code.google.com/r/paveltsipinio-bulk-dumping/source/detail?r=46b5043fe465274850c4a821e468ca9ca90b70e0&name=bulk_dumping
I did not understand what you meant about the test data file. I don't have any test data files.
Comment #7
Posted on Jul 15, 2014 by Quick Rhino
This issue has been migrated to GitHub. Please continue discussion here:
Status: New
Labels:
Type-Defect
Priority-Medium