Incremental dump (statement by statement) #143
Comments
From rogerbinns on September 06, 2013 22:49:36 APSW only does the restore one statement at a time too. More accurately, it reads a line at a time and calls sqlite3_complete to see whether the accumulated text is a complete statement, which will be true once there is a trailing semicolon. The maximum size of anything added to the statement cache is 16384 bytes, and the default cache size is 100 entries. Add in the memory used by the SQLite virtual machine and things could cumulatively add up.
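The line-at-a-time loop described above can be sketched with the stdlib sqlite3 module, whose complete_statement() wraps the same sqlite3_complete C function the shell uses (the function name and sample SQL here are mine, for illustration only):

```python
import sqlite3

def statements(lines):
    # Accumulate lines until sqlite3_complete (exposed in Python as
    # sqlite3.complete_statement) reports a full statement, i.e. the
    # text ends with a semicolon outside any string or comment.
    buffer = ""
    for line in lines:
        buffer += line
        if sqlite3.complete_statement(buffer):
            yield buffer
            buffer = ""

sql_lines = [
    "CREATE TABLE t (\n",
    "  x INTEGER\n",
    ");\n",
    "INSERT INTO t VALUES (1);\n",
]
for stmt in statements(sql_lines):
    print(repr(stmt))
```

Only the current partial statement is buffered, so memory use is bounded by the longest single statement in the dump, not the dump's total size.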
From rogerbinns on September 06, 2013 22:50:57 I also recommend you use augmented stack traces, i.e. also print out the local variables for each frame. You can see how to do that halfway down http://apidoc.apsw.googlecode.com/hg/exceptions.html#augmentedstacktraces
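The linked page documents APSW's own mechanism; a minimal hand-rolled version of the same idea (all names below are my own, not APSW's API) just walks the traceback of the active exception and prints each frame's locals:

```python
import sys

def augmented_traceback():
    # Like a normal traceback, but with each frame's local variables
    # included, so you can see the state that led to the exception.
    tb = sys.exc_info()[2]
    out = []
    while tb is not None:
        frame = tb.tb_frame
        out.append("%s:%d in %s" % (
            frame.f_code.co_filename, tb.tb_lineno, frame.f_code.co_name))
        for name, value in sorted(frame.f_locals.items()):
            out.append("    %s = %r" % (name, value))
        tb = tb.tb_next
    return "\n".join(out)

def restore(statement):
    db_name = "example.db"  # hypothetical local, visible in the trace
    raise MemoryError("out of memory executing %r" % statement)

try:
    restore("INSERT INTO t VALUES (1);")
except MemoryError:
    print(augmented_traceback())
```

For a MemoryError this is especially useful, since the locals show which statement and which objects were live when memory ran out.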
From adsense@calibre-ebook.com on September 06, 2013 23:21:43 Running the dump and restore on the database attached to the ticket causes memory usage to climb by about 700MB. I suspect that most of that is simply from storing the dumped SQL in memory. iterdump() from pysqlite avoids that since it produces only one statement at a time.
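The pysqlite approach mentioned here (iterdump() also exists in the stdlib sqlite3 module) streams one statement at a time, so the complete dump never sits in memory. A minimal sketch, with made-up table data:

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")
src.execute("INSERT INTO books (title) VALUES ('example')")
src.commit()

# isolation_level=None (autocommit) lets the dump's own explicit
# BEGIN/COMMIT statements run without the module interfering.
dst = sqlite3.connect(":memory:", isolation_level=None)

# iterdump() yields one complete SQL statement at a time, so only
# the current statement is ever held in memory.
for statement in src.iterdump():
    dst.execute(statement)

print(dst.execute("SELECT title FROM books").fetchone()[0])  # → example
```

Peak memory is then bounded by the largest single row/statement rather than the ~75MB of the whole dump.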
From rogerbinns on September 06, 2013 23:26:09 Starting with a process only running the APSW shell in Python 2.7 and the database at launchpad on a 64 bit Linux machine, I get the resident set size climbing to 12MB when doing a dump and 23MB when doing a restore. Virtual size is about double that. This seems to indicate that the APSW shell is the victim of something else exhausting memory, not the cause. (Incidentally I do run the entire test suite under valgrind multiple times.)
From rogerbinns on September 06, 2013 23:30:12 The dump is 75MB of UTF-8 encoded text. Surely pysqlite will also require the same amount of memory to store the whole thing?
From adsense@calibre-ebook.com on September 06, 2013 23:30:29 The problem isn't that there is a memory leak. The problem is that the dump and restore operate on the full dumped SQL, which is stored in memory and can be very large. Is there some API I can use that allows the intermediate SQL not to be stored in memory?
From rogerbinns on September 06, 2013 23:36:55 There is no intermediate SQL. The dump is only in memory if you set stdin/out for the shell to be a StringIO. If you set it to a file then the dump ends up in the file. During dumping, each statement is written as it is generated (self.write on self.stdout is called). During reading, a line at a time is read using Python iteration on files. The shell itself never does any internal buffering or keeps around intermediate bits of SQL. I usually use the tempfile module when I need stuff on disk rather than in memory.
From adsense@calibre-ebook.com on September 06, 2013 23:51:15 Yeah, Python 2 does not store strings in UTF-8; it stores them in either UCS-2 or UCS-4, depending on the build. The code in pysqlite looks (basically) like this:

    for line in conn.iterdump():
        newconn.execute(line)

In apsw it looks like this:

    shell.process_command('.dump')

What I'm asking for is some API that lets me replace newconn.execute()
From adsense@calibre-ebook.com on September 06, 2013 23:52:57 Or, from reading the apsw shell source code, it looks like it needs to be shell.process_command('.dump') into a disk file, and then newshell.process_command('.restore') from the disk file.
From adsense@calibre-ebook.com on September 06, 2013 23:53:33 Sorry, make that newshell.process_command('.read')
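The two-step workflow being discussed (dump to a disk file, then replay from the file) can be sketched with the stdlib sqlite3 module standing in for the shell commands; the tempfile usage follows the earlier suggestion, and the table contents are invented:

```python
import os
import sqlite3
import tempfile

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x INTEGER)")
src.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
src.commit()

# Step 1 (the ".dump" step): write the dump to a disk file,
# one statement per write, so it never accumulates in memory.
fd, path = tempfile.mkstemp(suffix=".sql")
with os.fdopen(fd, "w") as f:
    for statement in src.iterdump():
        f.write(statement + "\n")

# Step 2 (the ".read" step): replay from disk, accumulating lines
# until complete_statement() (sqlite3_complete) sees a whole statement.
dst = sqlite3.connect(":memory:", isolation_level=None)
buffer = ""
with open(path) as f:
    for line in f:
        buffer += line
        if sqlite3.complete_statement(buffer):
            dst.execute(buffer)
            buffer = ""
os.remove(path)

print(dst.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # → 100
```

The intermediate SQL lives only on disk; memory use stays bounded by the longest single statement.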
From rogerbinns on September 06, 2013 23:59:05 You never said earlier that you were executing the dump at the same time as generating it! That will just make a backup. There is a perfectly good API exposed for it: http://apidoc.apsw.googlecode.com/hg/backup.html It isn't really possible to change things to provide the dump as a generator. A limitation of Python generators is they are only one stack frame deep. |
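The linked page documents APSW's own backup API; as a rough analogue, the stdlib sqlite3 module exposes the same underlying SQLite online backup mechanism via Connection.backup() (Python 3.7+), which copies page by page rather than via SQL text:

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x INTEGER)")
src.execute("INSERT INTO t VALUES (1)")
src.commit()

dst = sqlite3.connect(":memory:")
# Copy the database a few pages at a time; between steps the
# source connection remains usable, unlike a monolithic copy.
src.backup(dst, pages=5)

print(dst.execute("SELECT x FROM t").fetchone()[0])  # → 1
```

Because the backup API copies raw pages (including indices), it needs no intermediate SQL at all, which is the point being made above.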
From adsense@calibre-ebook.com on September 07, 2013 00:01:36 It isn't at the same time: it is dump first on database 1, then, after the dump is completed, execute the dump on database 2. Finally, replace database 1 with database 2 if the previous operations were successful. And this is not equivalent to a backup, since it can recover from various types of minor corruption in the database.
From rogerbinns on September 07, 2013 00:15:11 The only corruption you can recover from with APSW would be in the indices, since they are read by the backup API but not by dumping. The general advice is to fix the root cause of the corruption and make backups. https://www.sqlite.org/howtocorrupt.html You may also be able to do things with the timing of when certain commands are issued. For example, use WAL mode with auto-checkpointing off, and do an integrity check at a useful point (e.g. start of session). If all is clear, do the checkpoint; if not, delete the WAL since it most likely contains the corruption.

Summary: Incremental dump (statement by statement) (was: MemoryError while restoring)
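The WAL-with-manual-checkpoint idea above can be sketched with the stdlib sqlite3 module (the file name and table are illustrative; a file-backed database is required because WAL does not apply to :memory:):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "session.db")
db = sqlite3.connect(path, isolation_level=None)

# WAL mode with auto-checkpointing off: changes accumulate in the
# -wal file and are only folded into the main database file when we
# checkpoint explicitly.
db.execute("PRAGMA journal_mode=WAL")
db.execute("PRAGMA wal_autocheckpoint=0")
db.execute("CREATE TABLE t (x INTEGER)")
db.execute("INSERT INTO t VALUES (1)")

# At a useful point (e.g. start of session), check integrity first...
ok = db.execute("PRAGMA integrity_check").fetchone()[0] == "ok"
if ok:
    # ...and only then fold the WAL into the main file. If the check
    # had failed, deleting the -wal file instead would discard the
    # (likely corrupt) recent changes while keeping the main file.
    db.execute("PRAGMA wal_checkpoint(TRUNCATE)")
print(ok)
```

This gives a window in which corruption introduced by recent writes can be discarded by dropping the WAL, at the cost of losing those writes.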
From rogerbinns on September 06, 2013 22:42:09
https://bugs.launchpad.net/calibre/+bug/1217988
Original issue: http://code.google.com/p/apsw/issues/detail?id=143