Incremental dump (statement by statement) #143

rogerbinns · 2013-12-29T03:31:00Z

From rogerbinns on September 06, 2013 22:42:09

https://bugs.launchpad.net/calibre/+bug/1217988

Original issue: http://code.google.com/p/apsw/issues/detail?id=143

rogerbinns · 2013-12-29T03:31:01Z

From rogerbinns on September 06, 2013 22:49:36

APSW only does the restore one statement at a time too. More accurately it reads a line and calls sqlite3_complete to see if it is complete which will be true if there is a trailing semicolon.

The maximum size of something added to the statement cache is 16384 bytes and the default size is 100 entries. Add in the size used by the SQLite virtual machine and things could cumulatively add up.

rogerbinns · 2013-12-29T03:31:02Z

From rogerbinns on September 06, 2013 22:50:57

I also recommend you use augmented stack traces - ie also print out local variables for each frame. You can see how to do that half way down http://apidoc.apsw.googlecode.com/hg/exceptions.html#augmentedstacktraces

rogerbinns · 2013-12-29T03:31:02Z

From adsense@calibre-ebook.com on September 06, 2013 23:21:43

Running the the dump and restore on the database attached to the ticket causes memory usage to climb by about 700MB. I suspect that most of that is simply from storing the dumped sql in memory. iterdump() from pysqlite avoids that since it produces only one statement at a time.

rogerbinns · 2013-12-29T03:31:02Z

From rogerbinns on September 06, 2013 23:26:09

Starting with a process only running the APSW shell in Python 2.7 and the database at launchpad on a 64 bit Linux machine, I get the resident set size climbing to 12MB when doing a dump and 23MB when doing a restore.

Virtual size is about double that. This seems to indicate that the APSW shell is the victim of something else exhausting memory, not the cause.

(Incidentally I do run the entire test suite under valgrind multiple times.)

rogerbinns · 2013-12-29T03:31:03Z

From rogerbinns on September 06, 2013 23:30:12

The dump is 75MB of UTF-8 encoded text.

Surely pysqlite will also require the same amount of memory to store the whole thing?

rogerbinns · 2013-12-29T03:31:03Z

From adsense@calibre-ebook.com on September 06, 2013 23:30:29

The problem isn't that there is a memory leak. The problem is that the dump and restore operate on the full dumped sql, which is stored in memory and can be very large. Is there some API I can use that allows the intermediate sql to not be stored in memory.

rogerbinns · 2013-12-29T03:31:04Z

From rogerbinns on September 06, 2013 23:36:55

There is no intermediate SQL. The dump is only in memory if you set stdin/out for the shell to be a StringIO. If you set it to a file then the dump ends up in the file.

During dumping, as each statement is generated it is written (self.write on self.stdout is called). During read, a line at a time is read using Python iteration on files.

The shell itself never does some sort of internal buffering or keeping around intermediate bits of SQL.

I usually use the tempfile module when I need stuff on disk rather than in memory.

rogerbinns · 2013-12-29T03:31:04Z

From adsense@calibre-ebook.com on September 06, 2013 23:51:15

Comment #5 on issue 143 by rogerbinns: MemoryError while restoring
http://code.google.com/p/apsw/issues/detail?id=143 >
The dump is 75MB of UTF-8 encoded text.

Yeah python2 does not store strings in UTF-8, it stores them in either
UCS-2 or UCS-4.

Surely pysqlite will also require the same amount of memory to store the
whole thing?

the code is pysqlite looks (basically) like this

for line in conn.iterdump():
newconn.execute(line)

in apsw it looks like this:

shell.process_command('.dump')
sql = buf.getvalue()
newconn.execute(sql)

What I'm asking for, is some API that lets me replace newconn.execute()
with something that can read a statement at a time from the dump. From
your other reply, I'm guessing that is
shell.process_command('.restore')?

rogerbinns · 2013-12-29T03:31:05Z

From adsense@calibre-ebook.com on September 06, 2013 23:52:57

Or from reading the apsw shell source code it looks like it needs to be

shell.process_command('.dump') into a disk file

and then

newshell.process_command('.restore') from the disk file.

rogerbinns · 2013-12-29T03:31:05Z

From adsense@calibre-ebook.com on September 06, 2013 23:53:33

Sorry make that

newshell.process_command('.read')

rogerbinns · 2013-12-29T03:31:06Z

From rogerbinns on September 06, 2013 23:59:05

You never said earlier that you were executing the dump at the same time as generating it! That will just make a backup. There is a perfectly good API exposed for it: http://apidoc.apsw.googlecode.com/hg/backup.html It isn't really possible to change things to provide the dump as a generator. A limitation of Python generators is they are only one stack frame deep.

rogerbinns · 2013-12-29T03:31:06Z

From adsense@calibre-ebook.com on September 07, 2013 00:01:36

It isn't at the same time, it is dump first on database 1, then after the dump is completed, execute the dump on database 2. Finally replace database 1 with database2 if the previous operations were successful.

And this is not equivalent to a backup, since it can recover from various types of minor corruption in the database.

rogerbinns · 2013-12-29T03:31:06Z

From rogerbinns on September 07, 2013 00:15:11

The only corruption you can recover from with APSW would be in the indices since they are read by the backup API but not by dumping.

The general advice is to fix the root cause of the corruption and make backups. https://www.sqlite.org/howtocorrupt.html You may also be able to do things with timing when certain commands are issued. For example use WAL mode with auto-checkpointing off, and do an integrity check at a useful point (eg start of session). If all is clear do the checkpoint, and if not delete the WAL since it most likely has the corruption.

Summary: Incremental dump (statement by statement) (was: MemoryError while restoring)
Status: WontFix
Labels: -Type-Defect -Priority-Critical Type-Enhancement Priority-Medium

ghost assigned rogerbinns Dec 29, 2013

rogerbinns closed this as completed Dec 29, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incremental dump (statement by statement) #143

Incremental dump (statement by statement) #143

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

Incremental dump (statement by statement) #143

Incremental dump (statement by statement) #143

Comments

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013

rogerbinns commented Dec 29, 2013