READ-ONLY: This project has been archived. For more information see this post.
Issue 27: Transaction usage in filesys importer
Status:  Fixed
Owner:  ejrh00@gmail.com
Closed:  Dec 2011


 
Project Member Reported by ejrh00@gmail.com, Dec 4, 2011
When run concurrently (for example, importing from both smaug and ancalagon), the importer may sometimes run into transaction serialisation errors.  In this case a TransactionRollbackError is thrown, and ideally the program should catch it and retry the transaction.

Current behaviour is to abort when receiving this error.  (In fact the error is wrapped in a generic exception, which will need to be addressed if we are to catch and handle that specific error.)
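
A rough sketch of what catching the wrapped error and retrying could look like.  This assumes Python 3 exception chaining; TransactionRollbackError here is a local stand-in for the driver's exception, and ImporterError, is_serialisation_failure and run_with_retry are hypothetical names, not the importer's actual code.

```python
class TransactionRollbackError(Exception):
    """Stand-in for the database driver's serialisation failure error."""


class ImporterError(Exception):
    """Hypothetical generic wrapper, like the one the importer raises today."""


def is_serialisation_failure(exc):
    """Walk the exception chain looking for a TransactionRollbackError."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        if isinstance(exc, TransactionRollbackError):
            return True
        seen.add(id(exc))
        exc = exc.__cause__ or exc.__context__
    return False


def run_with_retry(work, max_attempts=5):
    """Call work() until it succeeds, retrying only on serialisation failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except Exception as exc:
            # Anything else, or the final failed attempt, is re-raised unchanged.
            if not is_serialisation_failure(exc) or attempt == max_attempts:
                raise
```

This only helps if work() covers exactly one transaction's worth of changes, which is the difficulty described next.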

Handling the error properly is a bit tricky because the importer misuses transactions: it batches several unrelated updates into a transaction, and commits the current transaction after a configurable interval has passed.

So the transaction might have begun when it was importing /usr/local/lib, and might get a serialisation error in /var/log.  Somehow, all the work done since the transaction began needs to be redone.

There are two approaches we could take:

  * Use transactions properly: put each set of related changes into a transaction.  This is slower because the per-transaction overhead is significant compared to the cost of the database work itself.  Many transactions will in fact contain no updates, because those branches are already imported; creating and committing each of them individually makes them far more expensive than they are now.

  * Keep track of where the current transaction was started, and rework from that point if we need to redo it.  So in the above example, we'd record the transaction as starting in /usr/local/lib; then when it fails in /var/log, we pop off the call stack until we reach their common prefix /.  Then in that call, we call down /usr/local/lib and start again.  This sounds pretty tricky though and continues to misuse transactions.
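
For the second approach, the point to unwind to is just the common prefix of the two paths.  A small illustration, assuming POSIX-style paths; restart_point is a hypothetical helper name:

```python
import posixpath


def restart_point(transaction_start, failure_path):
    """Return the deepest directory containing both paths."""
    return posixpath.commonpath([transaction_start, failure_path])
```

With the example above, restart_point('/usr/local/lib', '/var/log') gives '/', so the whole import from the root down would have to be reworked.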

An ideal transaction is the import of a single item: either a file, or a directory (assuming all its children are imported).

A compromise between proper transaction use and efficient transaction batching is to make each transaction correspond to a subbranch.  Sizing these could be done dynamically somehow.

Something like:

import_dir(own_transaction=False):
    if transaction interval is exceeded:
        own_transaction = True
    if own_transaction:
        start a new transaction

    for each file:
        import_file

    # If any dir import made its own transaction, put its remaining siblings in one too.
    dir_own_transaction = False
    for each dir:
        status, dir_own_transaction = import_dir(dir_own_transaction)

    return status, own_transaction
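
A runnable version of the same idea, as a sketch only.  It assumes the tree is a nested dict (None for a file, a dict for a directory), and the Transactions class, its clock parameter and the import_file callback are all hypothetical stand-ins; nothing here touches a real database, and the status value is left out.

```python
import time


class Transactions:
    """Hypothetical transaction manager; counts commits instead of using a DB."""

    def __init__(self, interval, clock=time.monotonic):
        self.interval = interval
        self.clock = clock
        self.started_at = clock()
        self.commits = 0

    def interval_exceeded(self):
        return self.clock() - self.started_at >= self.interval

    def start_new(self):
        # Commit whatever is open, then begin a fresh transaction.
        self.commits += 1
        self.started_at = self.clock()


def import_dir(tree, tx, import_file, own_transaction=False):
    """Import one directory; tree maps names to None (file) or a dict (subdir).

    Returns whether this directory was given its own transaction.
    """
    if tx.interval_exceeded():
        own_transaction = True
    if own_transaction:
        tx.start_new()

    for name, child in tree.items():
        if child is None:
            import_file(name)

    # If any dir import made its own transaction, put its remaining siblings in one too.
    dir_own_transaction = False
    for name, child in tree.items():
        if child is not None:
            dir_own_transaction = import_dir(child, tx, import_file, dir_own_transaction)

    return own_transaction
```

Once one subdirectory has tripped the interval, each of its later siblings starts its own transaction too, so a retry of any of them only has to redo that sibling.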

Dec 28, 2011
Project Member #1 ejrh00@gmail.com
Fixed by r174.

When a directory is imported, a note is made of the current transaction number.

If a serialisation error occurs while processing that directory, the current transaction number is compared against the stored number.  If they are different, the failed transaction began inside this directory, so only the results of this directory's children have been discarded, and the import is retried here.  If they are the same, the exception is bubbled up to the next level.  At the top level, the import of the whole root directory is retried regardless of transaction number.
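
In outline, the retry logic looks something like the following.  This is a simplified sketch, not the code from r174: FakeDatabase, its batch size (standing in for the commit interval), the fail_once set and the retries list are all made up for illustration, and the transaction number is assumed to go up only on commit.

```python
class TransactionRollbackError(Exception):
    """Stand-in for the database driver's serialisation failure error."""


class FakeDatabase:
    """Hypothetical in-memory database that commits every `batch` writes."""

    def __init__(self, batch, fail_once=()):
        self.batch = batch
        self.fail_once = set(fail_once)  # paths that fail the first time
        self.transaction_number = 0
        self.pending = []
        self.committed = []

    def write(self, path):
        if path in self.fail_once:
            self.fail_once.discard(path)
            raise TransactionRollbackError(path)
        if path in self.committed or path in self.pending:
            return  # already imported, nothing to update
        self.pending.append(path)
        if len(self.pending) >= self.batch:
            self.commit()

    def commit(self):
        self.committed += self.pending
        self.pending = []
        self.transaction_number += 1

    def rollback(self):
        self.pending = []


def import_dir(db, path, tree, retries, is_root=True):
    """Import a directory, retrying here if the failed transaction began here."""
    while True:
        noted = db.transaction_number  # note made when the import starts
        try:
            for name, child in tree.items():
                child_path = path.rstrip("/") + "/" + name
                if child is None:
                    db.write(child_path)
                else:
                    import_dir(db, child_path, child, retries, is_root=False)
            if is_root:
                db.commit()  # flush whatever is still pending
            return
        except TransactionRollbackError:
            if not is_root and db.transaction_number == noted:
                raise  # transaction began before this directory: bubble up
            db.rollback()
            retries.append(path)
```

With a batch of 2, a failure in /var/log straight after /usr/a, /usr/b, /usr/c is retried from /, because no commit happened inside /var; if /var itself had committed before the failure, only /var would be retried.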

Status: Fixed
