gimd


Git Message Database

gimd (pronounced gim-dee) provides a small distributed database layered on top of the powerful Git version control system.

Features

  • Multi-version concurrency control
  • Point-in-time recovery
  • Concurrent transactions
  • Automatic replication
  • Some automatic merge resolution
  • Distributed peers
  • Disconnected operation

ACID Compliance

gimd roughly provides the basic ACID properties:

  • Atomicity: All changes within a transaction occur within a single Git commit. Because Git can update a branch atomically, either all of the changes occur at once or none of them do.

  • Consistency: Constraints are currently an application-level property; that is, the application must validate any data constraints during the transaction. By performing an atomic compare-and-swap operation at commit time, gimd can abort a transaction if another transaction changed the branch in the meantime, since such changes might violate application data constraints.

  • Isolation: Because commit names are not "published" for other threads to see until the transaction completes, and the underlying Git object data is immutable, one thread can never see another thread's partial work.

  • Durability: Because Git object data is immutable, an object cannot be removed once it has been written. Performing an fsync() on object writes is a performance-versus-durability tradeoff that can be configured per repository.
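How these properties fit together can be sketched as a small in-memory Scala model. This is a sketch under stated assumptions, not gimd's code: `Commit`, `Branch` and `transact` are hypothetical names, a commit holds a full key/value snapshot instead of a Git tree, and `AtomicReference.compareAndSet` stands in for the atomic compare-and-swap update of a Git branch.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical model, not gimd's API. A commit is an immutable snapshot of
// the whole database plus a link to its parent, like a Git commit object.
final case class Commit(parent: Option[Commit], data: Map[String, String])

// The branch is the only mutable thing: a reference to the newest commit.
// compareAndSet stands in for Git's atomic compare-and-swap ref update.
final class Branch(initial: Commit) {
  private val head = new AtomicReference[Commit](initial)

  def current: Commit = head.get()

  // Runs `body` on an immutable snapshot, so other threads never see partial
  // work (isolation). The new commit is published by a single compareAndSet
  // (atomicity). Returns false, aborting the transaction, if `valid` rejects
  // the result or another writer moved the branch after our snapshot was
  // taken (consistency).
  def transact(body: Map[String, String] => Map[String, String],
               valid: Map[String, String] => Boolean): Boolean = {
    val base = head.get()
    val updated = body(base.data)
    valid(updated) && head.compareAndSet(base, Commit(Some(base), updated))
  }
}
```

Calling `transact` from inside another transaction's `body` is a simple way to simulate a race: the inner commit is published first, so the outer `compareAndSet` fails and the outer call returns `false` without touching the branch. Following `parent` links from `current` gives read access to earlier states, which is the idea behind point-in-time recovery.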

Consistency in a distributed system is difficult. When two peers make edits independently, they branch the database. Each branch is still consistent with itself in isolation. Merging the two branches together is a new transaction, and it may require application assistance to validate that the merge result is still consistent. If the merge result would yield an inconsistency the application cannot accept, human assistance is required to finish the merge. This maps fairly naturally onto Git's merge tracking, in which a merge commit has two or more parents: the database branches that were merged to produce the new state.
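A three-way merge of two diverged branches could look roughly like the following Scala sketch. It assumes, for illustration only, that a database snapshot is a flat `Map[String, String]`; `Merge`, `threeWay` and `autoMerge` are hypothetical names, and the per-key rule is the classic three-way merge rule rather than anything taken from gimd.

```scala
// Hypothetical sketch: snapshots are flat key/value maps.
object Merge {
  type Snapshot = Map[String, String]

  final case class Result(merged: Snapshot, conflicts: Set[String])

  // Classic three-way rule per key, comparing each side with the common
  // ancestor `base`. None means "key absent", so deletions merge too.
  def threeWay(base: Snapshot, ours: Snapshot, theirs: Snapshot): Result = {
    val keys = base.keySet ++ ours.keySet ++ theirs.keySet
    val decided: Set[(String, Option[Option[String]])] = keys.map { k =>
      val (b, o, t) = (base.get(k), ours.get(k), theirs.get(k))
      val pick =
        if (o == t) Some(o)          // both sides agree, or neither changed
        else if (o == b) Some(t)     // only `theirs` changed this key
        else if (t == b) Some(o)     // only `ours` changed this key
        else None                    // both changed it differently: conflict
      k -> pick
    }
    Result(
      merged = decided.collect { case (k, Some(Some(v))) => k -> v }.toMap,
      conflicts = decided.collect { case (k, None) => k })
  }

  // Left: a human has to finish the merge. Right: a state the application
  // accepts, ready to be committed with both branch heads as parents.
  def autoMerge(base: Snapshot, ours: Snapshot, theirs: Snapshot,
                valid: Snapshot => Boolean): Either[String, Snapshot] = {
    val r = threeWay(base, ours, theirs)
    if (r.conflicts.nonEmpty)
      Left("conflicting keys: " + r.conflicts.toSeq.sorted.mkString(", "))
    else if (!valid(r.merged)) Left("merge result violates application constraints")
    else Right(r.merged)
  }
}
```

Keys changed on only one side merge automatically; the application's `valid` check then decides whether the combined state is acceptable, and anything left over is handed to a person.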

Implementation

gimd is implemented in Scala for the JVM, with an eye towards being usable from both Scala and Java.

At the lower level, gimd will use JGit to read and write a local Git repository. This avoids the expense of forking external git processes for basic data manipulation or for network transport of changes. Apache Lucene is also being considered for inverted indexes.
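JGit takes care of object storage itself, but a hand-written sketch shows what an immutable object write and the optional fsync() from the Durability point amount to on disk. It writes a Git loose blob object: the object name is the SHA-1 of a `blob <size>` header, a NUL byte and the content, and the zlib-compressed file lives at `objects/<first two hex digits>/<remaining 38 hex digits>`. `LooseObject`, `writeBlob` and its `fsync` flag are hypothetical names; this is not JGit's API.

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.channels.FileChannel
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path, StandardCopyOption, StandardOpenOption}
import java.security.MessageDigest
import java.util.zip.DeflaterOutputStream

// Illustrative only: gimd would go through JGit rather than this code.
object LooseObject {
  // Git object header: "blob <size in bytes>" followed by a NUL byte.
  private def header(size: Int): Array[Byte] =
    s"blob $size".getBytes(UTF_8) :+ 0.toByte

  // The object name is the SHA-1 of header + content, as lowercase hex.
  def blobId(content: Array[Byte]): String = {
    val sha = MessageDigest.getInstance("SHA-1")
    sha.update(header(content.length))
    sha.update(content)
    sha.digest().map(b => f"${b & 0xff}%02x").mkString
  }

  def writeBlob(objectsDir: Path, content: Array[Byte], fsync: Boolean): String = {
    val id = blobId(content)
    val target = objectsDir.resolve(id.take(2)).resolve(id.drop(2))
    if (!Files.exists(target)) { // immutable: an existing object is never rewritten
      val buf = new ByteArrayOutputStream()
      val zlib = new DeflaterOutputStream(buf) // zlib format, which Git expects
      zlib.write(header(content.length))
      zlib.write(content)
      zlib.close()
      Files.createDirectories(target.getParent)
      val tmp = Files.createTempFile(target.getParent, "tmp_obj_", "")
      val ch = FileChannel.open(tmp, StandardOpenOption.WRITE)
      try {
        val bytes = ByteBuffer.wrap(buf.toByteArray)
        while (bytes.hasRemaining) ch.write(bytes)
        if (fsync) ch.force(true) // durability at the cost of a slower write
      } finally ch.close()
      // Rename into place so readers never observe a half-written object.
      Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE)
    }
    id
  }
}
```

Passing `fsync = false` skips `force(true)`, which makes writes faster but means a crash can lose recently written objects; that is the per-repository tradeoff described under Durability.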

Storage Model

The gimd data storage model is loosely based on the text format of Google Protocol Buffers (protobuf). It is expected that other tools, including simple sed scripts, could be used to modify a gimd store by checking the files out of Git, editing them, and checking them back in.
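A minimal Scala sketch of such a line-oriented format follows. It assumes a flat record with one `field: value` pair per line, double-quoted strings and bare integers, which is only a small subset of the protobuf text format (no nested messages or repeated fields); `TextRecord` and its methods are hypothetical and are not gimd's actual file format.

```scala
// Hypothetical record format loosely modelled on protobuf text format:
// one `field: value` pair per line, strings in double quotes, integers bare.
object TextRecord {
  sealed trait Value
  final case class Str(s: String) extends Value
  final case class Num(n: Long) extends Value

  // Escape backslashes first, then quotes, so the result can be unescaped.
  private def quote(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""

  // Drop the surrounding quotes and undo the backslash escapes.
  private def unquote(q: String): String = {
    val body = q.substring(1, q.length - 1)
    val sb = new StringBuilder
    var i = 0
    while (i < body.length) {
      if (body.charAt(i) == '\\' && i + 1 < body.length) i += 1
      sb += body.charAt(i)
      i += 1
    }
    sb.toString
  }

  // Fields are sorted so the same record always renders to the same text,
  // which keeps Git diffs small and stable.
  def render(fields: Map[String, Value]): String =
    fields.toSeq.sortBy(_._1).map {
      case (k, Str(s)) => s"$k: ${quote(s)}"
      case (k, Num(n)) => s"$k: $n"
    }.mkString("", "\n", "\n")

  private val Line = """\s*(\w+)\s*:\s*(.+?)\s*""".r

  def parse(text: String): Map[String, Value] =
    text.linesIterator.filter(_.trim.nonEmpty).map {
      case Line(k, v) if v.length >= 2 && v.startsWith("\"") && v.endsWith("\"") =>
        k -> Str(unquote(v))
      case Line(k, v) => k -> Num(v.toLong)
      case other => throw new IllegalArgumentException(s"cannot parse line: $other")
    }.toMap
}
```

Because each field sits on its own line in a stable order, an edit such as `sed -i 's/^stock: 10$/stock: 9/' item.txt` produces a file that `parse` still accepts, and the resulting Git diff touches a single line.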
