|
RepartitioningProcess
The process of online repartitioning a schema
How the repartitioning process worksThis page describes how the sample database is set up, and how it repartitions as needed. If you haven't read DatabaseDesignCriteria already, it would be a good idea, as this page will make more sense having read it. Initial setupThe database is accessed via a proxy. The proxy forwards requests to the correct partition, or to all partitions if it's a query that retrieves a series of records.
The proxy decides which partition(s) to forward requests to based on an internal table, which splits the partition keyspace into ranges, and has one database per range. For reasons that will become clear below, each row has a "status" column, and ranges are specified separately for reads and writes. (The UUID ranges have been shortened for readability.)
Re-partitioningThe call comes through and we decide to repartition range 1. In this case we're going to divide the range equally. The procedure is as follows: 1) We create a FIFO queue for all SQL writes that fall within the range of the new partition. From this point onwards, the database copies all successful writes for that range into the queue. 2) We create the new partition by backing-up and restoring the current database containing the range for this new partition, and deleting data outside the range. 3) We then apply the contents of the FIFO queue to the new database.
4) A new range is added to the proxy lookup table, and the original range row is updated for writes only.
The new range has status disabled. The status is used by write calls at the proxy level, which enter a spinlock when no active partition is available to write to. Because the read range start and end are both null, no reads will be executed against this partition.
5) Any writes remaining in the FIFO queue that have accumulated between the end of step 3 and the commit at the end of step 4 are applied to the new database. 6) The proxy lookup table is updated again to its final state:
7) At this point the repartition is complete. The FIFO queue should be confirmed empty and disposed of. Notes on the processThe total time from step 4 to step 6 has to be kept as small as possible to avoid locking a large number of writes. The reason for having a read and a write queue is because if they're both updated at the same time, we have the choice of either blocking reads as well as writes, or allowing reads from a new partition that may not have the tail-end of the queue applied to it post-lock. The process can be simply explained as:
This process is quite flexible, and could also be used to up/down-grade an entire database in-place, or upgrade to a new schema without disrupting access. For example, with modification to step 2, 4 and 6, this same process could also be used for moving an entire partition from one server to another, for example to updgrade or downgrade hardware. Likewise, with modification of step 2, it should also be possible to consolidate partitions onto one host or one schema - in effect, departitioning. If the new partition is modified after being restored, but before being brought online, then the database has effectively been migrated at runtime. Backwards-compatible migrations could be performed on a per-partition basis. If the schema modification in the migration is not backwards-compatible, then the writes being saved to the queue would need to reference to the new database schema. With the process explained, the page ImplementationDetails explains how it works in practise, which is considerably more hairy than in theory. |
Sign in to add a comment