MetadataConsistencyCheck
Design of a maintenance tool that checks and fixes the METADATA table.
Introduction
Under some unexpected situations, e.g. after recovery with incomplete meta logs (probably caused by a buggy underlying DFS), the METADATA table might become inconsistent with range server states. It would be nice to have a maintenance tool (we'll call it htck) that checks the consistency of METADATA and tries to fix it.

Usage
I decided to implement htck as a stand-alone tool rather than integrate it into a master maintenance thread, because a stand-alone tool is simpler and more controllable. Administrators may run htck multiple times to make sure everything goes well, or stop it halfway if the process takes too long. The command line usage of htck is:

htck [-d] [-f]

By default, htck only checks the consistency of the METADATA table and does not try to fix it. To actually apply fixes, the explicit command line switch '-f' must be used. The check phase has two modes: fast scan (default) and deep scan (-d). The difference is that, for every range, deep scan also queries the corresponding range server to see whether the range is really there, which is much more time-consuming. See the following sections for more details.

Design Details
Types of Common Inconsistencies
Preparation
The information currently stored in the METADATA table is inadequate for finding the actual cause of an inconsistency, e.g.:
Thus I propose that two new columns be added to the schema of the METADATA table: state and generation. Possible states include NEW, ASSIGNED and LOST (to be recovered). It would be nice to define more states, such as READONLY and UNLOADED (HBase supports unloading a table completely from memory). The ASSIGNED state doesn't need to be specified explicitly, as it is the default. The generation is a strictly increasing serial number. A range server must use a new generation number every time it restarts. For now I think it's OK to borrow the Hyperspace session id.

Fast Scan
When doing a fast scan, htck should:
In more detail: to deal with LOST portions of the METADATA table, step 2 should take place in stages, i.e. first check /hyperspace/root, then the ROOT table, then second-level METADATA.

Deep Scan
Deep scan is the same as fast scan except that, for each range in the METADATA table, htck also queries the corresponding range server to see whether the range is there. This can be done by adding a new RPC method that returns true or false:

bool RangeServerClient::load_range(const sockaddr_in &addr, const TableIdentifier &table, const RangeSpec &range);

Fix Phase
For inconsistency 1, for now, print a message and let the admin restart the server for fast recovery. When auto recovery is implemented, treat it the same as inconsistency 2.
For inconsistency 2, mark the range LOST, then notify the master to choose a range server and load the range with the proper commit logs replayed.
For inconsistency 3, mark the range NEW, then notify the master to choose a range server and load the range with the split log replayed.
Inconsistencies 4 and 5 are really rare cases, and it is not yet clear how to fix them.

The reason for letting the master give the load_range orders is that it should have more detailed information about live range servers and can do better load balancing in the future. It also seems safer to let the master monopolize the power of loading / unloading ranges. To find the commit / split logs that need to be replayed, the range server chosen to load the LOST / NEW range must know:
Potential Problems
A potential problem is that htck, the master and range servers may read / update the METADATA table concurrently. The solution is to put the Hypertable cluster into a read-only state while htck is running, which may be a future task. For now, if the admin is sure there are no updates to Hypertable during a time period, which is roughly the same as having a read-only state, it is also safe to run htck. This makes htck especially useful in cases where data is always loaded in batches during certain time periods (e.g. a crawler db that gets updated once every week).

Another problem is that range servers may still crash while htck is running, even if the cluster is read-only. This should be rare. If it does happen, htck may detect it by subscribing to changes in the "/hypertable/servers" hyperspace directory, and stop running with an error message printed. The admin may then deal with the situation and run htck again later.

Summary of Changes
METADATA table schema:
Range Server:
Master:
Future Work
As range state is added to the METADATA table, we may add more features to Hypertable in the future, for example making a table READONLY or unloading it from memory. These features are useful if we need to take snapshots of tables in Hypertable.

The client table scanner / mutator can also make use of the range state information while handling errors. For instance, if a scanner fails to look up a key in a specific range, it may then look at the state column of that range to see whether the range is NEW (most likely the result of a recent split), LOST or UNLOADED, and apply different policies, e.g. wait and retry on the NEW and LOST states, and fail directly if the range is UNLOADED.

Luke suggested an approach to do htck offline:

"I thought about the problem a little. It seems to me that the cleanest approach would be doing the htck while hypertable (except dfs brokers) is offline. You can basically do a merge scan of metadata cellstores and metadata commit logs to construct the complete metadata and compare with all the metalogs. You can even reclaim lost ranges (by looking at gaps in the metadata) that are neither in metadata or metalogs but still in cellstores this way, just like fsck does for lost blocks. You can fix problems by (re)writing to commit logs/metalogs, so when hypertable restarts, the recovery process will take care of all the issues. This requires some refactoring of the range server code. Much of the range server functionality is already in libHyperRanger.a/so anyway, which is used by csdump. I think this is a better approach (not to mention more challenging and exciting :) that does not suffer all kinds of chicken and egg problems from the live scans."

Clearly this approach is safer and cleaner. It can even fix catastrophic problems that prevent range servers from restarting. The only drawback is that all range servers must be shut down, which introduces a relatively long downtime.
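The "looking at gaps in the metadata" step in Luke's suggestion can be illustrated with a small sketch. It is not taken from Hypertable code: RangeEntry, Gap and find_gaps are hypothetical names, and it assumes each reconstructed METADATA entry covers the row interval (start_row, end_row], with rows compared as plain byte strings.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified METADATA entry: the range covers rows in
// (start_row, end_row]. An empty start_row means "from the start of the table".
struct RangeEntry {
    std::string start_row;
    std::string end_row;
};

// A stretch of row space (after_row, before_row] that no entry covers.
struct Gap {
    std::string after_row;
    std::string before_row;
};

// Returns the uncovered stretches before and between the given ranges.
// Rows beyond the last range are not examined, because this sketch makes
// no assumption about how the end of a table is marked.
std::vector<Gap> find_gaps(std::vector<RangeEntry> ranges) {
    // Sweep the ranges in order of their start rows.
    std::sort(ranges.begin(), ranges.end(),
              [](const RangeEntry &a, const RangeEntry &b) {
                  return a.start_row < b.start_row;
              });
    std::vector<Gap> gaps;
    std::string covered_up_to;  // empty: nothing covered yet
    for (const RangeEntry &r : ranges) {
        // A range that starts past the covered prefix leaves a hole.
        if (r.start_row > covered_up_to)
            gaps.push_back({covered_up_to, r.start_row});
        // Extend the covered prefix; overlapping ranges are tolerated.
        if (r.end_row > covered_up_to)
            covered_up_to = r.end_row;
    }
    return gaps;
}
```

For the entries ("", "g"], ("g", "k"] and ("m", "t"], find_gaps reports a single gap, ("k", "m"]. An offline htck could then search the cellstores for data belonging to that stretch of rows.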
Hi Donald,
The Hypertable instance in my workspace is corrupt, so I would like to try htck to check and repair it.
How can I get the source code of htck?
Thanks