Overview of Hypertable Architecture
Introduction

This document gives an architectural overview of Hypertable. Hypertable is designed to run on top of a third-party distributed filesystem, such as Hadoop DFS. However, it can also run on top of an ordinary local filesystem. We recommend starting with the local filesystem to get up and running and to experiment with the system. The best way to get hands-on experience with Hypertable is to download the source code, build it, and get the regression tests to pass. Once the regression tests all pass, you can start the servers running on top of the local filesystem with the following command (assuming the installation directory is ~/hypertable/0.9.0.8):

    $ ~/hypertable/0.9.0.8/bin/start-all-servers.sh local
    Successfully started DFSBroker (local)
    Successfully started Hyperspace
    Successfully started Hypertable.Master
    Successfully started Hypertable.RangeServer
    $

You can create tables, load data, and issue queries with the Hypertable command interpreter "hypertable":

    $ ~/hypertable/0.9.0.8/bin/hypertable
    Welcome to the hypertable command interpreter.
    For information about Hypertable, visit http://www.hypertable.org/
    Type 'help' for a list of commands, or 'help shell' for a
    list of shell meta commands.
    hypertable>

To get a list of all the commands available, type 'help':

    hypertable> help
    CREATE TABLE ....... Creates a table
    DELETE ............. Deletes all or part of a row from a table
    DESCRIBE TABLE ..... Displays a table's schema
    DROP TABLE ......... Removes a table
    INSERT ............. Inserts data into a table
    LOAD DATA INFILE ... Loads data from a tab delimited input file into a table
    SELECT ............. Selects (and display) cells from a table
    SHOW CREATE TABLE .. Displays CREATE TABLE command used to create table
    SHOW TABLES ........ Displays the list of tables
    SHUTDOWN ........... Shuts servers down gracefully
    Statements must be terminated with ';' to execute.
    For more information on a specific statement, type 'help <statement>',
    where <statement> is one from the preceeding list.
    hypertable>

Data Model

The Hypertable data model consists of a multi-dimensional table of information that can be queried using a single primary key. The first dimension of the table is the row key. The row key is the primary key and defines the order in which the table data is physically stored. The second dimension is the column family. This dimension is somewhat analogous to a traditional database column. The third dimension is the column qualifier. Within each column family, there can be a theoretically infinite number of qualified instances. For example, if we were building a URL tagging service, we might define the column families content, url, and tag. Within the "tag" column family there could be an infinite number of qualified instances, such as tag:science, tag:theater, tag:good, etc. The fourth and final dimension is the time dimension. This dimension consists of a timestamp that is usually auto-assigned by the system and represents the insertion time of the cell in nanoseconds since the epoch. Conceptually, a table in Hypertable can be thought of as a three-dimensional Excel spreadsheet with timestamped versions of each cell.

The following diagram depicts a crawl database table called crawldb. The row key is the URL of a page to be crawled and the column families include title, content, and anchor. The "anchor" column family illustrates the use of column qualifiers.
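To make the four dimensions concrete, the data model can be approximated in Python as a nested mapping from row key to column family to column qualifier to timestamped versions. This is only a conceptual sketch: the class name TableSketch, its methods, and the sample URLs are hypothetical and are not part of the Hypertable API.

```python
import time


class TableSketch:
    """Conceptual model only: row -> family -> qualifier -> {timestamp_ns: value}."""

    def __init__(self):
        self.cells = {}

    def insert(self, row, family, qualifier, value, timestamp_ns=None):
        # When no timestamp is supplied, assign the insertion time in
        # nanoseconds since the epoch, as the text describes.
        if timestamp_ns is None:
            timestamp_ns = time.time_ns()
        versions = (
            self.cells.setdefault(row, {})
            .setdefault(family, {})
            .setdefault(qualifier, {})
        )
        versions[timestamp_ns] = value

    def latest(self, row, family, qualifier=""):
        # Return the value with the largest timestamp, or None if the cell is absent.
        versions = self.cells.get(row, {}).get(family, {}).get(qualifier, {})
        return versions[max(versions)] if versions else None


# Hypothetical crawldb-style data: the "anchor" family uses qualifiers.
crawldb = TableSketch()
crawldb.insert("www.example.com", "title", "", "Example Domain", timestamp_ns=1)
crawldb.insert("www.example.com", "anchor", "www.a.com", "old link text", timestamp_ns=1)
crawldb.insert("www.example.com", "anchor", "www.a.com", "new link text", timestamp_ns=2)
```

A nested dictionary does not preserve the sorted, on-disk ordering of the real system; it only shows how a cell is addressed by row key, column family, column qualifier, and timestamp.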
Under the hood, this multi-dimensional table of information is represented as a flat sorted list of key/value pairs. The key is essentially the concatenation of the four dimension keys (row, column family, column qualifier, and timestamp). The following diagram depicts the flattened key. One thing to note is that the timestamp is stored as a big-endian ones' complement value, so that the most recent cell sorts ahead of older versions of a cell.
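The effect of the ones' complement timestamp can be seen in a small Python sketch. The byte layout below (NUL-separated components followed by an 8-byte timestamp) is an assumption chosen for illustration and is not the actual Hypertable key format; the function name flatten_key is likewise hypothetical. The sketch also assumes that key components never contain a NUL byte.

```python
import struct

MAX_U64 = (1 << 64) - 1


def flatten_key(row, family, qualifier, timestamp_ns):
    """Concatenate the four dimensions into one byte string that sorts bytewise."""
    # Ones' complement of the 64-bit timestamp: larger (newer) timestamps
    # become smaller values, so they sort first.
    inverted = timestamp_ns ^ MAX_U64
    parts = [row.encode(), family.encode(), qualifier.encode()]
    # ">Q" packs an unsigned 64-bit integer in big-endian order, so bytewise
    # comparison agrees with numeric comparison of the inverted timestamp.
    return b"\x00".join(parts) + b"\x00" + struct.pack(">Q", inverted)


older = flatten_key("www.example.com", "anchor", "www.a.com", 100)
newer = flatten_key("www.example.com", "anchor", "www.a.com", 200)
other_row = flatten_key("www.zzz.com", "anchor", "www.a.com", 300)

# Sorting the flat keys groups cells by row, family, and qualifier,
# with the newest version of each cell first.
ordered = sorted([older, other_row, newer])
```

Because the timestamp is big-endian, a plain bytewise sort is enough to order versions; no key needs to be decoded during comparison.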
So the above crawldb table would have a flattened representation that looks something like the following:
Physical Data Layout

All table data is stored in the underlying distributed filesystem. The one that we use primarily is the Hadoop DFS, but Hypertable can be run on top of virtually any filesystem. The system abstracts the interface to the distributed filesystem, so writing a connector for a new filesystem is straightforward. The key/value pair data is stored in files called CellStores. At the most abstract level, a CellStore contains a sorted list of key/value pairs. Physically, the key/value pairs are stored as a sequence of compressed blocks (approximately 65K each). At the end of the block sequence is an index, which is a list of "last key" and block offset pairs. For each block in the file, there is an index entry that contains the last key in the block along with the offset of the block. This index is loaded into memory when the CellStore file is loaded by the system. The following diagram illustrates the format of the CellStore.
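The value of the "last key" index is that a lookup only has to decompress one block. The Python sketch below models this idea under several assumptions: the class name CellStoreSketch is hypothetical, blocks are sized by pair count rather than by bytes, and pickle plus zlib stand in for whatever serialization and compression the real file format uses.

```python
import bisect
import pickle
import zlib


class CellStoreSketch:
    """Toy model: a run of compressed blocks plus an in-memory last-key index."""

    def __init__(self, sorted_pairs, pairs_per_block=2):
        self.data = b""      # concatenated compressed blocks
        self.last_keys = []  # last key of each block, in block order
        self.extents = []    # (offset, length) of each block inside self.data
        for i in range(0, len(sorted_pairs), pairs_per_block):
            block = sorted_pairs[i:i + pairs_per_block]
            compressed = zlib.compress(pickle.dumps(block))
            self.last_keys.append(block[-1][0])
            self.extents.append((len(self.data), len(compressed)))
            self.data += compressed

    def get(self, key):
        # The first block whose last key is >= key is the only block
        # that can contain the key.
        i = bisect.bisect_left(self.last_keys, key)
        if i == len(self.last_keys):
            return None  # key is greater than every key in the store
        offset, length = self.extents[i]
        block = pickle.loads(zlib.decompress(self.data[offset:offset + length]))
        for k, v in block:
            if k == key:
                return v
        return None


store = CellStoreSketch(
    [(b"a", b"1"), (b"b", b"2"), (b"c", b"3"), (b"d", b"4"), (b"e", b"5")]
)
value = store.get(b"c")  # binary search picks the second block, then scans it
```

Keeping only the last key of each block keeps the index small relative to the data while still narrowing any lookup to a single block read.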
Access Groups - Traditional databases are considered to be either row oriented or column oriented, depending on how data is physically stored. With a row oriented database, all the data for a given row is stored contiguously on disk. With a column oriented database, all data for a given column is stored contiguously on disk. Access groups in Hypertable provide a way to group columns together physically. All of the columns in an access group will have their data stored physically together in the same CellStore. This is essentially a hybrid approach. A row oriented datastore can be simulated by putting all of the columns into a single access group. A column oriented datastore can be simulated by putting each column into its own access group.

System Components

The following diagram illustrates all of the processes in the system and how they relate to one another.
Hyperspace - This is our system's equivalent of Chubby. Hyperspace (or Chubby) is a service that provides a filesystem for storing small amounts of metadata. It also acts as a lock manager, in that either exclusive or shared locks can be acquired on any of the files or directories. Currently it is implemented as a single server, but it will be made distributed and highly available at some point in the near future. Google refers to Chubby as "the root of all distributed data structures", which is a good way to think of this system. (Hyperspace C++ API)

Range Server - Tables are broken into a set of contiguous row ranges, each of which is managed by a range server. Initially each table consists of a single range that spans the entire row key space. As the table fills with data, the range will eventually exceed a size threshold (default is 200MB) and will split into two ranges using the middle row key as a split point. One of the ranges will stay on the same range server that held the original range and the other will get reassigned to another range server by the Master. This splitting process continues for all of the ranges as they continue to grow. (Range Server C++ API)

Each range server handles all of the reading and writing of table data for the ranges that it is responsible for. Range servers cache updates in memory (after writing them to a Commit Log) in what is called a CellCache. Periodically the CellCache is flushed to disk (that is, to the DFS) in a specially formatted file called a CellStore. To scan over the data in an access group, the range server must do a heap merge of the CellCache and all of the CellStores for the access group. The following diagram illustrates this process.
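As a complement to the diagram, the heap merge can be sketched in Python with the standard library's heapq.merge, which lazily merges several already-sorted inputs. The function name scan_access_group and the sample keys are hypothetical; the sketch assumes the CellCache and each CellStore can be iterated as (key, value) pairs in key order.

```python
import heapq


def scan_access_group(cell_cache, cell_stores):
    """Yield (key, value) pairs in key order from one CellCache and many CellStores."""
    # Each input is already sorted by key, so a heap holding one candidate
    # per input is enough to produce a globally sorted stream.
    return heapq.merge(cell_cache, *cell_stores, key=lambda pair: pair[0])


# Hypothetical contents: recent updates in memory, older data flushed to two files.
cell_cache = [("row2", "b")]
cell_stores = [
    [("row1", "a"), ("row4", "d")],
    [("row3", "c")],
]

merged = list(scan_access_group(cell_cache, cell_stores))
```

The merge keeps memory use proportional to the number of inputs rather than to the amount of data, which is why a scan can run over many CellStores without loading them fully.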
Master - The Master handles all meta operations such as creating and deleting tables. Client data does not move through the Master, so the Master can be down for short periods of time without clients being aware. The Master is also responsible for detecting range server failures and re-assigning ranges if necessary, and for range server load balancing. Currently there is only a single Master process, but the system is designed in such a way as to allow hot standby masters. (Master C++ API)

DFS Broker - Hypertable is designed to run on top of any filesystem. To achieve this, the system has abstracted the interface to the filesystem through something called the DFS broker. The DFS broker is a process that translates standardized filesystem protocol messages into the system calls that are unique to the specific filesystem. DFS brokers have been developed for HDFS (Hadoop), KFS, and local (for running on top of a local filesystem). (DFS Broker C++ API)

C++ Client API

Application programs can access Hypertable through a client API that is provided as part of libHypertable. The client API consists of the following four main classes.
The following two source files provide a good example of how you use the Hypertable Client API. Together they illustrate how to load data into a table and query it.
Source Code