|
Components
Overview
Each topic has its own Featured wiki page.

Cloud Interoperability
As a convenient way to discuss features, we categorize the following projects as targeting either cloud storage services or cloud computing services.

Storage

ThriftStore
For cloud storage interoperability, we created ThriftStore to provide a common interface to multiple cloud storage frameworks. ThriftStore currently supports Sector and Hadoop, but support for other file systems such as KFS or Amazon's S3 could easily be added. Although there is some overhead associated with going through a Thrift service, ThriftStore makes it possible to implement a single client that can access any supported cloud storage system. Additionally, the client can be written in any language supported by Thrift. More details on ThriftStore are available at the ThriftStore project page.

PySector
To provide a less general alternative to ThriftStore, we created PySector. This is a Python extension to the Sector C++ client that lets a developer run Sector file system commands entirely from Python. It can only be used with Sector, but has very little overhead. PySector has been contributed to Sector and is available on that project's SourceForge site as a download or from source control. More information can be found at the PySector project page.

SectorJNI
The Sector C++ client has also been opened up to Java applications by the Sector JNI project. Although there is some overhead associated with the JNI bridge between Java and the native C++ code, it allows pure Java applications to use Sector. SectorJNI has been contributed to Sector and is available on that project's SourceForge site as a download or from source control. More can be found at SectorJNI.

Compute

Sector File System for Hadoop
The dominant open source compute cloud right now is Hadoop. We created an implementation of Hadoop's File System Interface which allows Sector to be used as the backing store for Hadoop MapReduce applications.
This is a real-world use of the Sector JNI that we created for the Storage Interoperability Project. We used the JNI / NIO layer that provides access to Sector and coded a client that takes Hadoop File System commands and passes them on to Sector. This gives developers the option of running Hadoop MapReduce jobs over data stored in Sector. The client is currently in the process of being added to Hadoop as a contributed project. More details can be found at the SectorFileSystem page.

PySphere
Sphere is middleware designed to process data managed by Sector. It implements an application framework for parallel processing of data stored in the Sector file system. Sphere supports two models for implementing applications: user defined functions (UDF) or MapReduce. The UDF model allows any user defined function to be applied to a Sector dataset, while Sphere MapReduce follows the traditional MapReduce model. Sphere applications, regardless of the model, have to be written in C++. PySphere is a proof of concept showing how Python MapReduce functions can be executed within the Sphere framework against data stored in Sector. Although PySphere can support many MapReduce applications, the embedded Python approach has limitations. A better way to support language interoperability with Sphere is a streaming-type interface, similar to the Hadoop streaming interface. A streaming interface reads and writes Sector data from and to standard I/O, which allows processing applications to be implemented in any language that supports standard I/O. The streaming interface is currently under development. PySphere has been contributed to Sector and is available on that project's SourceForge site as a download or from source control. More can be found at PySphere.

Note regarding Sector
Sector is an open source cloud written in C++ for storing, sharing and processing large data sets.
Sector is broadly similar to the Google File System and the Hadoop Distributed File System, except that it is designed to utilize wide area high performance networks. Sector also implements security and is currently being used to bring up a HIPAA-compliant private cloud. The Sector/Sphere project is developed at the National Center for Data Mining (NCDM) and is an associated project of the Open Cloud Consortium (OCC). Sector is supported in part by the National Science Foundation. A technical report covering the design of Sector is available from arXiv. |
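To illustrate the streaming-type interface described under PySphere, here is a minimal sketch of what a processing application written against standard I/O might look like. It is not the Sector/Sphere streaming interface, which is still under development. It assumes Hadoop-streaming-style conventions: one record per line, tab-separated key and value, and reducer input already sorted by key. The names `map_lines`, `reduce_lines` and `run` are hypothetical and chosen for illustration only.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Emit one 'word<TAB>1' record per word (assumed record format)."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Sum the counts for each key; input is assumed to be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run(stage, stdin=sys.stdin, stdout=sys.stdout):
    """Read records from stdin, apply the chosen stage, write to stdout."""
    step = {"map": map_lines, "reduce": reduce_lines}[stage]
    for record in step(stdin):
        stdout.write(record + "\n")


# Only act as a command-line filter when invoked with "map" or "reduce".
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    run(sys.argv[1])
```

Saved as a script (the file name is up to you), the two stages can be chained locally with a shell pipeline that runs the map stage, then `sort`, then the reduce stage. Because the script only touches standard input and output, the same logic could be written in any language, which is the point of the streaming approach.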