hop

Hadoop Online Prototype

The Hadoop Online Prototype (HOP) is a modified version of Hadoop MapReduce that allows data to be pipelined between tasks and between jobs. Pipelining can improve cluster utilization and increase parallelism, and it enables two new capabilities: online aggregation (approximate answers while a job is still running) and stream processing (MapReduce jobs that run continuously, processing new data as it arrives).

For more information on the HOP design, see our NSDI'10 paper: MapReduce Online.

As the name suggests, the current version of HOP is a prototype: it works for us, but it isn't recommended for production use. That said, we welcome any feedback on HOP!

Getting Started

HOP is currently based on Hadoop 0.19.2, and supports all the features and configuration parameters supported by that version of Hadoop.

To get started using HOP, you can obtain an alpha release from the "Downloads" page, or the current SVN code from the "Source" page. If you have never used Hadoop, the Hadoop Quick Start instructions apply equally to HOP.

To enable pipelining and/or snapshots, set the relevant configuration parameters on the JobConf in your MapReduce code, as described in HopConfiguration.

Developers interested in the code changes made by HOP should start with HopImplementation. The design changes made by HOP are described at a higher level in the NSDI paper.
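
As a rough illustration of the JobConf step mentioned above, the sketch below collects HOP-style settings as key/value pairs that could then be copied onto a job's JobConf. The parameter names (example.hop.pipelining, example.hop.snapshot.interval), the class HopConfSketch, and the idea that the snapshot setting is a fraction between 0 and 1 are all placeholders invented for this example, not actual HOP parameters; the real names and their meanings are listed in HopConfiguration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HopConfSketch {
    // Placeholder keys, NOT real HOP parameter names; see HopConfiguration.
    static final String PIPELINING_KEY = "example.hop.pipelining";
    static final String SNAPSHOT_INTERVAL_KEY = "example.hop.snapshot.interval";

    // Builds the settings to apply to a job. The (0, 1] range check reflects
    // this sketch's assumption that the interval is a fraction of the input.
    static Map<String, String> hopSettings(boolean pipelining, float snapshotInterval) {
        if (snapshotInterval <= 0f || snapshotInterval > 1f) {
            throw new IllegalArgumentException("snapshotInterval must be in (0, 1]");
        }
        Map<String, String> settings = new LinkedHashMap<String, String>();
        settings.put(PIPELINING_KEY, Boolean.toString(pipelining));
        settings.put(SNAPSHOT_INTERVAL_KEY, Float.toString(snapshotInterval));
        return settings;
    }

    public static void main(String[] args) {
        Map<String, String> settings = hopSettings(true, 0.25f);
        // With Hadoop on the classpath, each pair would be copied onto the job:
        //   JobConf conf = new JobConf(MyJob.class);   // MyJob is hypothetical
        //   conf.set(entry.getKey(), entry.getValue());
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }
    }
}
```

Keeping the settings in a plain map makes the example runnable without Hadoop installed; in a real job you would more likely call conf.set(...) or conf.setBoolean(...) directly with the names from HopConfiguration.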

Project Information

License: Apache License 2.0
svn-based source control
