
skewreduce
SkewReduce is a framework for reducing the impact of stragglers in distributed data analysis tasks. The project provides two components: a partition optimizer and a runtime. The partition optimizer derives a good partitioning of the input data from a user-supplied cost model of the algorithm and a sample of the input data. The runtime schedules a series of MapReduce jobs in Hadoop according to the partition plan. For more detail, please refer to the paper published at the ACM Symposium on Cloud Computing 2010: Skew-resistant parallel processing of feature-extracting scientific user-defined functions (http://portal.acm.org/citation.cfm?id=1807140).
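To make the idea of cost-driven partitioning more concrete, here is a minimal Python sketch. It is not SkewReduce's actual API or algorithm: the function name `plan_partitions`, the one-dimensional data, the median split, and the greedy "split the costliest partition first" strategy are all assumptions chosen for illustration. It only shows how a user-supplied cost function evaluated on a data sample could guide where partition boundaries are placed.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

# Hypothetical result type: (lowest key, highest key, estimated cost).
Partition = Tuple[float, float, float]


def plan_partitions(
    sample: Sequence[float],
    cost: Callable[[Sequence[float]], float],
    max_parts: int,
) -> List[Partition]:
    """Greedily split the costliest partition at its median sample point.

    Illustrative only: `cost` stands in for a user-supplied cost model that
    estimates the running time of the analysis on a set of sample points.
    """
    points = sorted(sample)
    if not points:
        return []

    # heapq is a min-heap, so costs are negated to pop the costliest first.
    # The integer id breaks ties so that lists are never compared.
    heap = [(-cost(points), 0, points)]
    next_id = 1

    while len(heap) < max_parts:
        neg_cost, part_id, part = heapq.heappop(heap)
        if len(part) < 2:
            # The costliest partition cannot be split further; stop here.
            heapq.heappush(heap, (neg_cost, part_id, part))
            break
        mid = len(part) // 2
        for half in (part[:mid], part[mid:]):
            heapq.heappush(heap, (-cost(half), next_id, half))
            next_id += 1

    # Report each partition as a key range plus its estimated cost.
    return sorted((p[0], p[-1], -c) for c, _, p in heap)
```

For example, calling `plan_partitions(sample, lambda pts: len(pts) ** 2, 3)` with a quadratic cost model yields narrow key ranges where the sample is dense and wide ranges where it is sparse, which is the kind of balance a skew-aware plan aims for.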
Project Information
The project was created on Mar 11, 2011.
- License: Apache License 2.0
- 1 star
- svn-based source control