skewreduce


Skew-resistant execution of distributed jobs in Hadoop

SkewReduce is a framework that reduces the impact of stragglers in distributed data analysis tasks. The project provides two components: a partition optimizer and a runtime. The partition optimizer derives a good partition of the input data from a user-supplied cost model of the algorithm and a sample of the input. The runtime then schedules a series of MapReduce jobs in Hadoop according to the partition plan. For more detail, please refer to the paper published at the ACM Symposium on Cloud Computing 2010: "Skew-resistant parallel processing of feature-extracting scientific user-defined functions" (http://portal.acm.org/citation.cfm?id=1807140).
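To illustrate the idea, the sketch below shows a cost-model-driven partitioner in the spirit of SkewReduce: the user supplies a cost estimate over a sample, and the optimizer recursively splits the key range until every piece's estimated cost falls below a threshold. All names here (`CostModel`, `Partition`, `plan`) are hypothetical and do not reflect the framework's actual API; this is only a minimal sketch of the technique, not the real optimizer (which also weighs scheduling and framework overheads).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal, hypothetical sketch of cost-model-driven partitioning in the
 * spirit of SkewReduce. Not the framework's real API.
 */
public class Optimizer {

    /** User-supplied cost model: estimates processing cost of a sample slice. */
    interface CostModel {
        double estimate(double[] sample, int from, int to);
    }

    /** One piece of the plan: a half-open index range [from, to) over the sample. */
    static class Partition {
        final int from, to;
        Partition(int from, int to) { this.from = from; this.to = to; }
    }

    /**
     * Recursively split the sample range until each piece's estimated
     * cost is below maxCost, yielding a skew-aware partition plan.
     */
    static List<Partition> plan(double[] sample, CostModel model, double maxCost) {
        List<Partition> out = new ArrayList<>();
        split(sample, model, 0, sample.length, maxCost, out);
        return out;
    }

    private static void split(double[] s, CostModel m, int from, int to,
                              double maxCost, List<Partition> out) {
        // Stop when the slice is cheap enough, or cannot be split further.
        if (to - from <= 1 || m.estimate(s, from, to) <= maxCost) {
            out.add(new Partition(from, to));
            return;
        }
        int mid = (from + to) / 2;
        split(s, m, from, mid, maxCost, out);
        split(s, m, mid, to, maxCost, out);
    }

    public static void main(String[] args) {
        // Toy sample: each value is a per-record cost; the last records
        // are "heavy", modeling data skew.
        double[] sample = {1, 1, 1, 1, 1, 1, 10, 10};
        // Hypothetical cost model: cost of a slice = sum of record costs.
        CostModel sum = (s, from, to) -> {
            double c = 0;
            for (int i = from; i < to; i++) c += s[i];
            return c;
        };
        // Skewed regions end up in smaller partitions than uniform ones.
        for (Partition p : plan(sample, sum, 6.0)) {
            System.out.println("[" + p.from + "," + p.to + ") cost="
                    + sum.estimate(sample, p.from, p.to));
        }
    }
}
```

Note how the plan is uneven by design: cheap records are grouped into wide partitions while each heavy record lands in its own narrow partition, so no single task becomes a straggler.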

Project Information

The project was created on Mar 11, 2011.

Labels:
hadoop prototype mapreduce skew java fof clustering