SkewReduce is a framework for reducing the impact of stragglers in distributed data analysis tasks. The project provides two components: a partition optimizer and a runtime. The partition optimizer derives a good partitioning of the input data from a user-supplied cost model of the algorithm and a sample of the input data. The runtime schedules a series of MapReduce jobs in Hadoop according to the partition plan. For more detail, please refer to the paper published at the ACM Symposium on Cloud Computing 2010: "Skew-resistant parallel processing of feature-extracting scientific user-defined functions".
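To make the idea of cost-model-driven partitioning concrete, here is a minimal Python sketch. It is not SkewReduce's actual optimizer or API: the function name `plan_partitions`, the one-dimensional input, the median split rule, and the quadratic cost model are all assumptions chosen for illustration. The sketch repeatedly splits whichever partition the cost model estimates to be most expensive, so dense regions of the sample end up in smaller partitions.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Partition = Tuple[float, float]  # half-open interval [lo, hi)


def plan_partitions(
    sample: Sequence[float],
    lo: float,
    hi: float,
    cost: Callable[[Sequence[float]], float],
    max_partitions: int,
) -> List[Partition]:
    """Greedily split the costliest partition at its sample median.

    Hypothetical illustration only: `cost` stands in for a user-supplied
    cost model, evaluated on the sample points that fall in a partition.
    """

    def points_in(a: float, b: float) -> List[float]:
        return [x for x in sample if a <= x < b]

    # heapq is a min-heap, so costs are negated to pop the largest first.
    heap = [(-cost(points_in(lo, hi)), lo, hi)]
    while len(heap) < max_partitions:
        neg_cost, a, b = heapq.heappop(heap)
        pts = sorted(points_in(a, b))
        # Stop if the costliest partition cannot be split any further.
        if len(pts) < 2 or pts[len(pts) // 2] == pts[0]:
            heapq.heappush(heap, (neg_cost, a, b))
            break
        mid = pts[len(pts) // 2]
        heapq.heappush(heap, (-cost(points_in(a, mid)), a, mid))
        heapq.heappush(heap, (-cost(points_in(mid, b)), mid, b))
    return sorted((a, b) for _, a, b in heap)


def quadratic_cost(points: Sequence[float]) -> float:
    """Assumed cost model: work grows with the square of the point count."""
    return float(len(points) ** 2)
```

Calling `plan_partitions(sample, 0.0, 10.0, quadratic_cost, 3)` on a sample that is clustered near zero yields narrow partitions around the cluster and one wide partition for the sparse remainder. In the framework described above, a plan like this would then be handed to the runtime, which schedules the corresponding jobs.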