
SkewReduce is a framework for reducing the impact of stragglers in distributed data analysis tasks. The project provides two components: a partition optimizer and a runtime. The partition optimizer derives a good partition of the input data from a user-supplied cost model of the algorithm and a sample of the input data. The runtime schedules a series of MapReduce jobs in Hadoop according to the partition plan. For more detail, please refer to the paper published at the ACM Symposium on Cloud Computing 2010: "Skew-resistant parallel processing of feature-extracting scientific user-defined functions".
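
To make the idea of cost-driven partitioning concrete, here is a minimal sketch in Python. It is not SkewReduce's actual optimizer or API: the names `plan_partitions` and `quadratic_cost`, the one-dimensional data, and the greedy median-split strategy are all assumptions chosen for illustration. The sketch takes sample data and a user-supplied cost function, then repeatedly bisects whichever partition the cost model estimates to be most expensive.

```python
import heapq
from typing import Callable, List, Sequence


def quadratic_cost(points: Sequence[float]) -> float:
    """Hypothetical cost model: work grows quadratically with partition size."""
    return float(len(points) ** 2)


def plan_partitions(
    sample: Sequence[float],
    cost: Callable[[Sequence[float]], float],
    max_partitions: int,
) -> List[List[float]]:
    """Greedily bisect the costliest partition at its median (illustrative only)."""
    data = sorted(sample)
    if not data:
        return []
    # Max-heap via negated cost; the counter breaks ties so lists are never compared.
    heap = [(-cost(data), 0, data)]
    counter = 1
    while len(heap) < max_partitions:
        neg_cost, _, points = heapq.heappop(heap)
        if len(points) < 2:
            # The costliest partition cannot be split further, so stop.
            heapq.heappush(heap, (neg_cost, counter, points))
            break
        mid = len(points) // 2
        for half in (points[:mid], points[mid:]):
            heapq.heappush(heap, (-cost(half), counter, half))
            counter += 1
    # Return partitions ordered by their smallest element.
    return sorted((points for _, _, points in heap), key=lambda p: p[0])


if __name__ == "__main__":
    plan = plan_partitions([1, 2, 3, 4, 100, 101, 102, 103], quadratic_cost, 4)
    for part in plan:
        print(part, quadratic_cost(part))
```

In this sketch each resulting partition would correspond to one unit of work handed to the runtime; a real cost model would estimate the running time of the user's algorithm rather than use a fixed formula.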
