SSS Mapreduce Programming Guide - Partitioner
Updated Feb 27, 2013

Partitioner

"Partitioner" is the class that computes a hash value from a key and decides which node the tuple is written to.

By default, the HashPartitioner class is used.

However, depending on the data set, HashPartitioner may not distribute the tuples evenly. In that case, you need to create your own Partitioner that distributes the tuples evenly.
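The following is a minimal sketch of how a hash-based rule can end up skewed. It assumes a rule of the form "hash code modulo the number of partitions"; this guide does not state how HashPartitioner is actually implemented, so `hashPartition`, `histogram`, and `SkewDemo` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class SkewDemo {
    // ASSUMPTION: a hash-style rule, hash code modulo the partition count.
    // The real HashPartitioner may compute its value differently.
    static int hashPartition(Object key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Counts how many keys land in each partition.
    static int[] histogram(Iterable<?> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (Object key : keys) {
            counts[hashPartition(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Integer.hashCode() is the value itself, so keys that are all
        // multiples of 4 collapse into a single partition when there are 4.
        List<Integer> keys = Arrays.asList(0, 4, 8, 12, 16, 20);
        System.out.println(Arrays.toString(histogram(keys, 4))); // prints [6, 0, 0, 0]
    }
}
```

Building a histogram like this over a sample of real keys is one way to check whether a custom Partitioner is worth writing.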

The Partitioner interface is defined as follows.

  public interface Partitioner {
    int getPartition(Packable key, Packable value, int numPartitions);
  }

The Partitioner#getPartition method must produce an integer value from the key. This value must be the same whenever the keys are equal. SSS Mapreduce distributes the tuples according to these integer values.
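As an illustration of this contract, here is a minimal sketch of a custom Partitioner for keys that hold numeric IDs. To keep the sketch compilable on its own, `Packable` and `Partitioner` are redeclared as local stand-ins, and `StringKey` and `NumericIdPartitioner` are hypothetical names; a real implementation would use the library's own types, such as PackableString, whose accessors are not shown in this guide. It also assumes the returned value should lie in the range 0 to numPartitions - 1, which the guide does not state explicitly.

```java
// Stand-ins for the SSS Mapreduce types so the sketch compiles by itself.
// In a real project, use the library's Packable and Partitioner instead.
interface Packable {}

interface Partitioner {
    int getPartition(Packable key, Packable value, int numPartitions);
}

// Hypothetical key type wrapping a string; the real PackableString may differ.
final class StringKey implements Packable {
    final String text;

    StringKey(String text) {
        this.text = text;
    }
}

// Hypothetical partitioner: spreads numeric IDs round-robin across partitions.
final class NumericIdPartitioner implements Partitioner {
    @Override
    public int getPartition(Packable key, Packable value, int numPartitions) {
        String text = ((StringKey) key).text;
        try {
            // Consecutive IDs go to consecutive partitions.
            long id = Long.parseLong(text);
            return (int) Math.floorMod(id, (long) numPartitions);
        } catch (NumberFormatException e) {
            // Non-numeric keys fall back to a hash of the text.
            // floorMod keeps the result non-negative for negative hash codes.
            return Math.floorMod(text.hashCode(), numPartitions);
        }
    }
}
```

The result depends only on the key, so equal keys always map to the same integer, as the contract requires. The value argument is ignored on purpose.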

To specify a Partitioner for DataPutter, pass the Partitioner class object as an argument to the DataPutter.create method.

    DataPutter<PackableString, PackableInt> putter = DataPutter.create(PackableString.class, PackableInt.class, MyPartitioner.class);

To specify a Partitioner for the output of a Job, wrap the Partitioner class object with the Job.partitioner method and pass it to Job.Builder#addOutput.

    JobEngine engine = new JobEngine(client);
    try {
      engine.getJobBuilder("MyReducer", MyReducer.class)
        .addInput(input)
        .addOutput(output, Job.partitioner(MyPartitioner.class)).build();