ProgrammingGuideMergeMapper  
SSS Mapreduce Programming Guide - MergeMapper
Updated Feb 27, 2013 by tatsuhik...@gmail.com

MergeMapper

SSS Mapreduce can process every combination of tuples drawn from two TupleGroups. This feature is called "MergeMapper".

First, define a Mapper whose map method takes two inputs.

public class MergeMapper extends Mapper {
 public void map(Context context,
                 PackableString key1, PackableInt value1,
                 PackableString key2, PackableInt value2,
                 Output<PackableString, PackableInt> output) {
     // Process one pair of tuples (one from each input) here
 }
}
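
The calling pattern implied by this signature can be sketched in plain Java. This is only an illustration of the "every combination" semantics described above, assuming map is invoked once per pair of tuples. It does not use the SSS Mapreduce API, and the names MergePairsSketch, PairHandler and forEachPair are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; none of these names belong to the SSS API.
class MergePairsSketch {

    /** Hypothetical stand-in for the four tuple arguments of a two-input map(). */
    interface PairHandler {
        void handle(String key1, int value1, String key2, int value2);
    }

    /** Calls the handler once for every (tuple of input1, tuple of input2) pair. */
    static void forEachPair(List<Map.Entry<String, Integer>> input1,
                            List<Map.Entry<String, Integer>> input2,
                            PairHandler handler) {
        for (Map.Entry<String, Integer> t1 : input1) {
            for (Map.Entry<String, Integer> t2 : input2) {
                handler.handle(t1.getKey(), t1.getValue(), t2.getKey(), t2.getValue());
            }
        }
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input1 =
            List.of(Map.entry("a", 1), Map.entry("b", 2));
        List<Map.Entry<String, Integer>> input2 =
            List.of(Map.entry("x", 10), Map.entry("y", 20), Map.entry("z", 30));

        // Record each pair the handler sees: 2 x 3 inputs give 6 calls.
        List<String> seen = new ArrayList<>();
        forEachPair(input1, input2,
            (k1, v1, k2, v2) -> seen.add(k1 + k2 + ":" + (v1 + v2)));
        System.out.println(seen);
    }
}
```

With two tuples in the first list and three in the second, the handler runs six times, which is the product of the two input sizes.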

Next, when creating the Job with JobEngine, specify the two inputs by calling the addInput method twice.

    JobEngine engine = new JobEngine(client);
    try {
      GroupID input1 = ...;
      GroupID input2 = ...;
      GroupID output = GroupID.createRandom(engine);

      engine.getJobBuilder("MergeMapper", MergeMapper.class)
        .addInput(input1)
        .addInput(input2)
        .addOutput(output).build();

One caution applies here: the "broadcast" flag must be enabled on the TupleGroup used as the second input. You can create a TupleGroup with the broadcast flag by passing true as the broadcast argument of GroupID.createRandom.

    GroupID gid = GroupID.createRandom(client, true);
    DataPutter<PackableInt, PackableString> putter =
      DataPutter.create(client, PackableInt.class, PackableString.class, gid); 
    // The fourth argument of DataPutter.create specifies the TupleGroup that the data will be written to.
      :
      :

MergeMapper reads all tuples of the second input and keeps them in memory, so it cannot run if that TupleGroup is too large to fit in memory.
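
The memory constraint can be pictured with a small plain-Java sketch. It buffers the whole second input in a list before streaming the first input past it. This is an assumption about the general technique, not the actual SSS Mapreduce implementation. BufferedSecondInputSketch, mergeAll and the maxBuffered limit are hypothetical names introduced only to make the failure mode visible.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

// Plain-Java illustration only; not part of the SSS API.
class BufferedSecondInputSketch {

    /**
     * Buffers every element of the second input in memory, then streams the
     * first input and pairs each of its elements with every buffered element.
     * Returns the number of handler calls. maxBuffered is a hypothetical cap
     * standing in for "what fits in memory".
     */
    static <A, B> long mergeAll(Iterator<A> first, Iterator<B> second,
                                int maxBuffered, BiConsumer<A, B> handler) {
        List<B> buffered = new ArrayList<>();
        while (second.hasNext()) {
            if (buffered.size() >= maxBuffered) {
                throw new IllegalStateException(
                    "second input exceeds " + maxBuffered + " tuples");
            }
            buffered.add(second.next());
        }

        long calls = 0;
        while (first.hasNext()) {
            A a = first.next();          // the first input is never held in full
            for (B b : buffered) {
                handler.accept(a, b);
                calls++;
            }
        }
        return calls;
    }
}
```

Only the second input has to fit in memory in this pattern, which is why it makes sense to pass the smaller TupleGroup as the second input when there is a choice.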
