ProgrammingGuideMapperReducer  
SSS Mapreduce Programming Guide - Mapper/Reducer class
Updated Feb 27, 2013 by tatsuhik...@gmail.com

Mapper/Reducer class

This chapter covers the parts of defining a Mapper/Reducer that the WordCount example does not explain.

Multiple outputs

In SSS Mapreduce, a Mapper/Reducer can have multiple outputs. To use them, append one Output-typed argument per output to the end of the argument list of the map/reduce method.

An example of a Mapper with two outputs is shown below.

  public class MyMapper extends Mapper {
    public void map(Context context,
        PackableInt key, PackableString value,
        Output<PackableString, PackableInt> output1,
        Output<PackableInt, PackableDouble> output2) throws Exception {
        // The contents of processing 
    }
  }

The outputs do not need to have the same types. You must also assign a TupleGroup to each output by calling the addOutput method of Job.Builder when you create the Job.

      GroupID input = ...;
      GroupID output1 = ...;
      GroupID output2 = ...;

      engine.getJobBuilder("MyMapper", MyMapper.class)
        .addInput(input)
        .addOutput(output1)
        .addOutput(output2) // two outputs
        .build();
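To make the idea of differently typed outputs concrete, here is a minimal self-contained sketch. It does not use the real SSS Mapreduce classes: `ListOutput` and its `write` method are hypothetical stand-ins that collect tuples in memory, and plain Java types replace the Packable types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in types only: these are NOT the real SSS Mapreduce classes.
class MultiOutputSketch {
    /** Hypothetical stand-in for Output: collects emitted tuples in memory. */
    static class ListOutput<K, V> {
        final List<String> tuples = new ArrayList<>();

        void write(K key, V value) {
            tuples.add(key + "=" + value);
        }
    }

    /** Mirrors the two-output map method: each output has its own key/value types. */
    static void map(int key, String value,
                    ListOutput<String, Integer> output1,
                    ListOutput<Integer, Double> output2) {
        output1.write(value, value.length());      // (String, Integer) tuple
        output2.write(key, value.length() / 2.0);  // (Integer, Double) tuple
    }

    /** Feeds two input tuples through map and returns what each output received. */
    static List<List<String>> run() {
        ListOutput<String, Integer> out1 = new ListOutput<>();
        ListOutput<Integer, Double> out2 = new ListOutput<>();
        map(1, "sss", out1, out2);
        map(2, "mapreduce", out1, out2);
        return Arrays.asList(out1.tuples, out2.tuples);
    }
}
```

In this sketch each call to `map` writes one tuple to each output, but nothing requires that: a map method may write to either output, both, or neither for a given input tuple.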

WARNING:

A class used as a combiner cannot use multiple outputs.

The method called before processing

The Mapper/Reducer class has a configure method. SSS Mapreduce calls this method before the Mapper/Reducer reads any tuples. If you have processing that must run before the Mapper/Reducer executes, override the 'configure' method.

The 'configure' method of the Mapper/Reducer class does nothing, so an overriding method does not need to call the superclass's 'configure' method.

Only the signature is shown below.

  public class MyMapper extends Mapper {
    @Override
    public void configure(Context context) {
        // The contents of processing 
    }

    public void map(Context context,
        PackableInt key, PackableString value,
        Output<PackableString, PackableInt> output1,
        Output<PackableInt, PackableDouble> output2) throws Exception {
        // The contents of processing 
    }
  }
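The call order described above can be illustrated with a small self-contained sketch. The classes here are hypothetical stand-ins, not the SSS Mapreduce API: `BaseMapper` imitates a base class whose configure method is empty, the Context argument is omitted, and `run` plays the role of the framework by calling configure once and then map for each tuple.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types only: these are NOT the real SSS Mapreduce classes.
class ConfigureSketch {
    /** Hypothetical base class whose configure does nothing, as described in the text. */
    static class BaseMapper {
        public void configure() {
            // intentionally empty
        }
    }

    static class PrefixMapper extends BaseMapper {
        private String prefix;                      // prepared once, before any tuple
        final List<String> log = new ArrayList<>();

        @Override
        public void configure() {
            // No super.configure() call is needed: the base implementation is empty.
            prefix = "item-";
            log.add("configure");
        }

        public void map(int key, String value) {
            log.add("map " + prefix + key + ":" + value);
        }
    }

    /** Hypothetical driver: configure once, then map once per input tuple. */
    static List<String> run() {
        PrefixMapper mapper = new PrefixMapper();
        mapper.configure();
        String[] values = {"a", "b"};
        for (int i = 0; i < values.length; i++) {
            mapper.map(i, values[i]);
        }
        return mapper.log;
    }
}
```

The field set in `configure` is available to every later `map` call, which is the usual reason to override it: one-time setup such as loading parameters or building lookup tables.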

The method called after processing

You can also define a counterpart to the "configure" method that is called after all tuples have been processed. When a Mapper/Reducer defines a method named "cleanup", SSS Mapreduce calls it after processing finishes.

The "cleanup" method must take the following arguments.

  1. Mapper/Reducer#Context
  2. Output<Key type of output, value type of output>

As the required arguments show, the "cleanup" method can output tuples just as the map/reduce method can. The key and value types of each Output must therefore match those of the map/reduce method, and when the map/reduce method has multiple outputs, the "cleanup" method must have the same number of outputs.
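The matching rule can be checked mechanically. The following sketch is one possible self-check using standard Java reflection; it is an illustration with hypothetical stand-in types (`Output`, `Context`, `GoodMapper`, `BadMapper`), not a description of how SSS Mapreduce itself validates signatures.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

// Stand-in types only: these are NOT the real SSS Mapreduce classes.
class SignatureCheckSketch {
    interface Output<K, V> {}
    static class Context {}

    static class GoodMapper {
        public void map(Context c, Integer k, String v,
                        Output<String, Integer> o1, Output<Integer, Double> o2) {}
        public void cleanup(Context c,
                            Output<String, Integer> o1, Output<Integer, Double> o2) {}
    }

    static class BadMapper {
        public void map(Context c, Integer k, String v,
                        Output<String, Integer> o1, Output<Integer, Double> o2) {}
        // Only one output here, so it does not match map.
        public void cleanup(Context c, Output<String, Integer> o1) {}
    }

    /** Returns the generic type names of the Output parameters of the named method, in order. */
    static List<String> outputTypeNames(Class<?> cls, String methodName) {
        for (Method m : cls.getDeclaredMethods()) {
            if (!m.getName().equals(methodName)) {
                continue;
            }
            List<String> names = new ArrayList<>();
            for (Type t : m.getGenericParameterTypes()) {
                if (t instanceof ParameterizedType
                        && ((ParameterizedType) t).getRawType() == Output.class) {
                    names.add(t.getTypeName());
                }
            }
            return names;
        }
        throw new IllegalArgumentException("no method named " + methodName);
    }

    /** True when cleanup declares the same Output types, in the same order, as map. */
    static boolean cleanupMatchesMap(Class<?> cls) {
        return outputTypeNames(cls, "map").equals(outputTypeNames(cls, "cleanup"));
    }
}
```

With these stand-ins, `cleanupMatchesMap(GoodMapper.class)` is true and `cleanupMatchesMap(BadMapper.class)` is false, because `BadMapper.cleanup` declares one output where `map` declares two.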

Only the signature is shown below.

  public class MyMapper extends Mapper {
    public void map(Context context,
        PackableInt key, PackableString value,
        Output<PackableString, PackableInt> output1,
        Output<PackableInt, PackableDouble> output2) throws Exception {
        // The contents of processing 
    }
    public void cleanup(Context context, 
        Output<PackableString, PackableInt> output1,
        Output<PackableInt, PackableDouble> output2) throws Exception {
        // The contents of processing 
    }
  }
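A common reason to emit tuples from cleanup is to aggregate inside the Mapper and write the totals once at the end. The sketch below illustrates that pattern with hypothetical stand-ins (`ListOutput`, its `write` method, and a `run` driver that imitates the framework); it uses a single output and omits the Context argument for brevity, and it is not the real SSS Mapreduce API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Stand-in types only: these are NOT the real SSS Mapreduce classes.
class CleanupSketch {
    /** Hypothetical stand-in for Output: collects emitted tuples in memory. */
    static class ListOutput<K, V> {
        final List<String> tuples = new ArrayList<>();

        void write(K key, V value) {
            tuples.add(key + "=" + value);
        }
    }

    static class CountingMapper {
        // TreeMap keeps keys sorted, so the emitted order is deterministic.
        private final Map<String, Integer> counts = new TreeMap<>();

        /** Accumulates a count per value; emits nothing yet. */
        public void map(int key, String value, ListOutput<String, Integer> output) {
            counts.merge(value, 1, Integer::sum);
        }

        /** Same Output type as map; emits one tuple per distinct value seen. */
        public void cleanup(ListOutput<String, Integer> output) {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                output.write(e.getKey(), e.getValue());
            }
        }
    }

    /** Hypothetical driver: map once per input tuple, then cleanup once. */
    static List<String> run() {
        CountingMapper mapper = new CountingMapper();
        ListOutput<String, Integer> out = new ListOutput<>();
        String[] values = {"b", "a", "b"};
        for (int i = 0; i < values.length; i++) {
            mapper.map(i, values[i], out);
        }
        mapper.cleanup(out);
        return out.tuples;
    }
}
```

Here `map` writes nothing and `cleanup` writes two tuples, one per distinct value, instead of one tuple per input.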