GettingStartedInJava  
Getting started guide for Java mapper library
Updated May 23, 2011 by f...@google.com

Adding the MapReduce Library To Your Application

Check out the mapreduce folder to a separate directory:

svn checkout http://appengine-mapreduce.googlecode.com/svn/trunk/java

Build the appropriate jar using ant in the directory you just checked out:

ant

Copy the resulting jars from the dist/lib directory into your application's WEB-INF/lib directory. If your application already includes any of the dependency jars, there is no need to copy duplicates.

Add the mapreduce handler to your web.xml:

<servlet>
  <servlet-name>mapreduce</servlet-name>
  <servlet-class>com.google.appengine.tools.mapreduce.MapReduceServlet</servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>mapreduce</servlet-name>
  <url-pattern>/mapreduce/*</url-pattern>
</servlet-mapping>

You may also want to add a security constraint to make sure that only application administrators can view/run mappers:

<security-constraint>
  <web-resource-collection>
    <url-pattern>/mapreduce/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin</role-name>
  </auth-constraint>
</security-constraint>

Defining a Mapper

Create a class that extends AppEngineMapper. You can see an example of such a class here.

There are two ways to configure a mapper. You can either programmatically create a Configuration as seen here, or you can define a template using mapreduce.xml as seen here. There is a description of the mapreduce.xml format in the javadoc for the ConfigurationTemplatePreprocessor class.
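To make the template approach more concrete, here is a minimal sketch that prints what a mapreduce.xml template might look like. It uses only the standard library. The layout (a named configuration element holding Hadoop-style property entries) is an assumption based on the property names used elsewhere on this page, so check it against the ConfigurationTemplatePreprocessor javadoc. The class name MapReduceTemplateSketch and the mapper class com.example.MyMapper are hypothetical, and a real template will likely need further properties specific to its input format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Renders a minimal mapreduce.xml template as a string (illustrative sketch only). */
final class MapReduceTemplateSketch {

    /** Escapes the characters that are unsafe in XML text and attribute values. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** Builds one named configuration wrapped in the configurations root element. */
    static String render(String jobName, Map<String, String> properties) {
        StringBuilder xml = new StringBuilder();
        xml.append("<configurations>\n");
        // Assumption: each template is a <configuration> element with a name attribute.
        xml.append("  <configuration name=\"").append(escapeXml(jobName)).append("\">\n");
        for (Map.Entry<String, String> p : properties.entrySet()) {
            xml.append("    <property>\n");
            xml.append("      <name>").append(escapeXml(p.getKey())).append("</name>\n");
            xml.append("      <value>").append(escapeXml(p.getValue())).append("</value>\n");
            xml.append("    </property>\n");
        }
        xml.append("  </configuration>\n");
        xml.append("</configurations>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Hypothetical mapper class; replace with your own AppEngineMapper subclass.
        props.put("mapreduce.map.class", "com.example.MyMapper");
        // Assumed fully qualified input format name; verify it against the jar you built.
        props.put("mapreduce.inputformat.class",
                "com.google.appengine.tools.mapreduce.DatastoreInputFormat");
        System.out.print(render("My Mapper", props));
    }
}
```

Running main prints a template with one configuration named "My Mapper". In practice you would write the file by hand; the code is only a way to show the nesting.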

Running the Mapper

If you configured your mapper using the configuration template approach, then you can start the mapper by navigating your browser to http://<your_app_id>.appspot.com/mapreduce/status. Click the launch button to start one of the registered mapreduces, and then go to the mapreduce detail page to observe its status and control its execution.

If you used the programmatic approach, then just run whichever handler you added the creation code to. The mapper will show up on the status page (linked above) just as if you had run it using a template.
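A comment further down this page starts a registered mapreduce from code by posting a task to /mapreduce/command/start_job, with the job name in a "name" field and each mapper parameter in a field prefixed with "mapper_params.". The sketch below only illustrates that field naming, using the standard library. The endpoint and field names come from that comment and are not verified against the library, and the class name StartJobParamsSketch and the parameter "sampleId" are hypothetical. In a real application the Task Queue API would do the encoding for you.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch: names the form fields for a start_job request. */
final class StartJobParamsSketch {

    /** Prefix used for mapper parameters in the comment thread's example. */
    static final String MAPPER_PARAM_PREFIX = "mapper_params.";

    /** Collects the job name plus the prefixed mapper parameters, in insertion order. */
    static Map<String, String> buildParams(String jobName, Map<String, String> mapperParams) {
        Map<String, String> form = new LinkedHashMap<>();
        form.put("name", jobName);
        for (Map.Entry<String, String> e : mapperParams.entrySet()) {
            form.put(MAPPER_PARAM_PREFIX + e.getKey(), e.getValue());
        }
        return form;
    }

    /** Encodes the fields as an application/x-www-form-urlencoded body. */
    static String encode(Map<String, String> form) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : form.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always available", ex);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> mapperParams = new LinkedHashMap<>();
        mapperParams.put("sampleId", "42"); // hypothetical parameter name and value
        System.out.println(encode(buildParams("Update Subscriber Dates", mapperParams)));
    }
}
```

Running main prints name=Update+Subscriber+Dates&mapper_params.sampleId=42, which is the body such a task would carry.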

Further Reading

You can find more information about optional parameters, batch datastore mutations, and more in the Java User Guide.

Comment by crll...@gmail.com, Aug 7, 2010

The project checked out via SVN builds 5 jar files, including "json.jar" and "commons-logging-1.1.1.jar". I am a little confused, because:

1. The default App Engine SDK includes a "repackaged-appengine-commons-logging-1.1.1.jar".
2. The default jars also include a repackaged JSON library.

Which one should be imported in user code?

Comment by lhori...@gmail.com, Aug 19, 2010

How do we kick off a mapper from cron? The example of programmatically creating a Configuration seems totally useless - it just shows how to make a web form that starts the job. We need some way to kick off the job from code!

Comment by branflak...@gmail.com, Sep 3, 2010

For those wondering about the dependencies (jars): they are either downloaded or included in the source checkout from svn, and they are built with ant.

Comment by branflak...@gmail.com, Sep 6, 2010

http://demofileuploadgae.appspot.com/ - My Appspot Mapper Reduce, Blobstore Demo

Java Import Mapper Code: Source

Comment by alex@bedatadriven.com, Nov 9, 2010

@lhoriman You can use the TaskQueue API to start jobs programmatically, e.g. in response to some other user action or via a cron job:

	/**
	 * Starts a map/reduce job that updates {@link Subscriber} entities with 
	 * dates from a given status snapshot.
	 */
	public static void updateSubscriberDatesFromSnapshot(UsageBlob statusBlob) {		
		
		TaskOptions task = buildStartJob("Update Subscriber Dates");
		addJobParam(task, StatusBlobInputFormat.SAMPLE_ID_PARAMETER, statusBlob.getSampleId());
		addJobParam(task, StatusBlobInputFormat.DATE_PARAMETER, statusBlob.getDate().getMillis());
		
		Queue queue = QueueFactory.getDefaultQueue();
		queue.add(task);
	}


	private static TaskOptions buildStartJob(String jobName) {
		return TaskOptions.Builder
		.url("/mapreduce/command/start_job")
		.method(Method.POST)
		.header("X-Requested-With", "XMLHttpRequest") // hack: we need to fix appengine-mapper so we can properly call start_job without need to pretend to be an ajaxmethod
		.param("name", jobName);
	}
	
	private static void addJobParam(TaskOptions task, String paramName, String paramValue ) {
		task.param("mapper_params." + paramName, paramValue);
	}
	
	private static void addJobParam(TaskOptions task, String paramName, long value) {
		addJobParam(task, paramName, Long.toString(value));
	}

Comment by luigi.agosti, Nov 29, 2010

This is great, nice work! One thing: have you considered moving to Maven as a build tool? I can even give you some help if you want. Thanks, Luigi

Comment by GoSharpL...@gmail.com, Jan 20, 2011

Hello, is "Defining the Descending Key Index" still necessary? App Engine logs the following error:

This index is not necessary, since single-property indices are built in. Please remove it from your index file and upgrade to the latest version of the SDK, if you haven't already.

Comment by antony.trupe, Feb 8, 2011

No, it is not necessary.

Comment by f...@google.com, Feb 27, 2011

Yeah, it's no longer necessary. Just removed it from the documentation.

Comment by tero.nur...@gmail.com, Mar 5, 2011

Another approach to auto-start MapReduce with Blobs. No need to create a mapreduce.xml file anymore :)

public static void StartMapReduceWithBlob(Class<? extends AppEngineMapper<BlobstoreRecordKey, byte[], NullWritable, NullWritable>> blobMapperImpl, String blobKey) {
	Configuration conf = new Configuration(false);
	conf.setClass("mapreduce.map.class", blobMapperImpl, Mapper.class);
	conf.setClass("mapreduce.inputformat.class", BlobstoreInputFormat.class, InputFormat.class);
	conf.set(BlobstoreInputFormat.BLOB_KEYS, blobKey);
    
	String xml = ConfigurationXmlUtil.convertConfigurationToXml(conf);

	Queue queue = QueueFactory.getDefaultQueue();
	TaskOptions task = TaskOptions.Builder
			.withDefaults()
			.url("/mapreduce/start")
			.method(Method.POST)
			.param("configuration", xml);
	queue.add(task);
}

Comment by tero.nur...@gmail.com, Mar 6, 2011

Actually, the mapreduce.xml file is not needed in the sandbox, but App Engine raises an error if it does not exist. The file only needs to contain the root element <configurations></configurations> to work.

Comment by mabidsha...@gmail.com, Jun 27, 2011

	private static TaskOptions buildStartJob(String jobName) {
		return TaskOptions.Builder
		.url("/mapreduce/command/start_job")
		.method(Method.POST)
		.header("X-Requested-With", "XMLHttpRequest") // hack: we need to fix appengine-mapper so we can properly call start_job without need to pretend to be an ajaxmethod
		.param("name", jobName);
	}

How do we get the "name" parameter in the map() function of the mapper class? Thanks

Comment by a.allahb...@gmail.com, Jul 25, 2011

I have built the library on my PC. Please download the dist folder from http://goo.gl/RQ9ln

