UserGuideJava  
User guide for the Java mapper library.
Updated Feb 27, 2011 by f...@google.com

Configuration Parameters

You can set the following in your configuration to customize mapper behavior:

Key | Default value | Explanation
mapreduce.mapper.inputprocessingrate | 1000 | The aggregate number of entities processed per second by all mappers. This prevents a large amount of quota from being used up in a short period.
mapreduce.mapper.shardcount | 8 | The number of concurrent workers to use. This also determines the number of shards the input is split into.
mapreduce.appengine.donecallback.url | None | A URL, in the format accepted by the task queue constructor, to call after the mapper job completes. The URL receives a POST request with a job_id parameter set to the completed MapReduce job's ID.
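
As a rough illustration of the done callback described above, the sketch below pulls job_id out of a POST body. It assumes the body is form-encoded (application/x-www-form-urlencoded), which this guide does not state. The class DoneCallbackParser and the job ID value are hypothetical and not part of the mapper library. Inside a servlet you would more likely read the parameter with HttpServletRequest.getParameter("job_id"); the body is parsed by hand here only to keep the example self-contained.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the mapper library.
class DoneCallbackParser {

    /** Parses a form-encoded body into a name -> value map (assumed encoding). */
    static Map<String, String> parseForm(String body) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (body == null || body.isEmpty()) {
            return params;
        }
        for (String pair : body.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    /** Returns the job_id parameter, or null if the body does not contain one. */
    static String extractJobId(String body) {
        return parseForm(body).get("job_id");
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // "example-job-123" is a placeholder, not a real job ID format.
        System.out.println(extractJobId("job_id=example-job-123"));
    }
}
```

A callback handler could use the extracted ID to look up the finished job or to start follow-up work.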

There are additional configuration options defined in the AppEngineJobContext class.

Each input format also has its own configuration options. There is currently only one input format, DatastoreInputFormat, with the following option:

Key | Default value | Explanation
mapreduce.mapper.inputformat.datastoreinputformat.entitykind | None | The datastore entity kind to map over.

Batch Datastore Mutations

You can use the DatastoreMutationPool class to batch datastore puts or deletes. A DatastoreMutationPool with the default thresholds is available from the AppEngineMapper.AppEngineContext class, which is accessible using the AppEngineMapper.getAppEngineContext() method.
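
To make the batching idea concrete, here is a minimal, self-contained sketch of a count-based pool. It is not the library's DatastoreMutationPool: the class name BatchingPool, the count threshold, and the explicit flush() are assumptions chosen for illustration, and the "writer" is just a callback standing in for a batched datastore put or delete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration of count-based batching; not the library's class.
class BatchingPool<T> {
    private final int countLimit;
    private final Consumer<List<T>> batchWriter;
    private final List<T> pending = new ArrayList<T>();

    BatchingPool(int countLimit, Consumer<List<T>> batchWriter) {
        if (countLimit < 1) {
            throw new IllegalArgumentException("countLimit must be >= 1");
        }
        this.countLimit = countLimit;
        this.batchWriter = batchWriter;
    }

    /** Queues one item and flushes once the pending count reaches the limit. */
    void add(T item) {
        pending.add(item);
        if (pending.size() >= countLimit) {
            flush();
        }
    }

    /** Hands all pending items to the writer as one batch; no-op when empty. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        batchWriter.accept(new ArrayList<T>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<Integer>();
        // The writer only records batch sizes; a real one would call the datastore.
        BatchingPool<String> pool =
                new BatchingPool<String>(3, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 7; i++) {
            pool.add("entity-" + i);
        }
        pool.flush(); // write the final partial batch
        System.out.println(batchSizes); // prints [3, 3, 1]
    }
}
```

The point of the pattern is that seven individual writes become three batched calls. A real pool may also apply a size-based threshold, which this sketch leaves out.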

Current Java Limitations

The following limitations apply to the current implementation. We're working to remove all of them:

  • Only a full range scan is supported; that is, it is not possible to map over just a subset of the entities of a kind.

Comment by branflak...@gmail.com, Sep 6, 2010

Add your own mapper parameters for use during the operation:

This CSV import demo is part of: http://demofileuploadgae.appspot.com/

Set the configuration like this:

<property>
  <name human="CSV Delimiter [,|~|\t|\|]">delimiter</name>
  <value template="optional">,</value>      
</property>

<property>
  <name human="Skip Row 1: [0|1]">skipRow1</name>
  <value template="optional">1</value>      
</property>

Read the configuration like this:

String delimiter = context.getConfiguration().get("delimiter");
String skipRow1 = context.getConfiguration().get("skipRow1");
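
One possible way to apply those two values once they are read is sketched below. CsvRowSplitter is a hypothetical helper, not code from the demo, and it assumes simple CSV with no quoted fields. The delimiter is quoted before splitting because values such as | are regular-expression metacharacters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper showing one way to use the "delimiter" and "skipRow1" values.
class CsvRowSplitter {

    /** Splits one line on a literal delimiter, keeping trailing empty fields. */
    static List<String> splitLine(String line, String delimiter) {
        // Pattern.quote treats "|" and similar characters literally.
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    /** Splits all lines, dropping the first one when skipRow1 is "1". */
    static List<List<String>> splitLines(List<String> lines, String delimiter, String skipRow1) {
        int start = "1".equals(skipRow1) ? 1 : 0;
        List<List<String>> rows = new ArrayList<List<String>>();
        for (int i = start; i < lines.size(); i++) {
            rows.add(splitLine(lines.get(i), delimiter));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("name|qty", "apple|3", "pear|");
        System.out.println(splitLines(lines, "|", "1")); // prints [[apple, 3], [pear, ]]
    }
}
```

Both configuration values arrive as strings, so skipRow1 is compared against "1" rather than parsed as a boolean.
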
Comment by j...@highvolumeseller.com, Apr 7, 2011

Thanks branflak, exactly what I was looking for!

Comment by susheel....@gmail.com, Aug 23, 2011

I downloaded your code and tried to run it, but it displays:

Unable to load server class 'org.gonevertical.upload.DemoUpload'
java.lang.ClassNotFoundException: org.gonevertical.upload.DemoUpload
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Unknown Source)
    at com.google.gwt.dev.DevMode$ArgHandlerServer.setString(DevMode.java:117)
    at com.google.gwt.util.tools.ArgHandlerString.handle(ArgHandlerString.java:26)
    at com.google.gwt.util.tools.ToolBase.processArgs(ToolBase.java:238)
    at com.google.gwt.dev.ArgProcessorBase.processArgs(ArgProcessorBase.java:29)
    at com.google.gwt.dev.DevMode.main(DevMode.java:308)

Google Web Toolkit 2.3.0
DevMode [-noserver] [-port port-number | "auto"] [-whitelist whitelist-string] [-blacklist blacklist-string] [-logdir directory] [-logLevel level] [-gen dir] [-bindAddress host-name-or-address] [-codeServerPort port-number | "auto"] [-server servletContainerLauncher[:args]] [-startupUrl url] [-war dir] [-deploy dir] [-extra dir] [-workDir dir] module[s]

where
  -noserver        Prevents the embedded web server from running
  -port            Specifies the TCP port for the embedded web server (defaults to 8888)
  -whitelist       Allows the user to browse URLs that match the specified regexes (comma or space separated)
  -blacklist       Prevents the user browsing URLs that match the specified regexes (comma or space separated)
  -logdir          Logs to a file in the given directory, as well as graphically
  -logLevel        The level of logging detail: ERROR, WARN, INFO, TRACE, DEBUG, SPAM, or ALL
  -gen             Debugging: causes normally-transient generated types to be saved in the specified directory
  -bindAddress     Specifies the bind address for the code server and web server (defaults to 127.0.0.1)
  -codeServerPort  Specifies the TCP port for the code server (defaults to 9997)
  -server          Specify a different embedded web server to run (must implement ServletContainerLauncher)
  -startupUrl      Automatically launches the specified URL
  -war             The directory into which deployable output files will be written (defaults to 'war')
  -deploy          The directory into which deployable but not servable output files will be written (defaults to 'WEB-INF/deploy' under the -war directory/jar, and may be the same as the -extra directory/jar)
  -extra           The directory into which extra files, not intended for deployment, will be written
  -workDir         The compiler's working directory for internal use (must be writeable; defaults to a system temp dir)
and
  module[s]        Specifies the name(s) of the module(s) to host

