GettingStartedInJava
Getting started guide for Java mapper library
Adding the MapReduce Library To Your Application

Check out the mapreduce folder to a separate directory:

    svn checkout http://appengine-mapreduce.googlecode.com/svn/trunk/java

Build the appropriate jar using ant in the directory you just checked out:

    ant

Copy the resulting jars in the dist/lib directory into your application's WEB-INF/lib directory. If you're already using any of the dependency jars, there's no need to have duplicates.

Add the mapreduce handler to your web.xml:

    <servlet>
      <servlet-name>mapreduce</servlet-name>
      <servlet-class>com.google.appengine.tools.mapreduce.MapReduceServlet</servlet-class>
    </servlet>
    <servlet-mapping>
      <servlet-name>mapreduce</servlet-name>
      <url-pattern>/mapreduce/*</url-pattern>
    </servlet-mapping>

You may also want to add a security constraint to make sure that only application administrators can view/run mappers:

    <security-constraint>
      <web-resource-collection>
        <url-pattern>/mapreduce/*</url-pattern>
      </web-resource-collection>
      <auth-constraint>
        <role-name>admin</role-name>
      </auth-constraint>
    </security-constraint>

Defining a Mapper

Create a class implementing AppEngineMapper. You can see an example of such a class here.

There are two ways to configure a mapper. You can either programmatically create a Configuration as seen here, or you can define a template using mapreduce.xml as seen here. There is a description of the mapreduce.xml format in the javadoc for the ConfigurationTemplatePreprocessor class.

Running the Mapper

If you configured your mapper using the configuration template approach, then you can start the mapper by navigating your browser to http://<your_app_id>.appspot.com/mapreduce/status. Click the launch button to start one of the registered mapreduces, and then go to the mapreduce detail page to observe its status and control its execution.

If you used the programmatic approach, then just run whichever handler you added the creation code to. The mapper will show up on the status page (linked above) just as if you had run it using a template.

Further Reading

You can get more information about optional parameters, batch datastore mutations, and more at the Java User Guide.
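To make the "Defining a Mapper" step a little more concrete, here is a self-contained sketch of the general shape of a mapper: a class whose map() callback is invoked once per input record. It deliberately does not use the library. RecordMapper, StatusCountMapper and MapperSketch are hypothetical stand-ins invented for this illustration, and the counters map only plays the role that a job context would play in a real mapper. The actual AppEngineMapper base class has a different signature, so treat this as an illustration of the idea rather than of the API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the library's mapper base class; NOT the real API.
abstract class RecordMapper<K, V> {
    // Invoked once per input record; counters stands in for the job context.
    abstract void map(K key, V value, Map<String, Long> counters);
}

// Example mapper: tallies records by their value.
class StatusCountMapper extends RecordMapper<String, String> {
    @Override
    void map(String key, String value, Map<String, Long> counters) {
        counters.merge(value, 1L, Long::sum);
    }
}

class MapperSketch {
    // Plays the framework's role: feeds every record to the mapper in turn.
    static <K, V> Map<String, Long> run(Map<K, V> records, RecordMapper<K, V> mapper) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (Map.Entry<K, V> record : records.entrySet()) {
            mapper.map(record.getKey(), record.getValue(), counters);
        }
        return counters;
    }

    public static void main(String[] args) {
        // Made-up input records, keyed by a user id.
        Map<String, String> records = new LinkedHashMap<>();
        records.put("user1", "active");
        records.put("user2", "inactive");
        records.put("user3", "active");

        System.out.println(run(records, new StatusCountMapper()));
    }
}
```

In the real library the framework, not your code, drives the loop in run(); your class only supplies the per-record logic.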
The project checked out via SVN builds 5 jar files, including "json.jar" and "commons-logging-1.1.1.jar". I am a little confused, because:
1. The default App Engine SDK includes a "repackaged-appengine-commons-logging-1.1.1.jar".
2. The default jars also include a repackaged JSON library.
Which one should be imported in user code?
How do we kick off a mapper from cron? The example of programmatically creating a Configuration seems totally useless - it just shows how to make a web form that starts the job. We need some way to kick off the job from code!
For those wondering about the dependencies (jars): they are either downloaded or included in the source download from svn, and are built with ant.
http://demofileuploadgae.appspot.com/ - My Appspot Mapper Reduce, Blobstore Demo
Java Import Mapper Code: Source
@lhoriman You can use the TaskQueue API to start jobs programmatically, for example in response to some other user action or via a cron job:
    /**
     * Starts a map/reduce job that updates {@link Subscriber} entities with
     * dates from a given status snapshot.
     */
    public static void updateSubscriberDatesFromSnapshot(UsageBlob statusBlob) {
        TaskOptions task = buildStartJob("Update Subscriber Dates");
        addJobParam(task, StatusBlobInputFormat.SAMPLE_ID_PARAMETER, statusBlob.getSampleId());
        addJobParam(task, StatusBlobInputFormat.DATE_PARAMETER, statusBlob.getDate().getMillis());

        Queue queue = QueueFactory.getDefaultQueue();
        queue.add(task);
    }

    private static TaskOptions buildStartJob(String jobName) {
        return TaskOptions.Builder
            .url("/mapreduce/command/start_job")
            .method(Method.POST)
            // hack: we need to fix appengine-mapper so we can properly call
            // start_job without needing to pretend to be an Ajax method
            .header("X-Requested-With", "XMLHttpRequest")
            .param("name", jobName);
    }

    // Mapper parameters are passed as task params prefixed with "mapper_params.".
    private static void addJobParam(TaskOptions task, String paramName, String paramValue) {
        task.param("mapper_params." + paramName, paramValue);
    }

    private static void addJobParam(TaskOptions task, String paramName, long value) {
        addJobParam(task, paramName, Long.toString(value));
    }

This is great... nice work! One thing: have you considered moving to Maven as a build tool? I may even give you some help if you want. Thanks, Luigi
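For readers who want to see what the start_job snippet above puts on the queue, the request can be sketched with the JDK alone. This is only an approximation under stated assumptions: it assumes the task's parameters travel as a form-encoded POST body, and the parameter name sampleId is hypothetical (the real value of StatusBlobInputFormat.SAMPLE_ID_PARAMETER is not shown in this thread). The path, the "name" field and the "mapper_params." prefix are taken from the snippet; StartJobRequestSketch itself is an illustrative class, not part of the library.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class StartJobRequestSketch {
    // Both constants are copied from the snippet in the comment above.
    static final String START_JOB_PATH = "/mapreduce/command/start_job";
    static final String PARAM_PREFIX = "mapper_params.";

    // Collects the form fields: the job name, then each mapper parameter under the prefix.
    static Map<String, String> formFields(String jobName, Map<String, String> mapperParams) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", jobName);
        for (Map.Entry<String, String> e : mapperParams.entrySet()) {
            fields.put(PARAM_PREFIX + e.getKey(), e.getValue());
        }
        return fields;
    }

    // URL-encodes the fields into an application/x-www-form-urlencoded body.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("sampleId", "42"); // hypothetical parameter name and value

        System.out.println("POST " + START_JOB_PATH);
        System.out.println("X-Requested-With: XMLHttpRequest");
        System.out.println(encode(formFields("Update Subscriber Dates", params)));
    }
}
```

Seeing the fields laid out this way may also help when debugging a job that does not start: the job name must match a registered configuration, and every mapper parameter needs the prefix.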
Hello, is "Defining the Descending Key Index" still necessary? App Engine logs the following error:
"This index is not necessary, since single-property indices are built in. Please remove it from your index file and upgrade to the latest version of the SDK, if you haven't already."
No, it is not necessary.
Yeah, it's no longer necessary. Just removed it from the documentation.
Another approach to auto-start MapReduce with Blobs. No need to create a mapreduce.xml file anymore :)
    public static void StartMapReduceWithBlob(
            Class<? extends AppEngineMapper<BlobstoreRecordKey, byte[], NullWritable, NullWritable>> blobMapperImpl,
            String blobKey) {
        // Build the job configuration in code instead of reading it from mapreduce.xml.
        Configuration conf = new Configuration(false);
        conf.setClass("mapreduce.map.class", blobMapperImpl, Mapper.class);
        conf.setClass("mapreduce.inputformat.class", BlobstoreInputFormat.class, InputFormat.class);
        conf.set(BlobstoreInputFormat.BLOB_KEYS, blobKey);
        String xml = ConfigurationXmlUtil.convertConfigurationToXml(conf);

        // Enqueue a task that posts the serialized configuration to the start handler.
        Queue queue = QueueFactory.getDefaultQueue();
        TaskOptions task = TaskOptions.Builder
            .withDefaults()
            .url("/mapreduce/start")
            .method(Method.POST)
            .param("configuration", xml);
        queue.add(task);
    }

Actually, the mapreduce.xml file is not needed in the sandbox, but App Engine raises an error if it does not exist. For this to work, the file only needs to contain the root element <configurations></configurations>.
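To give a rough idea of what the "configuration" parameter in the snippet above might carry, here is a JDK-only sketch that renders properties in the Hadoop-style layout of name/value pairs inside a configuration element. This is an assumption about the general layout, not a reproduction of ConfigurationXmlUtil, whose exact output may differ. The two "mapreduce.*.class" keys come from the snippet; the mapper class name, the package of BlobstoreInputFormat, the "example.blob.keys" key and the blob key value are all hypothetical placeholders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ConfigurationXmlSketch {
    // Escapes the characters that are special inside XML text nodes.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Renders each entry as one <property> with <name> and <value> children.
    static String toXml(Map<String, String> properties) {
        StringBuilder xml = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : properties.entrySet()) {
            xml.append("  <property><name>")
               .append(escape(e.getKey()))
               .append("</name><value>")
               .append(escape(e.getValue()))
               .append("</value></property>\n");
        }
        xml.append("</configuration>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // Keys below are taken from the snippet; the values are placeholders.
        props.put("mapreduce.map.class", "com.example.MyBlobMapper");
        props.put("mapreduce.inputformat.class",
                  "com.google.appengine.tools.mapreduce.BlobstoreInputFormat"); // package assumed
        // Hypothetical key standing in for BlobstoreInputFormat.BLOB_KEYS.
        props.put("example.blob.keys", "my-blob-key");

        System.out.print(toXml(props));
    }
}
```

Printing the XML before enqueueing the task can be a quick way to check that the mapper and input format classes are the ones you intended.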
private static TaskOptions buildStartJob(String jobName) {
How do we get the "name" parameter in the map() function of the mapper class? Thanks
I have built it on my PC. Please download the dist folder from http://goo.gl/RQ9ln