
Overview

AppEngine-MapReduce is an open-source library for running MapReduce-style computations on the Google App Engine platform. For an overview of the design, see the App Engine documentation.

Status

This is an early experimental release of the MapReduce API. We have both Python and Java versions available under the Source tab.

It currently supports input from the Datastore, Google Cloud Storage, and App Engine's log store.

It is implemented using the App Engine GCS client and App Engine Pipelines.

You can also use pipelines to chain together MapReduce jobs.

It is also possible to add your own input sources.

Check the WhatsNew page for recent changes in the library.

Getting Started

  • Python users should check out the new demo app documentation for details on how to use the MapReduce API - examples included.
  • Mapper documentation is available in the Getting Started documents for Python and Java. If you have experience with Hadoop, you may also be interested in our transition guide for Hadoop programmers.

Watch and Learn

Screenshots

Put a friendly frontend on your jobs so that analysts or non-programmers can run them:

The MapReduce API also comes with a UI that shows how far into each step of your computation you are. In the top screenshot, we're a little way into our MapReduce job, and in the bottom screenshot, we've finished a MapReduce job.

Finally, users can download the results of their MapReduce jobs. Here are the results of our WordCount job, listing how many times each word in our input set appears:

See the "MapReduce Made Easy" video for a demonstration of the application shown in these screenshots.
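
For readers new to the model, the WordCount job mentioned above can be sketched in plain Python. This is only an in-process illustration of the map, shuffle, and reduce steps; it does not use the library's API, and the function names (word_count_map, shuffle, word_count_reduce, word_count) are hypothetical names chosen for this example.

```python
from collections import defaultdict


def word_count_map(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def word_count_reduce(word, counts):
    """Reduce step: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in-process over an iterable of lines."""
    pairs = (pair for line in lines for pair in word_count_map(line))
    grouped = shuffle(pairs)
    return dict(word_count_reduce(w, c) for w, c in grouped.items())
```

Calling word_count(["the cat", "The dog the"]) returns a dictionary mapping each lowercased word to its count. In the real library, the map and reduce steps run as distributed tasks rather than in a single process.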

Videos

There are a number of videos that discuss how to use the MapReduce API as well as related and underlying technologies:

MapReduce Made Easy

Want to run the app you saw in the screenshots and video above? Go for it! We've revamped the sample Python app that comes with the MapReduce API to make it easier to use - check out the source here and let us know what you think about it!

Get Involved

Like the MapReduce library? Think it could be better? It’s all Apache 2.0 licensed. Check out our Subversion repository and feel free to post patches on the issue tracker.

Features

  • Iterate over line-oriented blob and datastore data out of the box. An extensible framework for adding input readers for your own data formats is included.
  • Automatic sharding for faster execution. Use as many workers as you need to get your results faster.
  • Processing rate limiting. Don’t worry about running over quota. Slow down your mapper and space it out over days. Need your results now? Turn it up all the way and get up to 300 entities/second/worker!
  • Status pages. Always know what jobs you’re running and how they’re doing.
  • Aggregated counters. Keep statistics along the way and do simple rollup reports.
  • Parametrized, reusable mappers. Let non-programmers run their own mapper jobs using parameters and validation that you configure.
  • Batched datastore operations. The library automatically batches datastore puts so you don’t have to.
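
To make three of the features above concrete (sharding, aggregated counters, and batched puts), here is a minimal plain-Python sketch of the ideas. It is an illustration under stated assumptions, not the library's implementation: the names shard_ranges, BatchedWriter, run_shard, and run_job are hypothetical, a Python list stands in for the datastore, and the shards run one after another instead of on parallel workers.

```python
from collections import Counter


def shard_ranges(total, shard_count):
    """Split range(total) into at most shard_count contiguous (start, end) ranges."""
    size, extra = divmod(total, shard_count)
    ranges, start = [], 0
    for i in range(shard_count):
        # The first `extra` shards take one additional item each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty shards when total < shard_count
            ranges.append((start, end))
        start = end
    return ranges


class BatchedWriter:
    """Buffers writes and flushes them to the sink in groups of batch_size."""

    def __init__(self, sink, batch_size=3):
        self.sink = sink  # a list standing in for the datastore (assumption)
        self.batch_size = batch_size
        self.buffer = []
        self.flushes = 0

    def put(self, item):
        self.buffer.append(item)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink.extend(self.buffer)
            self.buffer = []
            self.flushes += 1


def run_shard(items, start, end, sink):
    """Process one shard: count every item and write the even ones in batches."""
    counters = Counter()
    writer = BatchedWriter(sink)
    for item in items[start:end]:
        counters["processed"] += 1
        if item % 2 == 0:
            counters["even"] += 1
            writer.put(item)
    writer.flush()  # write whatever is left in the buffer
    return counters


def run_job(items, shard_count):
    """Run every shard in turn and aggregate the per-shard counters."""
    sink = []
    totals = Counter()
    for start, end in shard_ranges(len(items), shard_count):
        totals.update(run_shard(items, start, end, sink))
    return totals, sink
```

For example, run_job(list(range(10)), 3) splits the ten items into three shards, returns counters summed across all shards, and collects the even items in the stand-in sink. Keeping counters per shard and merging them at the end is what allows shards to run independently.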

Roadmap

Planned MapReduce library improvements:

  • Efficient, aggregated logging

And much more!
