


Configuring your Project

Include the MapReduce library in your project

There are three ways to include the MapReduce library in your app. The library is available from the Maven Central repository, so you can link to it with a Maven or an Ant/Ivy dependency declaration. You can also download the library source code, compile it, and copy the jars directly into your project.

Using Maven

The MapReduce library is available from the Maven Central repository. Include the following dependency in your project's pom.xml file:

<dependency>
    <groupId>com.google.appengine.tools</groupId>
    <artifactId>appengine-mapreduce</artifactId>
    <version>RELEASE</version>
</dependency>

Using Ant with Ivy

Add the following dependency to your project's ivy.xml file:

<dependency org="com.google.appengine.tools" name="appengine-mapreduce" rev="latest.integration" />

You can lock the library to a range of versions, for example rev="[1.0,1.1)", but if you do, consider the compatibility issues.
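The range [1.0,1.1) uses interval notation: the square bracket makes 1.0 inclusive and the parenthesis makes 1.1 exclusive, so a 1.0.x release matches while 1.1 does not. The Java sketch below illustrates only that inclusive/exclusive rule with a simplified comparator for purely numeric dotted versions. It is not Ivy's own resolver, which also understands qualifiers and other revision forms; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class VersionRange {
    // Compares purely numeric dotted versions such as "1.0" and "1.0.3".
    // In this sketch, missing components count as 0, so "1.0" equals "1.0.0".
    static int compare(String a, String b) {
        int[] x = Arrays.stream(a.split("\\.")).mapToInt(Integer::parseInt).toArray();
        int[] y = Arrays.stream(b.split("\\.")).mapToInt(Integer::parseInt).toArray();
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    // True if version lies in the half-open range [low, high):
    // '[' means low is included, ')' means high is excluded.
    static boolean inHalfOpenRange(String version, String low, String high) {
        return compare(version, low) >= 0 && compare(version, high) < 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"0.9", "1.0", "1.0.7", "1.1", "1.2"}) {
            System.out.println(v + " in [1.0,1.1)? " + inHalfOpenRange(v, "1.0", "1.1"));
        }
    }
}
```

Running main prints true only for 1.0 and 1.0.7, which is why a range like this keeps you on one release line until you choose to move.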

Building the library locally

Normally, you'll link to the MapReduce library with Maven or Ant. If you need to build the library locally, use Subversion to check out the MapReduce source code:

svn checkout http://appengine-mapreduce.googlecode.com/svn/trunk/java

To build the library using Apache Ant, run:

cd java
ant dist

This creates the java/dist directory, which contains all the jars in the MapReduce library. Copy these jars into your application's WEB-INF/lib directory.
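On a Unix-like shell, cp java/dist/*.jar <war-dir>/WEB-INF/lib/ performs this copy. If you prefer to script the step in Java, for example as part of a build tool, the sketch below does the same with java.nio.file. It assumes the jars sit directly in java/dist (not in subdirectories) and uses a WAR directory named mapreduce as a hypothetical default; adjust both paths to your project.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class CopyMapReduceJars {
    // Copies every *.jar directly inside sourceDir into targetDir, creating
    // targetDir if needed and overwriting existing copies.
    // Returns the number of jars copied.
    static int copyJars(Path sourceDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        int copied = 0;
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(sourceDir, "*.jar")) {
            for (Path jar : jars) {
                Files.copy(jar, targetDir.resolve(jar.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Assumed defaults: the build output in java/dist and a WAR
        // directory named "mapreduce". Pass other paths as arguments.
        Path dist = Paths.get(args.length > 0 ? args[0] : "java/dist");
        Path lib = Paths.get(args.length > 1 ? args[1] : "mapreduce/WEB-INF/lib");
        System.out.println("Copied " + copyJars(dist, lib) + " jar(s) to " + lib);
    }
}
```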

Alternatively, to build the library using Apache Maven, run:

cd java
mvn package

Create a module for running MapReduce jobs

MapReduce jobs can run for a long time. We strongly recommend that you run MapReduce in a separate module, preferably one that does not handle user requests. Remember to pass that module's name to MapReduceSettings.setModule() when you create your MapReduceJob.

To create a module, make a WAR directory and include it in your project's EAR directory. The process is described in detail in the modules documentation.

The web.xml file

The WAR directory contains a web.xml file that declares a module's servlets. All MapReduce jobs use two servlets from the MapReduce library. Copy the code below into your module's web.xml file.

<servlet>
  <servlet-name>mapreduce</servlet-name>
  <servlet-class>
    com.google.appengine.tools.mapreduce.MapReduceServlet
  </servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>mapreduce</servlet-name>
  <url-pattern>/mapreduce/*</url-pattern>
</servlet-mapping>

<servlet>
  <servlet-name>pipeline</servlet-name>
  <servlet-class>
    com.google.appengine.tools.pipeline.impl.servlets.PipelineServlet
  </servlet-class>
</servlet>
<servlet-mapping>
  <servlet-name>pipeline</servlet-name>
  <url-pattern>/_ah/pipeline/*</url-pattern>
</servlet-mapping>
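Both url-pattern values are path-prefix mappings. Under the Java Servlet specification's mapping rules, /mapreduce/* matches /mapreduce itself and every path beneath it, and /_ah/pipeline/* behaves the same way. The Java sketch below models only that prefix rule to show which of the two servlets would receive a given request path. Real servlet containers also apply exact, extension, and default mappings, and the request paths in main are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class ServletRouting {
    // url-pattern -> servlet-name, as declared in the web.xml above.
    static final Map<String, String> MAPPINGS = new LinkedHashMap<>();
    static {
        MAPPINGS.put("/mapreduce/*", "mapreduce");
        MAPPINGS.put("/_ah/pipeline/*", "pipeline");
    }

    // Path-prefix rule: "/prefix/*" matches "/prefix" and anything under
    // "/prefix/". If several prefixes match, the longest one wins.
    static Optional<String> servletFor(String path) {
        String best = null;
        int bestLength = -1;
        for (Map.Entry<String, String> e : MAPPINGS.entrySet()) {
            String pattern = e.getKey();
            String prefix = pattern.substring(0, pattern.length() - 2); // drop "/*"
            boolean matches = path.equals(prefix) || path.startsWith(prefix + "/");
            if (matches && prefix.length() > bestLength) {
                best = e.getValue();
                bestLength = prefix.length();
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        String[] paths = {"/mapreduce/some/path", "/_ah/pipeline/some/path", "/mapreducer", "/index.html"};
        for (String p : paths) {
            System.out.println(p + " -> " + servletFor(p).orElse("(no match)"));
        }
    }
}
```

Note that /mapreducer does not match: the prefix must be followed by a slash or end the path.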

You should also consider adding a security constraint so that only admins can initiate MapReduce jobs:

<security-constraint>
  <web-resource-collection>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin</role-name>
  </auth-constraint>
</security-constraint>

The appengine-web.xml file

The WAR directory also contains an appengine-web.xml file. In this file, you can specify the module's instance type, which lets you separate the cost and performance of MapReduce from the rest of your application. For very large jobs, the shuffle stage can become memory-bound. With more memory, the shuffle stage uses fewer and larger temporary files for sorting. Using the instance types B4, B4_1G, B8, F4, or F4_1G can significantly improve the performance of the shuffle stage.
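To see why more memory means fewer temporary files, consider a generic external merge sort. This is used here only as an illustrative model, not as a description of the library's actual shuffle code. The sorter fills an in-memory buffer, sorts it, and writes it out as one sorted run (one temporary file). The number of files is therefore the data size divided by the buffer size, rounded up. The sizes in main are made-up inputs, not measurements of any instance type.

```java
public class SortRunEstimate {
    // Number of sorted runs (temporary files) an external merge sort writes
    // when each run holds at most bufferBytes: ceil(dataBytes / bufferBytes).
    static long sortedRuns(long dataBytes, long bufferBytes) {
        if (bufferBytes <= 0) {
            throw new IllegalArgumentException("bufferBytes must be positive");
        }
        return (dataBytes + bufferBytes - 1) / bufferBytes;
    }

    public static void main(String[] args) {
        long data = 10L << 30;                                  // hypothetical 10 GiB to sort
        long[] buffers = {64L << 20, 128L << 20, 512L << 20};   // hypothetical buffer sizes
        for (long b : buffers) {
            System.out.println((b >> 20) + " MiB buffer -> "
                    + sortedRuns(data, b) + " temporary files");
        }
    }
}
```

In this model, 10 GiB yields 160 files with a 64 MiB buffer, 80 with 128 MiB, and 20 with 512 MiB. Fewer, larger runs also leave less work for the final merge.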

The META-INF/application.xml file

You must include the name of the MapReduce module in the EAR's META-INF/application.xml file. Assuming the WAR directory is named "mapreduce", include this code:

<module>
    <web>
      <web-uri>mapreduce</web-uri>
      <context-root>mapreduce</context-root>
    </web>
</module>

Define your own task queue

A MapReduce job should use a dedicated task queue rather than sharing your application's default task queue. When you define the <queue> elements in the WEB-INF/queue.xml file, follow these guidelines:

Assigning a Google Cloud Storage bucket

Every MapReduce job uses a Google Cloud Storage (GCS) bucket, which can be specified in the MapReduceSettings. You may use either the default GCS bucket or one that you create. If you don't specify a bucket, the default bucket is used.

New App Engine apps are automatically assigned a default bucket with a free quota. The bucket's name has the form <app-id>.appspot.com, where <app-id> is your app's ID. If you are adding MapReduce to an existing app, you can check whether a default bucket already exists (and create one if it doesn't) in the Admin Console:

  • Select the Application Settings page from the Administration section in the left-side menu.
  • If a default bucket exists, it will appear in your settings under the title "Google Cloud Storage Bucket."
  • If no default bucket is shown, scroll down the page to the section titled "Cloud Integration." In that section, click the Create button to create a Google Cloud project that will contain a default bucket.
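The bucket-selection rule above can be expressed as a small helper: an explicitly configured bucket wins, and otherwise the job falls back to the default bucket named <app-id>.appspot.com. This is a hypothetical illustration of the rule only. In a real job you specify the bucket in MapReduceSettings. The class name, the method name, and the choice to treat an empty name as "not configured" are assumptions, and the app ID in main is made up.

```java
import java.util.Optional;

public class BucketChoice {
    // Hypothetical helper: returns the bucket a job would use.
    // A configured, non-empty bucket name takes precedence (treating "" as
    // unset is an assumption of this sketch); otherwise the default bucket
    // "<app-id>.appspot.com" is used.
    static String resolveBucket(Optional<String> configuredBucket, String appId) {
        return configuredBucket
                .filter(name -> !name.isEmpty())
                .orElse(appId + ".appspot.com");
    }

    public static void main(String[] args) {
        System.out.println(resolveBucket(Optional.empty(), "my-app"));
        System.out.println(resolveBucket(Optional.of("my-mapreduce-temp"), "my-app"));
    }
}
```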

Before you deploy your app to the cloud, you should consider whether or not to enable billing. With billing enabled, you can still use the default bucket, or you can create a special bucket for your app. In either case, you will be billed for storage use above the free quota, up to the amount you specify in your budget. If billing is not enabled, you can only use the default bucket, and in this case, if your bucket use exceeds the free quota, your MapReduce job will fail.
