Experimental!
Mapreduce is an experimental, innovative, and rapidly changing new feature for Google App Engine. Unfortunately, being on the bleeding edge means that we may make backwards-incompatible changes to Mapreduce. We will inform the community when this feature is no longer experimental.
A Pipeline used for Mapreduce jobs.
MapreducePipeline is provided by the google.appengine.ext.mapreduce module.
- Introduction
- Constructor
- Instance methods
Introduction
The MapreducePipeline class is used to wire together all the steps needed to perform a specific Mapreduce job. It specifies the mapper, reducer, data input reader, output writer, and so forth to be used to carry out the job. When the job completes, the pipeline returns the filenames produced by the output writer.
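To make the roles of these pieces concrete, here is a minimal, framework-free sketch of the map/shuffle/reduce flow that MapreducePipeline orchestrates. The word-count mapper and reducer are hypothetical illustrations, not part of the App Engine API; in a real job the framework, not your code, runs the shuffle between the two stages.

```python
from collections import defaultdict

def word_count_map(line):
    # Mapper: emit a (key, value) pair for each word in one input record.
    for word in line.split():
        yield (word.lower(), 1)

def word_count_reduce(key, values):
    # Reducer: combine all values emitted under a single key.
    yield (key, sum(values))

def run_job(lines):
    # Shuffle phase: group mapper output by key. The mapreduce framework
    # performs this step for you between the map and reduce stages.
    groups = defaultdict(list)
    for line in lines:
        for key, value in word_count_map(line):
            groups[key].append(value)
    results = {}
    for key, values in sorted(groups.items()):
        for out_key, out_value in word_count_reduce(key, values):
            results[out_key] = out_value
    return results
```

In a real MapreducePipeline, the mapper and reducer are named by `mapper_spec` and `reducer_spec`, the input lines come from the reader named by `input_reader_spec`, and the results go to the writer named by `output_writer_spec`.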
Constructor
- class MapreducePipeline(job_name, mapper_spec, reducer_spec, input_reader_spec, output_writer_spec=None, mapper_params=None, reducer_params=None, shards=None)
- The MapreducePipeline constructor's arguments fully specify the Mapreduce job.
Arguments
- job_name
- The name of the Mapreduce job. This name shows up in the logs and in the UI.
- mapper_spec
- The name of the mapper used in this Mapreduce job. The mapper processes input, one record at a time, from the input reader specified in the input_reader_spec parameter.
- reducer_spec
- The name of the reducer used in this Mapreduce job. The reducer processes the grouped mapper output and yields results, which are stored via the optional output writer specified in the output_writer_spec parameter.
- input_reader_spec
- The name of the input reader used in this Mapreduce job. The mapper receives its input, one record at a time, from this reader.
- output_writer_spec
- The name of the output writer (if any) used to store results from this Mapreduce job.
- mapper_params
- A dictionary of parameters passed to the input reader.
- reducer_params
- A dictionary of parameters passed to the output writer.
- shards
- The number of shards to use for this Mapreduce job; more shards allow more work to proceed in parallel.
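As an illustration of how these arguments fit together, a word-count job might be wired up as follows. This is a sketch, not a definitive implementation: it assumes mapper and reducer functions named word_count_map and word_count_reduce defined in a main.py module, a blob_key variable identifying an uploaded Blobstore file, and the Blobstore reader and writer classes shipped with the mapreduce library; the exact class paths can vary between library versions.

```python
from mapreduce import mapreduce_pipeline

pipeline = mapreduce_pipeline.MapreducePipeline(
    "word_count",                                      # job_name: shows up in logs and the UI
    "main.word_count_map",                             # mapper_spec
    "main.word_count_reduce",                          # reducer_spec
    "mapreduce.input_readers.BlobstoreZipInputReader",  # input_reader_spec
    "mapreduce.output_writers.BlobstoreOutputWriter",   # output_writer_spec
    mapper_params={"blob_key": blob_key},              # passed to the input reader
    reducer_params={"mime_type": "text/plain"},        # passed to the output writer
    shards=16)
pipeline.start()
```

After start() is called, the job runs asynchronously; the pipeline's eventual return value is the list of filenames produced by the output writer.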
Instance Methods
A MapreducePipeline instance has the following method:
- start ()
- Starts the Mapreduce job.