2.5 Example Code
The MapReduce library comes with an App Engine project that runs two examples using the MapReduce library:
- RandomCollisions is a single MapReduce job that tests the Java random number generator. It looks for collisions: seed values that produce the same output when next() is called the first time. (It does not find any.)
- EntityCount runs three jobs in a row: first a Map job that creates entities in Google Cloud Datastore, next a MapReduce job that analyzes the entities, and finally another Map job that deletes the entities.
First compile the project using ant. This command also compiles the MapReduce library and installs its jars in the project:
cd java
ant compile_example
Then run the example in the development server. The dev_appserver command is located in the App Engine SDK's bin directory:
dev_appserver.sh example/
To start the MapReduce job, point your browser at http://localhost:8080. You will be asked to log in:
Enter an email address, and be sure to check Sign in as Administrator.
You'll see the demo app's landing page.
To run the random collisions example, click on its Run the example link. A form appears.
Fill in the four fields and press Start MapReduce to run the job.
The top-level example directory is an EAR hierarchy that has a META-INF directory and two WAR directories (default and mapreduce) that define two modules.
The META-INF directory contains the file application.xml, which declares the two modules.
The default directory defines the default module, which is the app's frontend. The WEB-INF directory contains the .xml configuration files. Particular features to note are:
- appengine-web.xml sets the default module's instance class to F2.
- queue.xml defines a queue for the MapReduce job.
- web.xml includes a <security-constraint> on all URLs, and also defines the top-level servlet randomcollisions which will start the example CollisionFindingServlet. Note that the servlet runs in the default module, but the MapReduce job that it starts will run in the mapreduce module, because the code that sets up the job calls setModule().
The mapreduce directory defines the mapreduce module, which contains the source code for both jobs. Features to note:
- appengine-web.xml sets the module's instance class to F4.
- web.xml contains only the two servlets required to run the MapReduce jobs.
The source code for this example is in this directory.
The static method createMapReduceSpec() creates a MapReduceSpecification. Note that it uses Mapper and Reducer classes that are defined in other source files. It also uses existing MapReduce classes for the Marshallers and to handle input and output.
The method getSettings() sets the task queue to the queue defined in the queue.xml file, and the module to the mapreduce module.
The doPost() method runs a MapReduce job with parameters entered by the user. It creates a MapReduce job and starts it by calling MapReduceJob.start(). The job will run in the mapreduce module.
This class defines the map() method for the job. It emits a key-value pair where the value is the seed used to generate a random sequence, and the key is the first number in the sequence.
This class defines the reduce() method for the job. A collision occurs when a key (a random number) has more than one seed value in its associated list of values.
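The map and reduce steps described above can be sketched in plain Java without the MapReduce library. This is a minimal illustration under stated assumptions, not the example's actual source: the class name RandomCollisionsSketch, its method names, the in-memory shuffle, and the use of nextInt() as the first value drawn from the generator are all hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Plain-Java illustration only; it does not use the MapReduce library.
class RandomCollisionsSketch {

    // Map step: emit (first random value, seed) for a single seed.
    // Assumption: nextInt() stands in for the first call on the generator.
    static Map.Entry<Integer, Long> map(long seed) {
        int first = new Random(seed).nextInt();
        return new AbstractMap.SimpleImmutableEntry<>(first, seed);
    }

    // Shuffle step: group every seed in [startSeed, endSeed) under its key.
    static Map<Integer, List<Long>> shuffle(long startSeed, long endSeed) {
        Map<Integer, List<Long>> grouped = new TreeMap<>();
        for (long seed = startSeed; seed < endSeed; seed++) {
            Map.Entry<Integer, Long> kv = map(seed);
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                   .add(kv.getValue());
        }
        return grouped;
    }

    // Reduce step: a key whose list holds more than one seed is a collision.
    static List<List<Long>> reduce(Map<Integer, List<Long>> grouped) {
        List<List<Long>> collisions = new ArrayList<>();
        for (List<Long> seeds : grouped.values()) {
            if (seeds.size() > 1) {
                collisions.add(seeds);
            }
        }
        return collisions;
    }

    public static void main(String[] args) {
        List<List<Long>> collisions = reduce(shuffle(0, 100_000));
        System.out.println("Collisions found: " + collisions.size());
    }
}
```

In the real job the library performs the grouping between map and reduce across shards; the TreeMap here only stands in for that step on a single machine.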
This example links three consecutive jobs together using the pipeline API, which is still evolving. We include it here because the first and last jobs in the pipeline show how to specify Map jobs.
This file subclasses MapOnlyMapper, which is used in a Map job to create random entities in the Datastore.
This file subclasses MapOnlyMapper, which is used in a Map job to remove entities from the Datastore. Note that it does not emit any output.
This file creates three jobs, each defined by its own job specification.
The specifications are created with the methods getCreationJobSpec(), getCountJobSpec(), and getDeleteJobSpec().
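The create, count, delete sequence can be sketched in plain Java to show how the three stages depend on each other. This is only an illustration under stated assumptions: it uses neither the MapReduce library nor the pipeline API, an in-memory map stands in for the Datastore, and the class name EntityCountSketch, its method names, the random lowercase payloads, and the choice to count entities by payload length are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Plain-Java illustration only; an in-memory map stands in for the Datastore.
class EntityCountSketch {

    // Job 1 (map-only): create `count` entities with random payloads.
    static void createEntities(Map<Long, String> store, int count, long seed) {
        Random random = new Random(seed);
        for (long id = 0; id < count; id++) {
            // Hypothetical payload: 1 to 5 random lowercase letters.
            int length = 1 + random.nextInt(5);
            StringBuilder payload = new StringBuilder();
            for (int i = 0; i < length; i++) {
                payload.append((char) ('a' + random.nextInt(26)));
            }
            store.put(id, payload.toString());
        }
    }

    // Job 2 (map + reduce): map each entity to (payload length, 1),
    // then sum the ones for each key.
    static Map<Integer, Integer> countByPayloadLength(Map<Long, String> store) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String payload : store.values()) {
            counts.merge(payload.length(), 1, Integer::sum);
        }
        return counts;
    }

    // Job 3 (map-only): visit every entity and delete it; nothing is emitted.
    static void deleteEntities(Map<Long, String> store) {
        for (Long id : new ArrayList<>(store.keySet())) {
            store.remove(id);
        }
    }

    public static void main(String[] args) {
        Map<Long, String> store = new HashMap<>();
        createEntities(store, 100, 42L);
        System.out.println("Entities by payload length: " + countByPayloadLength(store));
        deleteEntities(store);
        System.out.println("Entities left after delete: " + store.size());
    }
}
```

Each stage runs only after the previous one has finished, which is the ordering the pipeline enforces between the three job specifications.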