Skip to content

1.1 MapReduce and AppEngine

markgoldstein edited this page Oct 24, 2014 · 1 revision

The MapReduce library is available in two versions: one for Java and one for Python. The functionality of each is slightly different.

Both libraries are built on top of App Engine services, including Datastore and Task Queues. You must download the MapReduce library and include it with your application. The library provides:

  • A programming model for large-scale distributed data processing
  • Automatic parallelization and distribution within your existing codebase
  • Access to Google-scale data storage
  • I/O scheduling
  • Fault-tolerance, handling of exceptions
  • User-tunable settings to optimize for speed/cost
  • Tools for monitoring status

There are no usage charges associated with the MapReduce library. As with any App Engine application, you are charged for any App Engine resources that the library or your MapReduce code consumes (beyond the free quotas) while running your job. These can include instance hours, Datastore and Google Cloud Storage usage, network, and other storage.