
Apache Spark is a framework for distributed, in-memory computation across clusters of compute nodes. Computations have to be expressed as map/reduce-style jobs.
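As a minimal sketch of what "adapting a computation to map/reduce" looks like in Spark, the PySpark word-count below maps each line to (word, 1) pairs and reduces by key; the input and output paths are placeholders, not paths from our environment.

```python
from pyspark import SparkContext

sc = SparkContext(appName="WordCount")

# Map phase: split each line into words and emit (word, 1) pairs.
# Reduce phase: sum the counts for each word.
counts = (
    sc.textFile("hdfs:///data/input.txt")        # illustrative input path
      .flatMap(lambda line: line.split())
      .map(lambda word: (word, 1))
      .reduceByKey(lambda a, b: a + b)
)

counts.saveAsTextFile("hdfs:///data/word_counts")  # illustrative output path
sc.stop()
```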

The informatics team has developed Docker containers for bringing up a Spark cluster and has begun testing various workflows on it, as sketched below.
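A hedged sketch of running a job against such a containerized cluster: the master URL and hostname (`spark://spark-master:7077`) are assumptions and should be replaced with the address exposed by the Docker-based Spark master container.

```python
from pyspark import SparkConf, SparkContext

# Point the driver at the standalone master running in Docker.
# The hostname and port here are placeholders, not our actual deployment.
conf = (
    SparkConf()
      .setAppName("cluster-smoke-test")
      .setMaster("spark://spark-master:7077")
)
sc = SparkContext(conf=conf)

# Trivial job to confirm the executors respond.
print(sc.parallelize(range(1000)).sum())
sc.stop()
```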
