Hadoop-Benchmark is an open-source research acceleration platform for rapid prototyping and evaluation of self-adaptive behaviors in Hadoop clusters. The main objectives are to allow researchers to
− rapidly prototype, i.e., to experiment with self-adaptation in Hadoop clusters without the need to cope with low-level system infrastructure details,
− reproduction, i.e., to share complete experiments for others to reproduce them independently, and
− repetition, i.e., to experiment with and to compare their work, re-doing the same experiments on the same system using the same evaluation methods.
It uses docker and docker-machine to easily create a multi-node cluster (on a single laptop or in a cloud including Grid5000) and provision Hadoop. It contains a number of acknowledged benchmarks and one self-adaptive scenario.
The following is the high-level overview of the created cluster and deployed services:
- docker >= 1.12
- docker-machine >= 0.8
- (optional) R >= 3.3.2 with tidyverse and Hmisc for data analysis
./cluster.sh ✓
Usage ./cluster.sh [OPTIONS] COMMAND
Options:
-f, --force Use '-f' in docker commands where applicable
-n, --noop Only shows which commands would be executed wihout actually executing them
-q, --quiet Do not print which commands are executed
Commands:
Cluster:
create-cluster
start-cluster
stop-cluster
restart-cluster
destroy-cluster
status-cluster
Hadoop:
start-hadoop
stop-hadoop
restart-hadoop
destroy-hadoop
Misc:
console Enter a bash console in a container connected to the cluster
run-controller CMD Run a command CMD in the controller container
hdfs CMD Run the HDFS CMD command
hdfs-download SRC Download a file from HDFS SRC to current directory
Info:
shell-init Shows information how to initialize current shell to connect to the cluster
Useful to execute like: 'eval $(./cluster.sh shell-init)'
connect-info Shows information how to connect to the cluster
-
check the tutorial to get started.
-
check the screencast
-
check the demonstration of using hadoop-benchmark on Grid5000