Docker for a single-node Hadoop installation

⚠️ [Deprecated] I no longer use this package. If you would like to maintain it, or to fork it and maintain the fork, please contact me (@plv on Twitter) and I will add a link to your package here.

This repository builds a single-node Hadoop instance, following the official documentation at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html and the Spark documentation.

Features:

Hadoop / HDFS
YARN
Spark
Hive
Zeppelin

State of the project:

Hadoop, YARN, Spark, Hive and Zeppelin are all running, but none of them is tuned. Any feedback is welcome.

Quickstart

Clone the project

git clone https://github.com/kibatic/docker-single-node-hadoop.git

Create the container

docker-compose build
docker-compose up -d

Zeppelin notebook

Once the container is running, Zeppelin is available at http://localhost:8002

Run a basic MapReduce example

Some Python MapReduce examples are provided in the /example directory inside the container.

docker exec -ti dockersinglenodehadoop_hsn_1 bash

cd /example
hdfs dfs -mkdir /input
hdfs dfs -put fichier.txt /input
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar -input /input -output /output -mapper /example/mapper.py -reducer /example/reducer.py
hdfs dfs -cat /output/part-00000
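
The contents of /example/mapper.py and /example/reducer.py are not reproduced in this README. As a rough illustration of what a Hadoop Streaming word count usually looks like, here is a minimal sketch. The function names `map_lines` and `reduce_lines` are hypothetical and do not come from the repository; the real scripts may differ. Streaming passes records as tab-separated text and sorts the mapper output by key before the reducer reads it, which the sketch imitates locally with `sorted`.

```python
"""Hypothetical word-count mapper and reducer in the Hadoop Streaming style.

This is an illustration only, not the code shipped in /example.
"""
from itertools import groupby


def map_lines(lines):
    """Mapper step: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(sorted_records):
    """Reducer step: sum the counts of consecutive records sharing a key.

    The input must already be sorted by key, as Hadoop Streaming guarantees
    for the reducer's input.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


if __name__ == "__main__":
    # Local dry run on an in-memory sample: map, sort (the "shuffle"), reduce.
    sample = ["hello hadoop", "hello spark"]
    for record in reduce_lines(sorted(map_lines(sample))):
        print(record)
```

In a real streaming job the two steps would live in separate scripts that read from standard input and write to standard output, so that they can be passed to -mapper and -reducer as in the command above.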

Run the same basic MapReduce example with Spark

docker exec -ti dockersinglenodehadoop_hsn_1 bash

cd /example
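# -p and -f make these steps safe to rerun if /input already exists from the previous example
hdfs dfs -mkdir -p /input
hdfs dfs -put -f fichier.txt /input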

# run pyspark
pyspark

# load file
file = sc.textFile("/input/fichier.txt")
file.collect()

# mapping
def split_words(line):
    return line.split()

def create_pair(word):
    return (word, 1)

pairs = file.flatMap(split_words).map(create_pair)

# reducing
def sum_counts(a, b):
    return a + b

wordcount = pairs.reduceByKey(sum_counts)

# display result
wordcount.collect()
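
To check what the pipeline above should return without starting the container, the same three steps can be mimicked in plain Python. This is only a sketch of the semantics of flatMap, map and reduceByKey on a small in-memory list, not how Spark executes them; `local_wordcount` is a hypothetical helper name introduced for this illustration.

```python
from collections import defaultdict
from functools import reduce


def split_words(line):
    return line.split()


def create_pair(word):
    return (word, 1)


def sum_counts(a, b):
    return a + b


def local_wordcount(lines):
    """Plain-Python imitation of flatMap -> map -> reduceByKey."""
    # flatMap: each line yields several words, flattened into a single list
    words = [word for line in lines for word in split_words(line)]
    # map: one (word, 1) pair per word
    pairs = [create_pair(word) for word in words]
    # reduceByKey: group the values by key, then fold each group with sum_counts
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(sum_counts, values) for key, values in grouped.items()}


if __name__ == "__main__":
    print(local_wordcount(["hello hadoop", "hello spark"]))
```

Spark returns the result of collect() as a list of (word, count) tuples in no guaranteed order, whereas this sketch returns a dictionary.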

Features

Start and stop the container

docker-compose start
docker-compose stop

Volume /data in the ./data_docker directory

The ./data_docker directory on the host is shared with the container, where it appears as /data.

It contains:

/data/hdfs for HDFS files
/data/yarn for YARN files
/data/transfert for easy file transfer between the host and the container.
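
As an illustration of the transfer directory, the following sketch copies a file from the host into ./data_docker/transfert and returns the path under which it should then be visible inside the container. It assumes the script runs from the repository root and that the volume is mounted as described above; `stage_for_container` is a hypothetical helper, not part of the project.

```python
import shutil
from pathlib import Path


def stage_for_container(source, shared_root="./data_docker"):
    """Copy `source` into <shared_root>/transfert on the host.

    Returns the path the file is expected to have inside the container,
    assuming <shared_root> is mounted there as /data.
    """
    target_dir = Path(shared_root) / "transfert"
    # Create the directory if the container has not created it yet
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = shutil.copy(str(source), str(target_dir))
    return "/data/transfert/" + Path(copied).name
```

For example, `stage_for_container("fichier.txt")` would return `/data/transfert/fichier.txt`, which could then be loaded into HDFS from inside the container with hdfs dfs -put.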

Run the container manually from the Dockerfile

Start the container and open a shell in it

docker build -t hsn .
docker run --rm --name hsn hsn
# open a shell inside the container
docker exec -ti hsn bash

Stop the container

docker stop hsn
