- These Ansible playbooks can build a Rackspace Cloud environment and install HDP on it. Follow this link.
- They can also install HDP on existing Linux devices, be it dedicated devices in a datacenter or VMs running on a hypervisor. Follow this link.
Modify the file at ~/ansible-hadoop/playbooks/group_vars/all to set the cluster configuration.
The following table describes the most important variables (an example follows the table):
Variable | Description |
---|---|
cluster_name | The name of the HDP cluster. |
hdp_version | The HDP major version that should be installed. |
admin_password | The Ambari admin user password. |
services_password | The password used by the other services (for example, Hive's database). |
install_* | Set these to true to install the respective HDP components. |
rax_credentials_file | The location of the Rackspace credentials file as set above. |
rax_region | The Rackspace region where the Cloud Servers should be built. |
allowed_external_ips | A list of IPs allowed to connect to cluster nodes. |
ssh keyfile | The SSH keyfile that will be placed on cluster nodes at build time. |
ssh keyname | The name of the SSH key. Make sure you change this if another key was previously used with the same name. |
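For illustration, a minimal sketch of the Rackspace-related variables in group_vars/all might look like the following. All values are placeholders, the install_* component names are examples, and the exact nesting of the SSH settings may differ, so keep the structure already present in the file:

    cluster_name: 'hadoop-poc'            # name of the HDP cluster
    hdp_version: '2.5'                    # HDP major version (placeholder value)
    admin_password: 'changeme'            # Ambari admin user password
    services_password: 'changeme'         # password used by the other services
    install_hbase: true                   # install_* flags (component names are illustrative)
    install_hive: true
    rax_credentials_file: '~/.raxpub'     # Rackspace credentials file, as set above
    rax_region: 'DFW'                     # Rackspace region for the Cloud Servers
    allowed_external_ips: ['1.2.3.4']     # IPs allowed to connect to cluster nodes
    ssh:
      keyname: 'hadoop-ssh-key'           # change this if another key was previously used with the same name
      keyfile: '~/.ssh/id_rsa.pub'        # SSH key placed on cluster nodes at build time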
The first step is to run the script that will provision the Cloud environment:
cd ~/ansible-hadoop/ && bash provision_rax.sh
Then run the bootstrapping script that will set up the prerequisites on the cluster nodes:
cd ~/ansible-hadoop/ && bash bootstrap_rax.sh
Then run the script that will install Ambari and build the cluster using Ambari Blueprints:
cd ~/ansible-hadoop/ && bash hortonworks_rax.sh
At this point, you can follow the installation progress by accessing the Ambari interface.
The Ambari server runs on the last master node and can be accessed on port 8080.
The first step is to set up the build node / workstation.
This build node or workstation will run the Ansible code and build the Hadoop cluster (it can itself be a Hadoop node).
This node needs to be able to contact the cluster devices via SSH.
All SSH logins must be known / prepared in advance; alternatively, SSH public-key authentication can be used.
Follow the steps below to install Ansible and the prerequisites on the build node / workstation, depending on its operating system:
CentOS/RHEL 6:
- Install required packages:
sudo su -
yum -y install epel-release || yum -y install http://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
yum repolist; yum install python-virtualenv python-pip python-devel sshpass git vim-enhanced libffi libffi-devel gcc openssl-devel -y
- Create the Python virtualenv and install Ansible:
virtualenv ansible2; source ansible2/bin/activate
pip install ansible==2.1.3.0 pyrax
CentOS/RHEL 7:
- Install required packages:
sudo su -
yum -y install epel-release || yum -y install http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
yum repolist; yum install python-virtualenv python-pip python-devel sshpass git vim-enhanced libffi libffi-devel gcc openssl-devel -y
- Create the Python virtualenv and install Ansible:
virtualenv ansible2; source ansible2/bin/activate
pip install ansible==2.1.3.0 pyrax
Ubuntu:
- Install required packages:
sudo su -
apt-get update; apt-get -y install python-virtualenv python-pip python-dev sshpass git vim libffi-dev libssl-dev gcc
- Create the Python virtualenv and install Ansible:
virtualenv ansible2; source ansible2/bin/activate
pip install ansible==2.1.3.0 pyrax
On the same build node / workstation, run the following:
cd; git clone https://github.com/rackerlabs/ansible-hadoop
There are three types of nodes:
- master-nodes: nodes running the master Hadoop services. You can specify one, two or three nodes. With two or three master nodes, the HDFS NameNode will be configured in HA mode.
- slave-nodes: nodes running the slave Hadoop services and storing HDFS data. You can specify zero or more nodes. If no slave nodes are specified, the scripts will deploy a single-node HDP cluster, similar to the Hortonworks sandbox.
- edge-nodes: client-only nodes with only the client libraries installed. These are optional.
Modify the inventory file at ~/ansible-hadoop/inventory/static to match the desired cluster layout.
- For each node, set ansible_host to the IP address that is reachable from the build node / workstation.
- Then set ansible_user=root and ansible_ssh_pass if the node allows root user logins. If these are not set, public-key authentication will be used.
- Example for a cluster with 1 master node and 3 slave nodes:

    [master-nodes]
    master01 ansible_host=192.168.0.2 ansible_user=root ansible_ssh_pass=changeme

    [slave-nodes]
    slave01 ansible_host=192.168.0.3 ansible_user=root ansible_ssh_pass=changeme
    slave02 ansible_host=192.168.0.4 ansible_user=root ansible_ssh_pass=changeme
- Example for installing a single-node HDP cluster on the local build node (useful if you want HDP installed on a VirtualBox / VMware VM):

    [master-nodes]
    master01 ansible_host=localhost ansible_connection=local
Modify the file at ~/ansible-hadoop/playbooks/group_vars/master-nodes to set master-node-specific information (you can remove all the existing content from this file).
Variable | Description |
---|---|
cluster_interface | Should be set to the network device that the HDP nodes will use to communicate with each other. |
hadoop_disk | The disk that should be mounted under /hadoop. The playbook will attempt to partition and format it! Remove this variable if /hadoop should just be a folder on the root filesystem or if the disk has already been partitioned and mounted. |
datanode_disks | Only used for single-node clusters. The disks that will be mounted under /grid/{0..n}. Should be set if one or more separate disk devices are used for storing HDFS data. |
- Example using eth0 as the cluster interface and sdb as the disk device for /hadoop:

    cluster_interface: 'eth0'
    hadoop_disk: sdb
- Example for a single-node cluster with sdb mounted under /hadoop and sdc, sdd, sde used as data drives mounted under /grid/0, /grid/1 and /grid/2:

    cluster_interface: 'eth0'
    hadoop_disk: sdb
    datanode_disks: ['sdc', 'sdd', 'sde']
Modify the file at ~/ansible-hadoop/playbooks/group_vars/slave-nodes to set slave-node-specific information (you can remove all the existing content from this file).
Variable | Description |
---|---|
cluster_interface | Should be set to the network device that the HDP nodes will use to communicate with each other. |
datanode_disks | The disks that will be mounted under /grid/{0..n}. Should be set if one or more separate disk devices are used for storing HDFS data. Remove this variable if HDFS data should be stored on the root filesystem or if the disks have already been partitioned and mounted under /grid/{0..n}. |
- Example using eth0 as the cluster interface and sdb, sdc, sdd as the disk devices that will be mounted under /grid/0, /grid/1 and /grid/2:

    cluster_interface: 'eth0'
    datanode_disks: ['sdb', 'sdc', 'sdd']
Optionally, if edge nodes are used, modify the file at ~/ansible-hadoop/playbooks/group_vars/edge-nodes to set edge-node-specific information (you can remove all the existing content from this file).
The same guidelines as for the master nodes above can be used here.
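For example, a minimal edge-nodes file could set just the cluster interface (the device names below are illustrative; hadoop_disk is optional):

    cluster_interface: 'eth0'
    hadoop_disk: sdb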
Modify the file at ~/ansible-hadoop/playbooks/group_vars/all to set the cluster configuration.
The following table describes the most important variables (an example follows the table):
Variable | Description |
---|---|
cluster_name | The name of the HDP cluster. |
hdp_version | The HDP major version that should be installed. |
admin_password | The Ambari admin user password. |
services_password | The password used by the other services (for example, Hive's database). |
install_* | Set these to true to install the respective HDP components. |
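For illustration, a minimal sketch of this file might look like the following; the values and the install_* component names are placeholders, so keep the variables your copy of the file already defines:

    cluster_name: 'hadoop-poc'        # name of the HDP cluster
    hdp_version: '2.5'                # HDP major version (placeholder value)
    admin_password: 'changeme'        # Ambari admin user password
    services_password: 'changeme'     # password used by the other services
    install_hbase: true               # install_* flags (component names are illustrative)
    install_hive: true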
The first step is to run the bootstrapping script that will set up the prerequisites on the cluster nodes:
cd ~/ansible-hadoop/ && bash bootstrap_static.sh
Then run the script that will install Ambari and build the cluster using Ambari Blueprints:
cd ~/ansible-hadoop/ && bash hortonworks_static.sh
At this point, you can follow the installation progress by accessing the Ambari interface.
The Ambari server runs on the last master node and can be accessed on port 8080.
Once the installation scripts are running, you can follow progress by accessing the Ambari interface (the Ambari node is the last host that ran a play).
The provided Ansible playbook will only open the firewall if you've added your workstation IP to the allowed_external_ips variable in the playbooks/group_vars/all file.
Alternatively, you can access Ambari by either opening the firewall manually or by opening a SOCKS proxy with the following command:

    ssh -D 12345 root@ambari-node

You will need to modify your browser settings to use a SOCKS proxy with host localhost and port 12345.
You'll then be able to navigate to http://ambari-node:8080 in your configured browser and access all subsidiary links.