####Table of Contents
- Module Description - What the module does and why it is useful
- Setup - The basics of getting started with pig
- Usage - Configuration options and additional functionality
- Reference - An under-the-hood peek at what the module is doing and how
- Development - Guide for contributing to the module
This module installs Apache Pig, a platform for analyzing large data sets. By default, Pig expects a locally configured Hadoop client.
Supported platforms:
- Debian 7/wheezy: Cloudera distribution (tested on CDH 5.3.0, Pig 0.12.0)
- Ubuntu 14/trusty: Cloudera distribution (tested on CDH 5.3.0, Pig 0.12.0)
- RHEL 6 and clones: Cloudera distribution (tested on CDH 5.4.2, Pig 0.12.0)
###What the cesnet-pig module affects
- Packages: installs pig packages
- Files: files with environment settings
Be aware of:
- Hadoop repositories
  - Neither Cloudera nor Hortonworks repositories are configured by this module (for Cloudera, the list and key files are available at http://archive.cloudera.com/cdh5/debian/wheezy/amd64/cdh/, ...)
Example:

    include pig

By default, Pig uses Hadoop for its operations, as if launched with -x mapreduce:

    pig -x mapreduce

Pig can also be launched in local mode:

    pig -x local

Using Pig with HBase: add the following to your Pig scripts (replace <ZooKeeper_version> and <HBase_version> with the current values):

    register /usr/lib/zookeeper/zookeeper-<ZooKeeper_version>.jar
    register /usr/lib/hbase/hbase-<HBase_version>-security.jar

Using Pig with DataFu: add the following to your Pig scripts (replace <DataFu_version> with the current value):

    REGISTER /usr/lib/pig/datafu-<DataFu_version>.jar

- pig: Pig setup
- pig::config
- pig::install
- pig::params
###Module Parameters (pig class)
####datafu_enabled
Also installs DataFu, a collection of Pig User-Defined Functions. Default: false. The package is no longer available as of CDH 6.
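
As a minimal sketch of setting this parameter, the class can be declared with resource-like syntax instead of a plain include. This assumes a distribution that still ships the DataFu package (that is, earlier than CDH 6) and that the Hadoop repositories are already configured elsewhere:

```puppet
# Install Pig together with the DataFu UDF collection.
# Assumption: the DataFu package is available in the configured repositories.
class { 'pig':
  datafu_enabled => true,
}
```

Use this declaration in place of the plain include of the class, since Puppet rejects a resource-like declaration of a class that has already been declared.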
- Repository: https://github.com/MetaCenterCloudPuppet/cesnet-pig
- Tests:
  - basic: see .travis.yml
  - vagrant: https://github.com/MetaCenterCloudPuppet/hadoop-tests
- Email: František Dvořák <[email protected]>