Sri Harsha Boda edited this page Sep 15, 2017 · 1 revision

Step-by-step guide

1. Install MySQL version 5.5 or higher. If you are running a Windows operating system, refer to the MySQL installation documentation at http://dev.mysql.com/doc/refman/5.6/en/windows-installation.html. If you are running a Unix-based operating system, refer to http://dev.mysql.com/doc/refman/5.1/en/binary-installation.html.

2. Install Git version 1.9.4 or higher. If you are running a Windows operating system, refer to the Git installation documentation at http://git-scm.com/book/en/v2/Getting-Started-Installing-Git#Installing-on-Windows. If you are running a Unix-based operating system, refer to http://git-scm.com/book/en/v2/Getting-Started-Installing-Git#Installing-on-Linux. Open Git Bash and run the following, substituting your own name and email.

  • git config --global user.name "Your Name" <— this is your GitLab username, e.g. "bdrearijit"

  • git config --global user.email "[email protected]"

  • cd to the folder you want the project created under, e.g. C:\wipro\bdre. The project directory is assumed to be C:\wipro\bdre in all further references.

  • git clone https://<yourid>@gitlab.com/bdre/metadata_management.git

  • git clone https://<yourid>@gitlab.com/bdre/im_framework.git

  • git clone https://<yourid>@gitlab.com/bdre/applications.git

3. Download a stable version of Apache Maven and install it. If you are running a Windows operating system, download the binary zip from http://maven.apache.org/download.cgi?Preferred=ftp://mirror.reverse.net/pub/apache/. If you are running a Unix-based operating system, download the binary tar.gz from the same page.

4. Add the Maven installation path to 'Path' in the system environment variables, following the installation instructions at the link above.
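On a Unix-based system, the PATH change in step 4 can be made in the shell profile. The install location /opt/apache-maven below is a placeholder; substitute the directory where you actually unpacked the archive:

```shell
# Placeholder install location; substitute the directory where you
# unpacked the Maven archive.
M2_HOME=/opt/apache-maven
export M2_HOME
export PATH="$M2_HOME/bin:$PATH"
# Verify: the shell should now find mvn on the PATH.
# mvn -version
```

Put the two export lines in ~/.bashrc (or your shell's profile) so they survive new sessions.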

5. Once the projects are cloned, the code can be written or modified in an IDE such as Eclipse or IntelliJ IDEA, and Git can be integrated into the IDE. Refer to the vendor documentation on installing the IDE and integrating Git into it.

6. Start by creating a database of your choice in MySQL, then create the tables by running the script at the following path:

sh C:\wipro\bdre\metadata_mgmt\mysql\scripts\create-tables.sh <database_user> <database_password> <database_name> <database_server_hostname> <database_server_port>
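A concrete invocation might look like this (Git Bash on Windows or a Unix shell). All values below are placeholders for your own environment, and the positional order must match the script's expectations (user, password, database, host, port):

```shell
# Placeholder connection details; replace with your own.
DB_USER=bdre_user
DB_PASS=changeme
DB_NAME=bdre_md
DB_HOST=localhost
DB_PORT=3306

# Compose and print the command first, to double-check the argument order.
CMD="sh metadata_mgmt/mysql/scripts/create-tables.sh $DB_USER $DB_PASS $DB_NAME $DB_HOST $DB_PORT"
echo "$CMD"
# To execute: eval "$CMD"
```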

7. Populate the tables using the SQL queries at the following location:

C:\wipro\bdre\metadata_mgmt\example\metadata-setup\md.sql

8. Then populate the database with the stored procedures by running the script at the following path:

sh C:\wipro\bdre\metadata_mgmt\mysql\scripts\create-procs.sh <database_user> <database_password> <database_name> <database_server_hostname> <database_server_port>

9. If you are using a Windows operating system, edit mybatis-config.xml at metadata_mgmt\md_commons\src\main\resources\mybatis-config.xml. Set the database name to the name of the database you created in the following property:

<property name="url" value="jdbc:mysql://<mysql_host_name>:<mysql_port>/<database_name>"/>

Otherwise, if you are using a Unix-based operating system, first create a user called "mduser" and set a password. (If necessary, grant the instance running Hadoop the privileges it needs to access the instance running the MySQL server.)
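A minimal sketch of the mduser setup, assuming the database from step 6 is named bdre_md (a placeholder) and using a wide '%' host mask that you should narrow in production:

```shell
# Write the user-creation statements to a file; the password and the
# database name are placeholders.
cat > create_mduser.sql <<'EOF'
CREATE USER 'mduser'@'%' IDENTIFIED BY 'changeme';
GRANT ALL PRIVILEGES ON bdre_md.* TO 'mduser'@'%';
FLUSH PRIVILEGES;
EOF
# Apply it with a privileged MySQL account:
# mysql -h <mysql_host_name> -u root -p < create_mduser.sql
```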

Then, under <environments default="development">, set the database name to the name of the database you created in the following property:

<property name="url" value="jdbc:mysql://<mysql_host_name>:3306/<database_name>"/>

10. Create or edit a file named ENVIRONMENT at the following location:
metadata_management -> im_commons -> src -> main -> resources
and enter your environment name as follows:
environment=<your_environment_name>
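From a shell, step 10 can be done in one line; "development" here matches the -env flag used by the sample commands later in this guide:

```shell
# Create the resources directory if it does not exist yet, then write
# the ENVIRONMENT file.
mkdir -p metadata_management/im_commons/src/main/resources
echo "environment=development" > metadata_management/im_commons/src/main/resources/ENVIRONMENT
cat metadata_management/im_commons/src/main/resources/ENVIRONMENT
```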

11. Follow the procedure below to run a sample project with a File Ingestion module, an ETLDriver module, and a Semantic module. To run the sample project, you must have Hadoop running (HDFS and MapReduce). Also install Hive and Oozie.

Install Hive version 0.13.1 from http://www.cloudera.com/content/cloudera/en/documentation/core/v5-2-x/topics/cdh_ig_hive_installation.html and make the necessary configurations for Hive. Install Oozie from http://www.cloudera.com/content/cloudera/en/documentation/cdh4/latest/CDH4-Installation-Guide/cdh4ig_topic_17_5.html

Alternatively, you can use a Hadoop quickstart virtual machine from Cloudera, Hortonworks, etc., which comes preloaded with all major tools such as Hive and Oozie. Refer to the vendor documentation on installing the quickstart virtual machine.

12. First, set the initial metadata in the database by running the script with the following command. It inserts the seed data all the modules depend on: information about jobs, sub-steps, batches, files, etc.:

mysql -h <mysql_host_name> -P <port> -u <username> --password=<your_password> <database_name> < /home/dropuser/BDRE/example/metadata-setup/md-dev.sql

13. Create a user called dropuser in Hadoop. Run the following commands to copy the jars from the repository into the corresponding library directory in HDFS.

hadoop fs -mkdir -p /user/oozie/bdre/lib

hadoop fs -rm /user/oozie/bdre/lib/*

hadoop fs -put BDRE/target/lib/* /user/oozie/bdre/lib
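The three commands above can be collected into a small helper script so the upload is repeatable after each build. This is a sketch that assumes the build output lives in BDRE/target/lib:

```shell
# Write the upload steps to a reusable script. On the very first run
# the -rm step may complain that the directory is empty; that is harmless.
cat > deploy_jars.sh <<'EOF'
#!/bin/sh
hadoop fs -mkdir -p /user/oozie/bdre/lib
hadoop fs -rm /user/oozie/bdre/lib/*
hadoop fs -put BDRE/target/lib/* /user/oozie/bdre/lib
EOF
chmod +x deploy_jars.sh
# Run as dropuser once Hadoop is up: ./deploy_jars.sh
```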

14. Refer to the Workflow Generation doc, section "Workflow Generation Setup", and follow the steps mentioned there.

15. Start with the File Ingestion module, which pulls the required files into HDFS. Here two files are ingested using two processes; after the run, the newly added files are visible in HDFS. Run the module using the following commands.

java -cp /home/dropuser/BDRE/target/lib/* com.wipro.ats.bdre.im.etl.api.sftp.SFTP2HDFSMain -env development -pid 185 -spid 186 -hsid 2

java -cp /home/dropuser/BDRE/target/lib/* com.wipro.ats.bdre.im.etl.api.sftp.SFTP2HDFSMain -env development -pid 195 -spid 196 -hsid 2

16. Once the files are ingested, run the ETLDriver module, which populates the core tables and adds partitions. It can be started as follows:

oozie job -run -config /home/dropuser/BDRE/example/etl/workflow/job-dev.properties -oozie http://ip-10-0-0-214:11000/oozie

17. Once the core tables are populated, the Semantic module can be started using the following command. This module performs operations on the core tables populated by ETLDriver to generate the reporting table.

oozie job -run -config /home/dropuser/BDRE/example/hive/workflow/job-dev.properties -oozie http://ip-10-0-0-214:11000/oozie
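To check on either Oozie job, use oozie job -info with the workflow id printed by the -run command; the job id below is purely illustrative:

```shell
# Oozie endpoint from the commands above; the job id is a placeholder
# printed by "oozie job -run".
OOZIE_URL=http://ip-10-0-0-214:11000/oozie
JOB_ID=0000001-170915000000001-oozie-oozi-W
echo "oozie job -info $JOB_ID -oozie $OOZIE_URL"
# Or follow the workflow log: oozie job -log $JOB_ID -oozie $OOZIE_URL
```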
