hops-yarn-ML

This project collects metrics from InfluxDB (reported by the Resource Manager, Node Manager, and Spark through Graphite and Telegraf) and from MySQL Cluster. It then processes the data and feeds it into TensorFlow to build a fully connected feedforward neural network that predicts memory and CPU utilization of applications at the container level.
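To illustrate what "fully connected feedforward" means here, the following is a minimal pure-Python sketch of a forward pass. It is not the project's TensorFlow model: the layer sizes, weights, and the names `dense`, `relu`, and `predict` are made up for illustration only.

```python
def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: every output unit sees every input."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def predict(features, layers):
    """Feed features forward through the layers; no loops or feedback."""
    activations = features
    for index, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        # Apply the non-linearity on hidden layers only, not on the output.
        if index < len(layers) - 1:
            activations = relu(activations)
    return activations[0]


# Hypothetical network: 2 input features -> 2 hidden units -> 1 output.
layers = [
    ([[0.5, -0.25], [1.0, 1.0]], [0.0, -1.0]),  # hidden layer
    ([[2.0, 1.0]], [0.5]),                      # output layer
]
prediction = predict([2.0, 4.0], layers)
```

In a real model the weights would be learned from the aggregated metrics, and the single output would correspond to the predicted label.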

Data processing, cleansing, and aggregation are done with PySpark, and the machine learning model is built with TensorFlow. Data is read from the databases in batches; an offset keeps track of where each subsequent batch starts until all the data has been read.
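The offset-based batch reading described above can be sketched roughly as follows. This is an illustrative sketch, not the code in yarn_machine_learning.py: `read_in_batches` and `fetch_batch` are hypothetical names, and the in-memory list stands in for a database query that accepts a limit and an offset.

```python
def read_in_batches(fetch_batch, batch_size):
    """Yield batches of rows until the source is exhausted.

    fetch_batch(offset, limit) is a hypothetical callable standing in
    for a database query that returns at most `limit` rows starting
    at row `offset`.
    """
    offset = 0
    while True:
        rows = fetch_batch(offset, batch_size)
        if not rows:
            break
        yield rows
        # Advance the offset so the next query starts after this batch.
        offset += len(rows)
        # A short batch means there is nothing left to read.
        if len(rows) < batch_size:
            break


# In-memory stand-in for a database table (assumption for the example).
table = list(range(10))


def fake_fetch(offset, limit):
    return table[offset:offset + limit]


batches = list(read_in_batches(fake_fetch, 4))
```

With ten rows and a batch size of four, the generator yields two full batches and one final partial batch.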

The database credentials are read from a config.txt file.
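The layout of config.txt is not documented here, so the following is only a sketch of how such a file could be parsed, assuming simple `key = value` lines. The key names (`influxdb_host`, `mysql_user`, and so on) and the `parse_config` helper are hypothetical and may not match the actual file.

```python
def parse_config(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError("expected 'key = value', got: %r" % raw_line)
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical file contents; the real config.txt may use other keys.
sample = """
# database settings (illustrative only)
influxdb_host = localhost
influxdb_port = 8086
mysql_user = example_user
"""
config = parse_config(sample)
```

Values are kept as strings, so a port number would need an explicit `int(config["influxdb_port"])` before use.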

Input parameters:

Start timestamp: time1
End timestamp: time2
Timestamps are Unix epoch times. They are measured in seconds but must be written in nanosecond format (the seconds value followed by nine zeros), e.g. 1501758105000000000
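A small helper can produce timestamps in this format from epoch seconds or from a datetime. This is a sketch only: the helper names are hypothetical and not part of the project.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 10 ** 9


def seconds_to_nanos(epoch_seconds):
    """Convert Unix epoch seconds to an integer nanosecond timestamp."""
    return int(epoch_seconds) * NANOS_PER_SECOND


def datetime_to_nanos(moment):
    """Convert a timezone-aware datetime to epoch nanoseconds."""
    if moment.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    # Sub-second precision is dropped, matching the seconds-based input.
    return seconds_to_nanos(int(moment.timestamp()))


time1 = seconds_to_nanos(1501758105)
time2 = datetime_to_nanos(datetime(2017, 8, 3, 12, 0, 0, tzinfo=timezone.utc))
```

Here `time1` reproduces the example value from the text, and `time2` is an arbitrary later moment on the same day.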

How to use:
/srv/hops/spark-2.1.0-bin-without-hadoop/bin/spark-submit yarn_machine_learning.py

Note:
At the moment it is designed to predict a single label, 'PCpuUsagePercentAvgPercents', the average CPU utilization in percent.
