This is the repository for the 'Predicting the Resources of Network Simulations' project, developed as a student master project at the Department of Sustainable Communication Networks, University of Bremen, Germany.
The OOTB platform uses containerization techniques to run several simulations in parallel, where each simulation takes a certain amount of the available resources and time to complete. It is important to identify or halt simulations that take substantially longer and consume more resources than usual, as they hinder running simulations in parallel.
This project addresses that problem by training a machine learning model and using it to predict resource usage before a simulation runs, giving the user an estimate of the expected resource utilization.
Install
This project requires Python and the following Python libraries to be installed:
You will also need software capable of running Jupyter Notebooks. If you do not have Python installed yet, it is recommended to run the notebooks on Google Colab, which already includes the above packages and more.
Data
The data used for building the model in this project is obtained by performing simulations with the OOTB tool. Once the required number of simulations has been performed, the simulation results can be downloaded in JSON format from the 'Collect Simulation Data' tab in OOTB. The data related to the simulation parameters can then be extracted using JSONExtraction.ipynb.
The dataset used for training the models in this project is available in the root/Dataset folder.
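The extraction step can be sketched as follows with Python's standard `json` module. The key names (`settings`, `results`, and the parameter names inside them) are assumptions for illustration; the actual structure of the OOTB export is handled in JSONExtraction.ipynb.

```python
import json

# Hypothetical structure of one exported simulation record; the real
# key names come from the OOTB JSON export and may differ.
raw = '{"settings": {"NumNodes": 50, "DataSizeInBytes": 1024}, "results": {"PeakDiskUsage": 3.2}}'

record = json.loads(raw)

# Flatten the simulation parameters and results into one flat row,
# ready to be appended to a tabular dataset.
row = {**record["settings"], **record["results"]}
print(row["NumNodes"], row["PeakDiskUsage"])
```

One such flat row per simulation can then be collected into a CSV or DataFrame for training.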
Features
- Number of Nodes
- Data Generation Interval
- Data Size in bytes
- Constraint Area X
- Constraint Area Y
- Locations
- Hosts
- Maximum Cache Size
- Forwarding Layer
- Application Layer
Targets
- Peak Disk Usage
- Peak RAM Used in Simulation
- Peak RAM Used in Results parsing
- Total Job Time Taken
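Splitting the dataset into the feature and target columns listed above is straightforward with pandas. The sketch below uses a tiny in-memory stand-in for the real dataset in root/Dataset; the column spellings are assumptions.

```python
import pandas as pd

# Toy stand-in for the real dataset; column names mirror the
# feature/target lists above (assumed spellings).
df = pd.DataFrame({
    "NumNodes": [10, 100, 1000],
    "DataSizeInBytes": [512, 1024, 2048],
    "ForwardingLayer": ["epidemic", "sprayandwait", "epidemic"],
    "PeakDiskUsage": [1.2, 3.4, 9.8],
    "TotalJobTime": [30.0, 120.0, 900.0],
})

feature_cols = ["NumNodes", "DataSizeInBytes", "ForwardingLayer"]
target_cols = ["PeakDiskUsage", "TotalJobTime"]

X, y = df[feature_cols], df[target_cols]
print(X.shape, y.shape)  # (3, 3) (3, 2)
```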
Models
Random Forest Regression Random Forest is an ensemble learning technique that builds several individual decision trees in parallel and averages the predictions of the trees, which results in a more powerful predictive model.
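A minimal sketch of this technique with scikit-learn's `RandomForestRegressor`, using synthetic stand-in data rather than the project's dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in data: one target as a noisy function of two features.
X = rng.uniform(10, 1000, size=(200, 2))
y = 0.01 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.1, 200)

# Each of the 100 trees is fit on a bootstrap sample of the data;
# the forest's prediction is the average over all trees.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
pred = model.predict(X[:5])
print(pred.shape)  # (5,)
```

For multiple targets, `RandomForestRegressor` also accepts a 2-D `y`, or one model can be trained per target.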
eXtreme Gradient Boost Regression Extreme Gradient Boosting (XGBoost) is an open-source library with an efficient and scalable implementation of the gradient boosting framework. It is an ensemble boosting technique that builds multiple individual base learners sequentially and combines them to obtain a more accurate and powerful predictive model.
Run
After the dataset is built, we perform one-hot encoding of the categorical features and standardization to bring the features to the same scale and center their mean values. Both models can be executed through the Colab links provided with the respective models in the root/Source_Code folder.
All evaluation graphs and results are available in the root/Results/Evaluation Graphs folder.
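The preprocessing step can be sketched with scikit-learn's `ColumnTransformer`; the column names here are illustrative assumptions, not the exact ones used in the notebooks.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy rows with one numeric and one categorical feature (names assumed).
df = pd.DataFrame({
    "NumNodes": [10, 100, 1000],
    "ForwardingLayer": ["epidemic", "sprayandwait", "epidemic"],
})

pre = ColumnTransformer([
    # Standardize numeric features: zero mean, unit variance.
    ("num", StandardScaler(), ["NumNodes"]),
    # One-hot encode categorical features: one column per category.
    ("cat", OneHotEncoder(), ["ForwardingLayer"]),
])

Xt = pre.fit_transform(df)
print(Xt.shape)  # (3, 3): 1 scaled numeric + 2 one-hot columns
```

Fitting the transformer on the training split only, then applying it to the test split, avoids leaking test statistics into the scaler.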
Project Summary
In this work, we selected 10 of the simulation parameters as features for predicting 4 target variables. The parameter 'NumNodes' ranges from 10 to 1000. Considering the variability and size of the available dataset, we chose tree-based algorithms for building the predictive models, and the results obtained were satisfactory in spite of some imbalances in the dataset.
The work can be extended by incorporating more parameters as features and by applying more complex models such as neural networks. Increasing the size of the dataset would also help in uncovering more complex patterns in the data.