Windows PE Malware Detection Using Ensemble Learning

Introduction

  • Malware detection is the process of ascertaining the presence of malware on a system, or determining whether a program is malicious or harmless, so that the system can be protected or recovered from the effects of the malicious code.
  • As the number of legitimate Internet users increases, so do the opportunities for cybercriminals to profit from producing malware.
  • This prompted the authors of the article we investigated to develop a model that predicts whether a PE file is malicious or benign using deep learning and ensemble learning.
  • We implemented their approach and tried to improve on the results slightly, which we eventually managed to do.
  • We used the dataset from that research, obtained from Kaggle.
  • The data contains 19611 rows and 79 columns.

Dimensionality Reduction

  • Following the original research, we used PCA to reduce the number of columns to 55, the value chosen by the researchers.
  • Before that, our own analysis showed that 4 columns could be set aside ahead of PCA: 'Name', 'Machine', and 'TimeDateStamp', which carry general or insignificant information and so have no impact on the data, and the target label 'Malware', which must not be reduced.
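The column handling and PCA step described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the feature names `f0`..`f74`, the random values, and the use of `StandardScaler` before PCA are all assumptions; only the four set-aside column names and the target of 55 components come from the text.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Kaggle data: 75 numeric features plus the
# 4 columns that are set aside (79 columns in total). Feature names are hypothetical.
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame(rng.normal(size=(n_rows, 75)),
                  columns=[f"f{i}" for i in range(75)])
df["Name"] = [f"file_{i}.exe" for i in range(n_rows)]
df["Machine"] = 332
df["TimeDateStamp"] = rng.integers(0, 2**31, size=n_rows)
df["Malware"] = rng.integers(0, 2, size=n_rows)

# Keep the target aside and drop the general-information columns.
y = df["Malware"]
X = df.drop(columns=["Name", "Machine", "TimeDateStamp", "Malware"])

# Scaling before PCA is an assumption; the README does not state it.
X_scaled = StandardScaler().fit_transform(X)

# Reduce the remaining features to 55 components.
pca = PCA(n_components=55)
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape)
```

With the real dataset, the synthetic frame would be replaced by the loaded CSV, and the PCA should be fitted on the training split only to avoid leaking test information.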

Malware Detection Using Machine Learning

We used 5 ML models to detect malicious PE files:
- Gaussian Naïve Bayes,
- Decision Tree,
- Random Forest,
- AdaBoost,
- Gradient Boosting
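A rough sketch of how these five classifiers might be trained and compared with scikit-learn is shown below. The data is synthetic (`make_classification` with 55 features, standing in for the PCA output), and the hyperparameters and train/test split are assumptions rather than the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 55 PCA components and the 'Malware' label.
X, y = make_classification(n_samples=400, n_features=55, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# The five models named above; hyperparameters are illustrative assumptions.
models = {
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# Fit each model and record its accuracy on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```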

[Figure: Screenshot 2024-04-11 121605]

Deep learning models

The next stage was implementing 3 DL models:
- MLP with 1 hidden layer,
- MLP with 2 hidden layers,
- 1D CNN
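To make the difference between the two MLP variants concrete, here is a minimal sketch using scikit-learn's `MLPClassifier`. The layer widths (64 and 32), iteration count, and synthetic data are assumptions, and the project may well have used a dedicated deep learning framework instead. The 1D CNN is not shown here, since it would need such a framework.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the 55 PCA components and the 'Malware' label.
X, y = make_classification(n_samples=400, n_features=55, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# MLP with 1 hidden layer (width 64 is an assumed value).
mlp_1 = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
mlp_1.fit(X_train, y_train)

# MLP with 2 hidden layers (widths 64 and 32 are assumed values).
mlp_2 = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
mlp_2.fit(X_train, y_train)

acc_1 = mlp_1.score(X_test, y_test)
acc_2 = mlp_2.score(X_test, y_test)
print(f"1 hidden layer:  {acc_1:.3f}")
print(f"2 hidden layers: {acc_2:.3f}")
```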

Malware Detection Using Deep Learning Models and Ensemble Learning

In the last stage, an ensemble learning model was built: the 3 DL models above form the first stage, and machine learning models (the meta-learners) are trained on top of their outputs as the final stage.
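The two-stage idea can be illustrated with scikit-learn's `StackingClassifier`. This is only a sketch under stated assumptions: two `MLPClassifier` instances stand in for the three DL models (the 1D CNN is omitted), Extra Trees is picked as one example meta-learner, and the data, layer widths, and `cv=3` are illustrative choices rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the 55 PCA components and the 'Malware' label.
X, y = make_classification(n_samples=400, n_features=55, n_informative=10,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# First stage: neural base models (stand-ins for the DL models).
base_models = [
    ("mlp_1", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)),
    ("mlp_2", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)),
]

# Final stage: a meta-learner trained on the base models' predicted
# probabilities, which are produced out-of-fold via cross-validation.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=ExtraTreesClassifier(n_estimators=100, random_state=0),
    stack_method="predict_proba",
    cv=3,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"Stacked accuracy: {accuracy:.3f}")
```

Swapping `final_estimator` for another classifier (for example a decision tree or KNN) gives a different meta-learner over the same first stage.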

The results we reached

| Metalearner | Ours | Theirs |
|---|---|---|
| Decision Tree | 0.99981 | 0.989 |
| Random Forest | 0.99981 | 0.9924 |
| Extra Trees | 0.99981 | 1 |
| KNN | 0.98266 | 0.979 |
| LDA | 0.97705 | 0.98 |
| AdaBoost | 0.97673 | 0.982 |
| SVM | 0.97654 | 0.982 |
| Logistic | 0.97642 | 0.981 |
| SGD | 0.97508 | 0.979 |
| Passive | 0.97444 | 0.978 |
| Gaussian | 0.97291 | 0.972 |
| QDA | 0.96577 | 0.973 |