ArielCyber/ITC-Data-Augmentations-

Ariel University Home ACIC Home

ITC Data Augmentation

Goal

Internet Traffic Classification (ITC) is a vital research area in an era of growing online services. However, the evolution of Internet protocols and encryption methods makes encrypted traffic harder to classify. One of the main challenges is the lack of open-source datasets and the shortcomings of the existing ones. This thesis tackles that challenge by proposing three data augmentation techniques, LSTM, Average, and MTU, each with its own advantages and drawbacks.

Repository

This repository consists of 4 directories:

  • code - contains the scripts for the project
  • data - provides sample data from open-source datasets
  • data converter - includes code to transform the data to the required format
  • images - stores the images used in the README file

Code

How to Run the Code

To run the code provided in this project, you need to run the Main.py file with the following arguments:

Required arguments:

  • data_dir: The directory containing the training data.
  • augmentation: The augmentation method to use. Supported values are lstm, average, and mtu.

Optional arguments:

  • --test_split: The fraction of the data to use for the test set. Default is 0.2.
  • --val_split: The fraction of the data to use for the validation set. Default is 0.2.
  • --batch_size: The batch size to use for training. Default is 32.
  • --split: The split of the flow. Default is 16.
  • --max_len: The maximum length of a flow. Default is 32.
  • --data_max_size: The maximum number of data points to use. Default is -1 (use all data points).
  • --avg_n: The number of data points to average for the Average augmentation. Default is 2.
  • --th_min: The minimum threshold for the MTU augmentation. Default is 750.
  • --th_max: The maximum threshold for the MTU augmentation. Default is 1200.
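The arguments above can be pictured as a command-line parser. The following is a minimal sketch using Python's standard argparse module, not the actual contents of Main.py: the function name build_parser, the help strings, and the argument types (for example, integer thresholds) are assumptions; only the argument names and defaults come from the lists above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the arguments documented above."""
    parser = argparse.ArgumentParser(description="ITC data augmentation (sketch)")
    # Required positional arguments.
    parser.add_argument("data_dir", help="directory containing the training data")
    parser.add_argument("augmentation", choices=["lstm", "average", "mtu"],
                        help="augmentation method to use")
    # Optional arguments with the documented defaults (types are assumed).
    parser.add_argument("--test_split", type=float, default=0.2)
    parser.add_argument("--val_split", type=float, default=0.2)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--split", type=int, default=16)
    parser.add_argument("--max_len", type=int, default=32)
    parser.add_argument("--data_max_size", type=int, default=-1)
    parser.add_argument("--avg_n", type=int, default=2)
    parser.add_argument("--th_min", type=int, default=750)
    parser.add_argument("--th_max", type=int, default=1200)
    return parser


# Parse an explicit argument list instead of sys.argv for illustration.
args = build_parser().parse_args(["../data/", "mtu", "--th_min", "800"])
print(args.augmentation, args.th_min, args.th_max)
```

Restricting augmentation with choices means an unsupported method name is rejected before any data is loaded.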

Example Usage

To train a model using the LSTM augmentation on the provided data, run the following command:

python Main.py ../data/ lstm

This will train a model using the default batch size (32) and data_max_size (-1). To use different values, specify them as command line arguments. For example, to train a model using a batch size of 64 and data_max_size of 100, run the following command:

python Main.py ../data/ lstm --batch_size 64 --data_max_size 100
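To illustrate how the --test_split and --val_split fractions could partition a dataset, here is a minimal sketch. It assumes both fractions are taken relative to the full dataset; the repository may instead take the validation set from the remaining training data. The function split_indices and its seed parameter are hypothetical and do not come from the project code.

```python
import random


def split_indices(n, test_split=0.2, val_split=0.2, seed=0):
    """Shuffle range(n) and cut it into train/val/test index lists.

    Assumption: both fractions are relative to the full dataset size n.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = int(n * test_split)
    n_val = int(n * val_split)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train, val, test = split_indices(100)
print(len(train), len(val), len(test))
```

Under this assumption, the default values leave 60% of the samples for training.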

How to Run the Tests

To run the tests provided in this project, you need to run the Tests.py file with the following arguments:

Required arguments:

  • data_dir: The directory containing the test data.
  • augmentation: The augmentation method to test. Supported values are lstm, average, and mtu.

Optional arguments:

  • --split: The split of the flow. Default is 16.
  • --max_len: The maximum length of a flow. Default is 32.
  • --avg_n: The number of data points to average for the Average augmentation. Default is 2.

Example Usage

To test the model generated with the LSTM augmentation on the provided data, run the following command:

python Tests.py ../data/ lstm

Data

In the paper, we tested our data augmentation methods on three different datasets.

QUIC Paris:

  • Extracted from: Data

  • The data was generated by: Tong, V., Tran, H. A., Souihi, S., & Mellouk, A. (2018, December). A novel QUIC traffic classifier based on convolutional neural networks. In 2018 IEEE Global Communications Conference (GLOBECOM) (pp. 1-6). IEEE. Article

QUIC Davis:

  • Extracted from: Data
  • The data was generated by: Rezaei, S., & Liu, X. (2018). How to achieve high classification accuracy with just a few labels: A semi-supervised approach using sampled packets. arXiv preprint arXiv:1812.09761. Article

Flash:

  • Consists of real-world data captured in 2023; it is a commercial dataset.
  • To request this dataset, please email me at [email protected].

Results of our augmentation methods on the presented datasets

Average augmentation with m = 2.
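As a rough illustration of what averaging m = 2 data points can mean, the sketch below builds one synthetic flow from each run of avg_n consecutive flows by taking their element-wise mean. This is an assumption about the general idea, not the repository's implementation: the function names, the representation of a flow as an equal-length list of numbers (for example, packet sizes), and the choice to group consecutive flows are all hypothetical. In practice, the flows being averaged would presumably belong to the same class.

```python
def average_flows(flows):
    """Element-wise mean of equally long flows (lists of numbers)."""
    if not flows:
        raise ValueError("need at least one flow")
    length = len(flows[0])
    if any(len(f) != length for f in flows):
        raise ValueError("all flows must have the same length")
    return [sum(values) / len(flows) for values in zip(*flows)]


def augment_by_average(flows, avg_n=2):
    """Average each run of avg_n consecutive flows into one synthetic flow.

    Trailing flows that do not fill a complete group are dropped.
    """
    return [average_flows(flows[i:i + avg_n])
            for i in range(0, len(flows) - avg_n + 1, avg_n)]


# Four toy flows of two packet sizes each, averaged in pairs.
flows = [[100, 200], [300, 400], [10, 20], [30, 40]]
print(augment_by_average(flows, avg_n=2))
```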

[Figure: avg_2_compare]

Model trained on the original training set; test set: original vs. modified MTU

[Figure: mtu_in_test]

Model trained on the original + MTU-augmented training set; test set: original vs. modified MTU

[Figure: mtu_in_train_and_test]

Data converter

This code converts data into the format the project code requires. It was originally implemented by FlowPic and has been modified to fit our specific needs.

For more details on the code, please visit the provided link.

Dependencies

  • keras>=2.13.1
  • numpy>=1.24.3
  • scikit_learn>=1.3.0
  • tensorflow>=2.13.0

To install the required modules using pip:

pip install -r requirements.txt
