ITC Data Augmentation

Goal

Internet Traffic Classification (ITC) is a vital research area as the number of online services keeps growing. However, evolving Internet protocols and encryption methods make encrypted Internet traffic difficult to classify. A major obstacle is the scarcity of open-source datasets and the shortcomings of the existing ones. This thesis addresses this obstacle by proposing three data augmentation techniques, LSTM, Average, and MTU, each with its own advantages and drawbacks.

Repository

This repository consists of four directories:

code - contains the scripts for the project
data - provides sample data from open-source datasets
data converter - includes code to transform the data into the required format
images - stores the images used in the README file

Code

How to Run the Code

To run the code in this project, run Main.py with the following arguments:

Required arguments:

data_dir: The directory containing the training data.
augmentation: The augmentation method to use. Supported values are lstm, average, and mtu.

Optional arguments:

--test_split: The fraction of the data to use for the test set. Default is 0.2.
--val_split: The fraction of the data to use for the validation set. Default is 0.2.
--batch_size: The batch size to use for training. Default is 32.
--split: The split of the flow. Default is 16.
--max_len: The maximum length of a flow. Default is 32.
--data_max_size: The maximum number of data points to use. Default is -1 (use all data points).
--avg_n: The number of data points to average for the Average augmentation. Default is 2.
--th_min: The minimum threshold for the MTU augmentation. Default is 750.
--th_max: The maximum threshold for the MTU augmentation. Default is 1200.
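For reference, the documented interface can be expressed with Python's standard argparse module. This is a hypothetical reconstruction from the argument list above, not the actual contents of Main.py; the helper name build_parser is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the CLI documented for Main.py."""
    parser = argparse.ArgumentParser(description="Train a classifier with ITC data augmentation")
    # Required positional arguments
    parser.add_argument("data_dir", help="Directory containing the training data")
    parser.add_argument("augmentation", choices=["lstm", "average", "mtu"],
                        help="Augmentation method to use")
    # Optional arguments, using the defaults listed in this README
    parser.add_argument("--test_split", type=float, default=0.2)
    parser.add_argument("--val_split", type=float, default=0.2)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--split", type=int, default=16)
    parser.add_argument("--max_len", type=int, default=32)
    parser.add_argument("--data_max_size", type=int, default=-1)  # -1 means use all data points
    parser.add_argument("--avg_n", type=int, default=2)
    parser.add_argument("--th_min", type=int, default=750)
    parser.add_argument("--th_max", type=int, default=1200)
    return parser
```

For example, build_parser().parse_args(["../data/", "lstm", "--batch_size", "64"]) yields batch_size=64 with every other option at its documented default.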

Example Usage

To train a model using the LSTM augmentation on the provided data, run the following command:

python Main.py ../data/ lstm

This trains a model with the default batch size (32) and data_max_size (-1, i.e., all data points). To use different values, pass them as command-line arguments. For example, to train a model with a batch size of 64 and a data_max_size of 100, run:

python Main.py ../data/ lstm --batch_size 64 --data_max_size 100

How to Run the Tests

To run the tests in this project, run Tests.py with the following arguments:

Required arguments:

data_dir: The directory containing the test data.
augmentation: The augmentation method to test. Supported values are lstm, average, and mtu.

Optional arguments:

--split: The split of the flow. Default is 16.
--max_len: The maximum length of a flow. Default is 32.
--avg_n: The number of data points to average for the Average augmentation. Default is 2.

Example Usage

To test the model generated with the LSTM augmentation on the provided data, run:

python Tests.py ../data/ lstm

Data

In the paper, we evaluated our data augmentation methods on three different datasets.

QUIC Paris:

Extracted from: Data
The data was generated by: Tong, V., Tran, H. A., Souihi, S., & Mellouk, A. (2018, December). A novel QUIC traffic classifier based on convolutional neural networks. In 2018 IEEE Global Communications Conference (GLOBECOM) (pp. 1-6). IEEE. Article

QUIC Davis:

Extracted from: Data
The data was generated by: Rezaei, S., & Liu, X. (2018). How to achieve high classification accuracy with just a few labels: A semi-supervised approach using sampled packets. arXiv preprint arXiv:1812.09761. Article

Flash:

Consists of real-world data captured in 2023. This is a commercial dataset.
To request this dataset, please email me at [email protected].

Results of our augmentation methods on the presented datasets

Average augmentation with m = 2.
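The README does not spell out how the Average augmentation combines samples. As a rough illustration only, the sketch below assumes it averages m (the avg_n argument) flows of the same class element-wise to synthesize a new flow. The function name average_augment and the (features, label) sample layout are hypothetical and are not taken from the repository code.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical sample layout: (feature vector, class label)
Sample = Tuple[List[float], str]


def average_augment(samples: Sequence[Sample], m: int = 2, seed: int = 0) -> List[Sample]:
    """Assumed Average augmentation: mean of m distinct same-class samples.

    Produces one synthetic sample per original sample of each class and assumes
    all feature vectors share one length (e.g. flows already padded to max_len).
    """
    rng = random.Random(seed)
    by_label: Dict[str, List[List[float]]] = defaultdict(list)
    for features, label in samples:
        by_label[label].append(features)

    augmented: List[Sample] = []
    for label, vectors in by_label.items():
        if len(vectors) < m:
            continue  # not enough samples of this class to average
        for _ in range(len(vectors)):
            group = rng.sample(vectors, m)  # m distinct samples of the same class
            mean = [sum(values) / m for values in zip(*group)]  # element-wise mean
            augmented.append((mean, label))
    return augmented
```

Averaging only within a class keeps every synthetic flow's label unambiguous; mixing classes would require a soft-label scheme instead.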

Model trained on the original train set; test set: original vs. modified MTU

Model trained on the original + MTU-augmented train set; test set: original vs. modified MTU
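The exact MTU augmentation is not described in this README. One plausible reading of th_min and th_max, sketched here purely as an assumption, is to draw a simulated MTU between the two thresholds and split every larger packet into chunks that fit it. The function mtu_augment is hypothetical, and the sketch ignores the header overhead that real re-packetization would add.

```python
import random
from typing import List, Optional


def mtu_augment(packet_sizes: List[int], th_min: int = 750, th_max: int = 1200,
                rng: Optional[random.Random] = None) -> List[int]:
    """Assumed MTU augmentation: re-packetize a flow as if sent over a smaller MTU.

    A threshold is drawn uniformly from [th_min, th_max]; every packet larger
    than the threshold is split into full-size chunks plus a remainder, so the
    total number of bytes in the flow is preserved.
    """
    rng = rng or random.Random()
    threshold = rng.randint(th_min, th_max)  # simulated MTU, inclusive bounds
    resized: List[int] = []
    for size in packet_sizes:
        while size > threshold:
            resized.append(threshold)
            size -= threshold
        resized.append(size)
    return resized
```

For example, with th_min = th_max = 1000, the flow [1500, 400, 2500] becomes [1000, 500, 400, 1000, 1000, 500].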

Data converter

This code converts data into the format required by the project code. It was originally implemented by FlowPic and has been modified to fit our specific needs.

For more details on the code, please visit the provided link.

Dependencies

keras>=2.13.1
numpy>=1.24.3
scikit_learn>=1.3.0
tensorflow>=2.13.0

To install the required modules using pip:

pip install -r requirements.txt
