Competition Date: 8/1, 09:00
Based on the M.Sc. work of Shmuel Lavian, and part of the API Traffic Research Dataset Framework (ATRDF) article.
The goal of this challenge is to achieve the highest prediction scores while demonstrating advanced data-research capabilities: mapping out the right features in the way that best fits your machine learning model of choice.
You will need to predict and classify malicious and benign API traffic, utilizing your knowledge of data exploration and machine learning models.
In some datasets you will be required to create a model that labels not only the attack but also the attack type.
The challenge contains four datasets, each divided into Train (70%), Test (15%), and Validation (15%) sets. The train-test split is done inside the code; the validation split is given without labels for later grading.
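For clarity, here is a minimal sketch of how such a split can be reproduced in Python with scikit-learn; the variable names and dummy data are illustrative placeholders, not the competition's actual code:

```python
from sklearn.model_selection import train_test_split

# Dummy stand-ins for extracted features (X) and labels (y) of the labeled
# 85% of a dataset; the remaining 15% is the unlabeled validation split.
X = [[i] for i in range(85)]
y = [i % 2 for i in range(85)]

# 15/85 of the labeled data equals 15% of the full dataset, reproducing
# the 70/15/15 Train/Test/Validation proportions described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=15 / 85, random_state=42, stratify=y)
print(len(X_train), len(X_test))  # 70 15
```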
All the datasets contain HTTP traffic of API requests and responses.
The datasets are:
- Dataset_1 - The most basic API traffic, containing the fewest attacks and endpoints. It is meant to give you a soft start.
Dataset 1 baseline score:
|              | precision | recall  | f1-score | support |
|--------------|-----------|---------|----------|---------|
| Benign       | 0.93922   | 0.99792 | 0.96767  | 480     |
| Malware      | 0.99799   | 0.94129 | 0.96881  | 528     |
| accuracy     |           |         | 0.96825  | 1008    |
| macro avg    | 0.96860   | 0.96960 | 0.96824  | 1008    |
| weighted avg | 0.97000   | 0.96825 | 0.96827  | 1008    |
- Dataset_2 - A more advanced version of the former dataset, containing far more attacks and endpoints (almost double). The data is also more complex, with a stronger randomization mechanism.
Dataset 2 baseline score:
|              | precision | recall  | f1-score | support |
|--------------|-----------|---------|----------|---------|
| Benign       | 0.98455   | 0.99178 | 0.98815  | 20812   |
| Malware      | 0.88889   | 0.80851 | 0.84680  | 1692    |
| accuracy     |           |         | 0.97800  | 22504   |
| macro avg    | 0.93672   | 0.90015 | 0.91747  | 22504   |
| weighted avg | 0.97735   | 0.97800 | 0.97752  | 22504   |
- Dataset_3 - Same as before, but this time each request includes more complex parameters, making the dataset more authentic but also much harder to simply "plug and play" into any machine learning model. Depending on the model you choose, you may need to approach the data exploration and feature extraction differently.
Dataset 3 baseline score - Label phase:
|              | precision | recall  | f1-score | support |
|--------------|-----------|---------|----------|---------|
| Benign       | 0.95715   | 0.97607 | 0.96652  | 30209   |
| Malware      | 0.77364   | 0.65181 | 0.70752  | 3791    |
| accuracy     |           |         | 0.93991  | 34000   |
| macro avg    | 0.86539   | 0.81394 | 0.83702  | 34000   |
| weighted avg | 0.93669   | 0.93991 | 0.93764  | 34000   |
Dataset 3 baseline score - Attack Type phase:
|                     | precision | recall  | f1-score | support |
|---------------------|-----------|---------|----------|---------|
| Benign              | 0.95649   | 0.97705 | 0.96666  | 22659   |
| Cookie Injection    | 1.00000   | 0.99829 | 0.99914  | 584     |
| Directory Traversal | 0.17021   | 0.11511 | 0.13734  | 278     |
| LOG4J               | 0.87582   | 0.48201 | 0.62181  | 278     |
| Log Forging         | 0.33333   | 0.15194 | 0.20874  | 283     |
| RCE                 | 0.98582   | 0.98582 | 0.98582  | 282     |
| SQL Injection       | 0.68499   | 0.58169 | 0.62913  | 557     |
| XSS                 | 0.78584   | 0.74144 | 0.76300  | 584     |
| accuracy            |           |         | 0.93966  | 25505   |
| macro avg           | 0.72406   | 0.62917 | 0.66395  | 25505   |
| weighted avg        | 0.93161   | 0.93966 | 0.93438  | 25505   |
- Dataset_4 - The most advanced and complex dataset, containing all of the above plus some more advanced features such as API redirection, more request types, deeper data access, and more.
Dataset 4 baseline score - Label phase:
|              | precision | recall  | f1-score | support |
|--------------|-----------|---------|----------|---------|
| Benign       | 0.91077   | 0.72604 | 0.80798  | 5019    |
| Malware      | 0.79145   | 0.93596 | 0.85766  | 5575    |
| accuracy     |           |         | 0.83651  | 10594   |
| macro avg    | 0.85111   | 0.83100 | 0.83282  | 10594   |
| weighted avg | 0.84798   | 0.83651 | 0.83412  | 10594   |
Dataset 4 baseline score - Attack Type phase:
|                     | precision | recall  | f1-score | support |
|---------------------|-----------|---------|----------|---------|
| Benign              | 0.87065   | 0.97472 | 0.91975  | 25316   |
| Cookie Injection    | 1.00000   | 0.99780 | 0.99890  | 1366    |
| Directory Traversal | 0.79130   | 0.39281 | 0.52500  | 695     |
| LOG4J               | 0.88722   | 0.17906 | 0.29798  | 659     |
| Log Forging         | 0.54787   | 0.81066 | 0.65385  | 713     |
| RCE                 | 0.00000   | 0.00000 | 0.00000  | 696     |
| SQL Injection       | 0.70253   | 0.15868 | 0.25889  | 1399    |
| XSS                 | 1.00000   | 0.49823 | 0.66509  | 1415    |
| accuracy            |           |         | 0.86596  | 32259   |
| macro avg           | 0.72495   | 0.50150 | 0.53993  | 32259   |
| weighted avg        | 0.84722   | 0.86596 | 0.83634  | 32259   |
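For reference, the baseline tables above follow the layout of scikit-learn's classification_report. A minimal sketch of producing such a report, using toy placeholder labels rather than real competition data:

```python
from sklearn.metrics import classification_report

# Toy labels for illustration, using the Label phase mapping
# ({'Benign': 0, 'Malware': 1}); plug in your real test labels/predictions.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 0]

# digits=5 matches the five-decimal layout of the baseline tables above.
print(classification_report(y_true, y_pred,
                            target_names=['Benign', 'Malware'], digits=5))
```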
You will need to prepare four models (one per dataset) using the Train and Test sets. The validation dataset does not contain labels. Once you have built your model, run it on the validation set and upload the output as described in the submission guide below.
The competition datasets include various API attacks, any of which may or may not appear in a given dataset. Below is a basic description of each attack; you are encouraged to research these attacks online in order to map out their features (a short feature sketch follows the list):
- SQL Injection - One of the most common web attacks; it abuses backend SQL queries to inject code directly into the server's SQL database.
- Directory Traversal - Uses basic terminal traversal strings to reach folders on the server host that were not meant to be accessible to the user.
- Remote Code Execution (RCE) - Some may call it the most critical exploit in any system; it allows the attacker to run code remotely on the target machine.
- Cookie Injection - Cookies injected into a session they did not originate from. This is sometimes used to illegitimately access another user's account by using their tokens.
- Cross Site Scripting (XSS) - Another very widely used web vulnerability; XSS enables the attacker to run client-side code that eventually affects sensitive processes handled by the backend.
- Log4J - One of the most famous recently patched vulnerabilities: an exploit in Java servers using the well-known Apache logging library that enables remote code execution on the server (this is effectively an "easy" RCE exploit).
- Log Forging - A technique of using the system to print fake or fraudulent logs. This lets an attacker "inject" other users' log entries or fake their own attack logs, making it harder for a security researcher to find out what they really did during their access.
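As a starting point, here is a minimal, hand-rolled sketch of turning these descriptions into binary features. The regular expressions are illustrative assumptions, not an authoritative or exhaustive signature set, and Cookie Injection is omitted since it is hard to capture with a simple pattern:

```python
import re

# Illustrative indicator patterns only; real attacks vary widely.
ATTACK_PATTERNS = {
    'SQL Injection':       re.compile(r"('|--|\bUNION\b|\bSELECT\b)", re.I),
    'Directory Traversal': re.compile(r"\.\./|\.\.\\"),
    'RCE':                 re.compile(r"(;|\|\||&&)\s*(cat|ls|whoami|wget)\b", re.I),
    'LOG4J':               re.compile(r"\$\{jndi:", re.I),
    'XSS':                 re.compile(r"<script|onerror\s*=|javascript:", re.I),
    'Log Forging':         re.compile(r"%0d|%0a|\r|\n", re.I),
}

def attack_flags(request_text: str) -> dict:
    """Return one binary feature per attack family for a raw request string."""
    return {name: int(pattern.search(request_text) is not None)
            for name, pattern in ATTACK_PATTERNS.items()}

print(attack_flags("GET /api/items?id=1 UNION SELECT * FROM users"))
```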
Label phase:
{'Benign': 0, 'Malware': 1}
Attack Type phase:
{'Benign': 0, 'Cookie Injection': 1, 'Directory Traversal': 2, 'LOG4J': 3, 'Log Forging': 4, 'RCE': 5, 'SQL Injection': 6, 'XSS': 7}
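For reference, a minimal sketch of using these exact mappings to encode training labels and decode predictions back into names (the variable names are illustrative):

```python
# The exact mappings given above.
LABEL_MAP = {'Benign': 0, 'Malware': 1}
ATTACK_TYPE_MAP = {'Benign': 0, 'Cookie Injection': 1, 'Directory Traversal': 2,
                   'LOG4J': 3, 'Log Forging': 4, 'RCE': 5,
                   'SQL Injection': 6, 'XSS': 7}

# Inverse mapping, used to turn integer predictions back into names.
INV_ATTACK_TYPE_MAP = {v: k for k, v in ATTACK_TYPE_MAP.items()}

labels = ['Benign', 'XSS', 'RCE']
encoded = [ATTACK_TYPE_MAP[name] for name in labels]
print(encoded)                                    # [0, 7, 5]
print([INV_ATTACK_TYPE_MAP[i] for i in encoded])  # ['Benign', 'XSS', 'RCE']
```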
Please see the baseline model code, which contains:
- Basic log parsing (you can improve the logic)
- A basic example of feature extraction (you can improve the logic)
- A basic machine learning model (you can create any type of AI/ML model you want)
- The output format (this you cannot change; the competition's grading code needs this exact format in order to check the results correctly)
Each dataset contains a list of JSON files with this format:
- The log parsing logic reads the JSON files and calls the feature extraction function.
- The features are used to train the model. You can change each part and adapt it to your own logic.
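A minimal sketch of that parse, extract, and train flow. The record schema here is a hypothetical assumption (a raw 'request' string and a 'label' field), since the exact JSON format is not reproduced above:

```python
import json
from pathlib import Path
from sklearn.ensemble import RandomForestClassifier

def parse_logs(folder: str) -> list:
    """Read every JSON file in the folder into one list of records."""
    records = []
    for path in sorted(Path(folder).glob('*.json')):
        with open(path) as f:
            records.extend(json.load(f))
    return records

def extract_features(record: dict) -> list:
    """Toy features over a raw request string; replace with your own logic."""
    req = str(record.get('request', ''))  # hypothetical 'request' field
    return [len(req), req.count('/'), req.count('='),
            int('script' in req.lower())]

def train_model(folder: str) -> RandomForestClassifier:
    records = parse_logs(folder)
    X = [extract_features(r) for r in records]
    y = [r['label'] for r in records]  # hypothetical 'label' field
    model = RandomForestClassifier(random_state=42)
    model.fit(X, y)
    return model
```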
Once you upload your results to the Google Drive folder, the grading code will analyze them once a day and return the resulting grades, including precision and recall. If your score is in the top 10, your name will appear as a leader for that dataset.
The competition score is based on:
- Cisco judges
- Average top-10 ranking across the competition datasets
- Solution presentation and innovation
Please see the example Jupyter code that will help you get started.
As you can see from the baseline scores above, this challenge has six different phases, and you will be required to submit a result for each one separately:
- Dataset 1 label phase
- Dataset 2 label phase
- Dataset 3 label phase
- Dataset 4 label phase
- Dataset 3 attack_type phase
- Dataset 4 attack_type phase
In order to submit your result files, please upload them to this Google Drive folder (you will need to request permission):
- In the submissions folder, create a folder with your team's name.
- Upload all the result files to your team's folder.
- Please make sure to upload the files with this exact format and naming (the baseline code currently saves in this format):
Once a day, our grading script will run over the files and update your grading files and the leaderboard accordingly.
Reference to cite when you use ATRDF in a research paper:
@misc{Lavian_The_API_Traffic_2023,
author = {Lavian, Shmuel and {Ariel University, Ariel Cyber Innovation Center (ACIC)}},
month = jan,
title = {{The API Traffic Research Dataset Framework (ATRDF)}},
url = {https://github.com/ArielCyber/Cisco_Ariel_Uni_API_security_challenge},
year = {2023}
}