This repository provides the implementation of Logbert for log anomaly detection. The process consists of downloading the raw data, parsing the logs into structured data, creating log sequences, and finally modeling.
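As a rough illustration of the "creating log sequences" step, the sketch below groups parsed log events by a session identifier while keeping their order. It is not the repository's `data_process.py`; the function name `build_sequences`, the `(session_id, event_id)` row layout, and the sample values are all assumptions made for this example.

```python
from collections import defaultdict


def build_sequences(parsed_rows):
    """Group event IDs by session ID, preserving the order of arrival.

    parsed_rows: iterable of (session_id, event_id) pairs (hypothetical layout).
    """
    sequences = defaultdict(list)
    for session_id, event_id in parsed_rows:
        sequences[session_id].append(event_id)
    return dict(sequences)


# Made-up rows standing in for parsed, structured log lines.
rows = [
    ("blk_1", "E5"),
    ("blk_2", "E5"),
    ("blk_1", "E22"),
    ("blk_1", "E11"),
    ("blk_2", "E9"),
]
sequences = build_sequences(rows)
print(sequences)
```

Each resulting list is one log sequence that a sequence model could consume.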
- Ubuntu 20.04
- NVIDIA driver 460.73.01
- CUDA 11.2
- Python 3.8
- PyTorch 1.9.0
This code requires the packages listed in requirements.txt. A virtual environment is recommended for running it.
On macOS and Linux:
python3 -m pip install --user virtualenv
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
deactivate
Reference: https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/
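After activating the environment, a quick sanity check can confirm that the interpreter is the one from the virtual environment and matches the Python version listed above. This is a minimal sketch using only the standard library; the helper names `in_virtualenv` and `python_matches` are hypothetical and not part of this repository.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv/virtualenv.

    Inside a virtual environment, sys.prefix points at the environment,
    while sys.base_prefix points at the base Python installation.
    """
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def python_matches(expected=(3, 8)):
    """Return True when the running major.minor version equals `expected`."""
    return tuple(sys.version_info[:2]) == tuple(expected)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("Python 3.8:", python_matches((3, 8)))
```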
Logbert and the baseline models are implemented on the HDFS, BGL, and Thunderbird datasets. The commands below use HDFS as the example.
cd HDFS
sh init.sh
# process data
python data_process.py
# run logbert
python logbert.py vocab
python logbert.py train
python logbert.py predict
# run deeplog
python deeplog.py vocab
# set options["vocab_size"] to the value printed by the vocab command above
python deeplog.py train
python deeplog.py predict
# run loganomaly
python loganomaly.py vocab
# set options["vocab_size"] to the value printed by the vocab command above
python loganomaly.py train
python loganomaly.py predict
# run baselines
baselines.ipynb
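The `vocab` step reports a vocabulary size that is then copied into `options["vocab_size"]`. To give a feel for what such a number represents, the sketch below counts the distinct event keys across a set of log sequences and adds a few reserved tokens. This is an assumption-laden illustration, not the repository's vocab code: the function name `vocab_size` and the choice of special tokens are hypothetical, so always use the value that the actual `vocab` command prints.

```python
def vocab_size(sequences, special_tokens=("<pad>", "<unk>")):
    """Count distinct event keys plus reserved tokens.

    sequences: iterable of lists of event keys (hypothetical input format).
    special_tokens: reserved entries assumed for this sketch only.
    """
    events = {event for seq in sequences for event in seq}
    return len(events) + len(special_tokens)


# Made-up sequences: three distinct events (E5, E22, E9) plus two special tokens.
example = [["E5", "E22"], ["E5", "E9"]]
print(vocab_size(example))
```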
~/.dataset // stores the original datasets after downloading
project/output // stores intermediate files and final results during execution
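The two locations above can be resolved in a script with `pathlib`. The sketch below is only illustrative: the helper `resolve_dirs` is hypothetical, and it assumes that `project` is a path relative to wherever the code is run, which the listing above does not specify.

```python
from pathlib import Path


def resolve_dirs(home=None, project_root="project"):
    """Return the dataset and output directories described above.

    home: base directory standing in for "~" (defaults to the user's home).
    project_root: assumed location of the project directory.
    """
    home = Path(home) if home is not None else Path.home()
    return {
        "dataset": home / ".dataset",             # original downloaded datasets
        "output": Path(project_root) / "output",  # intermediate files and results
    }


dirs = resolve_dirs()
for name, path in dirs.items():
    print(name, "->", path, "(exists)" if path.exists() else "(missing)")
```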