This code generates configurations for the logdata-anomaly-miner (AMiner) based on static analysis of log files.
The paper "Semi-supervised Configuration and Optimization of Anomaly Detection Algorithms on Log Data" (link will be added after publication) documents the evaluation of the Configuration-Engine.
The evaluation used Apache Access and audit log data from the AIT Log Data Set V1.0 and the AIT Log Data Set V2.0. For each V2.0 dataset, the log files were taken from <dataset_name>/gather/intranet_server/logs/<data_type>.
To install the AMiner-Configuration-Engine, run:
git clone https://github.com/ait-aecid/aminer-configuration-engine
cd aminer-configuration-engine
git submodule update --init
chmod +x setup.sh
./setup.sh
- Place the log files to be analyzed in the directory data. All files must contain a single log type (e.g., audit or Apache Access). The sample data already in data is Apache Access data from the AIT Log Data Set V2.0; remove it before adding your own files.
- Run the following command from within the repository directory:
python3 create_config.py [-h] [-d DATA_DIR] [-p PARSER_NAME] [-pd USE_PARSED_DATA] [-id DETECTOR_IDS] [-o OPTIMIZE] [-pre PREDEFINED_CONFIG_PATH]
For instance, the following command runs the Configuration-Engine with the Apache Access parser for the detectors with IDs 1, 2, and 4, with optimization enabled:
python3 create_config.py -d data/ -p ApacheAccessParsingModel -id 1,2,4 -o true
For a description of all options, run:
python3 create_config.py --help
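When the Configuration-Engine is run repeatedly (e.g., for different detector subsets), it can help to assemble the command line in a script. The Python sketch below builds the argument list from the flags in the usage line above and reproduces the example command. The helper names `build_create_config_cmd` and `run_create_config` are hypothetical. Flags that are not set are simply left out so that the script's own defaults apply, `-o` is only ever passed as `true` (the only value shown in this README), and `-pd` is not covered because its accepted values are not described here.

```python
import shlex
import subprocess
from typing import Iterable, List, Optional


def build_create_config_cmd(
    data_dir: str = "data/",
    parser_name: Optional[str] = None,
    detector_ids: Iterable[int] = (),
    optimize: bool = False,
    predefined_config_path: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for create_config.py from the documented flags."""
    cmd = ["python3", "create_config.py", "-d", data_dir]
    if parser_name is not None:
        cmd += ["-p", parser_name]
    ids = [str(i) for i in detector_ids]
    if ids:
        # Detector IDs are passed as one comma-separated value, e.g. "1,2,4".
        cmd += ["-id", ",".join(ids)]
    if optimize:
        # Only "true" appears in the README example; other values are not assumed.
        cmd += ["-o", "true"]
    if predefined_config_path is not None:
        cmd += ["-pre", predefined_config_path]
    return cmd


def run_create_config(cmd: List[str], repo_dir: str = ".") -> None:
    """Run the command from within the repository directory; raises on a non-zero exit code."""
    subprocess.run(cmd, cwd=repo_dir, check=True)


if __name__ == "__main__":
    example = build_create_config_cmd(
        parser_name="ApacheAccessParsingModel", detector_ids=[1, 2, 4], optimize=True
    )
    print(shlex.join(example))
    # prints: python3 create_config.py -d data/ -p ApacheAccessParsingModel -id 1,2,4 -o true
```

Returning a list instead of a single string avoids shell quoting issues when the list is handed to `subprocess.run`.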
The meta-configuration file contains the recipes for configuring each detector and the settings for the optimization. The provided settings were tested successfully and should work for different types of log data. Each detector recipe is composed of one or more configuration methods.
To define a meta-configuration for a new detector, follow the same scheme as the example below and add it to meta-configuration.yaml under "ParameterSelection":
# define detector
EntropyDetector:
  # define how the variables (or paths) are selected/filtered
  Variables:
    # pre-filter static variables
    PreFilter:
      Static: {} # "{}" means no additional parameters are needed (an empty YAML mapping)
    # select variables (by character pair probability)
    Select:
      CharacterPairProbability:
        # define specific (hyper)parameters for the configuration method
        mean_crit_thresh: 0.7
  # define how specific parameters (of the AMiner config!) are computed
  SpecificParams:
    # choose the method and define its parameters
    CharacterPairProbabilityThresh:
      parameter_name: prob_thresh # name of the parameter used in the detector
      min: 0.0
      max: 0.9
      offset: -0.05
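The recipe reads as a pipeline: filter the candidate paths, select among the remaining ones, then compute the detector's parameters. The Python sketch below mirrors the EntropyDetector entry as the nested dictionary a YAML loader would return and composes two of its steps. It is not the Configuration-Engine's implementation. The input format (parser path mapped to observed values), the example paths, and both helper functions are hypothetical. `Static` is assumed to drop paths that only ever take one value, and `offset`, `min`, and `max` are assumed to shift a computed value and then clip it into `[min, max]`. The `CharacterPairProbability` selection is left out because its scoring is not described here.

```python
from typing import Dict, List

# The EntropyDetector recipe from above, nested under "ParameterSelection"
# as a YAML loader would return it.
META_CONFIG = {
    "ParameterSelection": {
        "EntropyDetector": {
            "Variables": {
                "PreFilter": {"Static": {}},
                "Select": {"CharacterPairProbability": {"mean_crit_thresh": 0.7}},
            },
            "SpecificParams": {
                "CharacterPairProbabilityThresh": {
                    "parameter_name": "prob_thresh",
                    "min": 0.0,
                    "max": 0.9,
                    "offset": -0.05,
                }
            },
        }
    }
}

# Hypothetical input format: parser path -> values observed in the log data.
Values = Dict[str, List[str]]


def static_prefilter(values: Values) -> Values:
    """Assumed 'Static' behaviour: drop paths that only ever took a single value."""
    return {path: vals for path, vals in values.items() if len(set(vals)) > 1}


def apply_offset_and_clip(raw: float, params: dict) -> float:
    """Assumed semantics: shift the computed value by 'offset', then clip it into [min, max]."""
    return min(params["max"], max(params["min"], raw + params["offset"]))


recipe = META_CONFIG["ParameterSelection"]["EntropyDetector"]
thresh_params = recipe["SpecificParams"]["CharacterPairProbabilityThresh"]

# Hypothetical example paths and values.
observed = {
    "/model/status_code": ["200", "200", "200"],
    "/model/request": ["GET /index.html", "GET /login.php", "POST /login.php"],
}
print(sorted(static_prefilter(observed)))  # ['/model/request']
print(thresh_params["parameter_name"], apply_offset_and_clip(0.97, thresh_params))  # prob_thresh 0.9
```

Keeping each step a small function over plain dictionaries mirrors how the recipe composes independent configuration methods, so a method can be swapped by editing the YAML entry rather than the pipeline.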