Industrial Intrusion Detection - A framework for protocol-independent industrial intrusion detection on top of IPAL.

IPAL - Industrial Intrusion Detection Framework

This repository is part of IPAL - an Industrial Protocol Abstraction Layer. IPAL aims to establish an abstract representation of industrial network traffic for subsequent unified and protocol-independent industrial intrusion detection. IPAL consists of a transcriber to automatically translate industrial traffic into the IPAL representation, an IDS Framework implementing various industrial intrusion detection systems (IIDSs), and a collection of evaluation datasets. For details about IPAL, please refer to our publications listed below.

The ever-increasing digitization in industries enables the automation of complex physical processes and, with progressive integration into the Internet, also large-scale distributed systems. Both trends bring well-known cyber-security problems with them, which have already led to severe attacks, e.g., the attack on the Ukrainian power grid in 2015. Supplementing proactive measures, Industrial Intrusion Detection Systems (IIDSs) promise to detect such attacks in a timely manner by monitoring the communication between automation devices or accessing the processes' physical state. Researchers have proposed many IIDS solutions to date. However, due to a lack of standard interfaces and the diversity of communication protocols across industrial domains, great effort is required to adapt existing IIDSs to new domains and communication protocols. To overcome this issue, we propose IPAL - a common message format that decouples IIDSs from domain-specific communication protocols. This representation applies to most IIDSs, as all their input data requirements are covered. Moreover, the required data is extractable across multiple industrial protocols due to inherent similarities in their communication patterns.

This repository contains the ipal-iids framework together with implementations of several IIDSs based on the IPAL message and state formats generated by our second project, the ipal-transcriber. As shown in the overview figure below, the IIDS framework consists of two phases. In the training phase, the IIDSs learn an internal model based on a training dataset and a configuration file with IIDS-specific parameters. During the live phase, the IIDSs load the trained models and search for anomalies in live data.

[Overview figure: training and live phases of the IIDS framework]

Implemented IIDSs

The IIDS framework contains implementations of the following IIDSs. Note that we distinguish between IIDSs operating on the IPAL message format (on a per-network packet basis) or on the IPAL state format (a summary of all industrial process values for a given point in time).

| IDSs | Type | Publication/Source Code | Description |
|---|---|---|---|
| Autoregression | State | Paper, Paper, Code | Process prediction (not reproduced) |
| BLSTM | Message/State | Paper, Code | Machine Learning - Bidirectional Long Short Term Memory |
| Decision Trees | Message/State | Paper, Code | (not reproduced) |
| Dummy | Message/State | -- | Implements a Dummy IDS that always or never alerts. |
| DTMC* | Message | Paper, Code | Packet Sequences - Discrete-time Markov Chains |
| Extra Trees | Message/State | Paper, Code | (not reproduced) |
| Inter-arrival time | Message | Paper | Packet inter-arrival time |
| Isolation Forest | Message/State | Paper, Code | (not reproduced) |
| SIMPLE-Histogram | Message/State | -- | Histogram of a sensor over time. |
| SIMPLE-MinMax | Message/State | -- | Minimum and maximum of a value plus threshold. |
| SIMPLE-Steadytime | Message/State | -- | Compares the longest or shortest time a sensor stays in a single state. |
| Naive Bayes | Message/State | Paper | (not reproduced) |
| Optimal | Message/State | -- | Implements an "Oracle" that always classifies correctly (or always incorrectly if desired). |
| PASAD* | State | Paper, Code, Code | Process prediction - Process-Aware Stealthy Attack Detector |
| Random Forest | Message/State | Paper, Code | Machine Learning - Random Forest |
| Seq2SeqNN* | State | Paper, Code | Process Prediction - Sequence-to-Sequence Neural Networks |
| Support Vector Machine | Message/State | Paper, Code | Machine Learning - Support Vector Machine |
| TABOR* | State | Paper | Process Sequences - Time Automata and Bayesian netwORk |

Note: IDSs marked with * are not publicly available, but can be obtained on request.

Publications
  • Konrad Wolsing, Eric Wagner, Antoine Saillard, and Martin Henze. 2022. IPAL: Breaking up Silos of Protocol-dependent and Domain-specific Industrial Intrusion Detection Systems. In 25th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2022), October 26–28, 2022, Limassol, Cyprus. ACM, New York, NY, USA, 17 pages. https://doi.org/10.1145/3545948.3545968
  • Konrad Wolsing, Eric Wagner, and Martin Henze. 2020. Poster: Facilitating Protocol-independent Industrial Intrusion Detection Systems. In Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security. https://doi.org/10.1145/3372297.3420019

Getting started

Prerequisites
  • ipal-iids requires libgsl (or libgsl-dev) to be installed. See https://www.gnu.org/software/gsl/doc/html/index.html for further information.
  • The Autoregression IIDS requires ar. Please make sure that python-dev or the version matching your interpreter (e.g., python3.9-dev) is installed on your system.

Installation (pip)

Use pip install . to install system-wide.

Installation (venv)

Install it locally with misc/install.sh or manually with:

python3 -m venv venv
source venv/bin/activate

pip3 install numpy
pip3 install -r requirements.txt

Installation (docker)

Use docker build -t ipal-ids-framework:latest . to build a Docker image.

Usage

Usage IIDS Framework

The ipal-iids workflow consists of two phases. For training, provide --train.ipal or --train.state together with a configuration file via --config. The live detection phase follows; for it, provide --live.ipal or --live.state, and use --output to define the location the annotated IIDS output is written to.

Each IIDS has its own options which can be retrieved by ipal-iids --default.config [ids-name].

ipal-iids -h
usage: ipal-iids [-h] [--train.ipal FILE] [--train.state FILE] [--live.ipal FILE]
                  [--live.state FILE] [--output FILE] [--config FILE] [--default.config IDS]
                  [--retrain] [--log STR] [--logfile FILE] [--compresslevel INT]

optional arguments:
  -h, --help            show this help message and exit
  --train.ipal FILE     input file of IPAL messages to train the IDS on ('-' stdin, '*.gz'
                        compressed).
  --train.state FILE    input file of IPAL state messages to train the IDS on ('-' stdin,
                        '*.gz' compressed).
  --live.ipal FILE      input file of IPAL messages to perform the live detection on ('-'
                        stdin, '*.gz' compressed).
  --live.state FILE     input file of IPAL state messages to perform the live detection on 
                        ('-' stdin, '*.gz' compressed).
  --output FILE         output file to write the anotated IDS output to (Default:none, '-'
                        stdout, '*,gz' compress).
  --config FILE         load IDS configuration and parameters from the specified file
                        ('*.gz' compressed).
  --default.config IDS  dump the default configuration for the specified IDS to stdout and
                        exit, can be used as a basis for writing IDS config files. Available
                        IIDSs are: BLSTM,inter-arrival-mean,inter-arrival-
                        range,RandomForest,SVM
  --retrain             retrain regardless of a trained model file being present.
  --log STR             define logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL)
                        (Default: WARNING).
  --logfile FILE        file to log to (Default: stderr).
  --compresslevel INT   set the gzip compress level. 0 no compress, 1 fast/large, ..., 9
                        slow/tiny. (Default: 9)
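
As a rough sketch of how the two phases fit together on the command line, the following Python helper assembles an ipal-iids call from the options listed in the help text above. The helper name build_ipal_iids_command and all file names (config.json, train.ipal.gz, live.ipal.gz, output.ipal.gz) are placeholders, and passing training and live inputs in a single call is an assumption rather than a documented requirement.

```python
import shlex


def build_ipal_iids_command(config, train_ipal=None, live_ipal=None,
                            output=None, retrain=False):
    """Assemble an ipal-iids argument list from the documented options."""
    cmd = ["ipal-iids", "--config", config]
    if train_ipal is not None:
        cmd += ["--train.ipal", train_ipal]   # IPAL messages used for training
    if live_ipal is not None:
        cmd += ["--live.ipal", live_ipal]     # IPAL messages checked for anomalies
    if output is not None:
        cmd += ["--output", output]           # where the annotated output goes
    if retrain:
        cmd.append("--retrain")               # ignore an existing model file
    return cmd


# Placeholder file names, chosen only for illustration.
cmd = build_ipal_iids_command(
    config="config.json",
    train_ipal="train.ipal.gz",
    live_ipal="live.ipal.gz",
    output="output.ipal.gz",
)
print(shlex.join(cmd))

# To actually run the command (requires ipal-iids to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

For state-based IIDSs, the same pattern would apply with --train.state and --live.state in place of the message options.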

Usage Preprocessor

The preprocessors are useful for IIDSs that require a certain input format. For example, some machine-learning IIDSs work best if their data is scaled between 0 and 1. Only IIDSs inheriting from the FeatureIDS class can use the preprocessors. The preprocessors are first fitted to the training data and then applied to the data seen afterwards. Currently, the following preprocessors are implemented:

| Preprocessor | Description |
|---|---|
| aggregate | Aggregates multiple feature vectors into a single vector |
| categorical | Encodes values (usually strings) as an array of binary indicators |
| gradient | Calculates the derivative of a process value |
| indicate-none | Extends each feature with a binary value indicating whether the feature is none or not |
| label | Encodes values (usually strings) as numeric labels |
| mean | Subtracts the mean and scales by the standard deviation |
| minmax | Scales by minimum and maximum to the range from 0 to 1 |
| pca | Performs a principal component analysis on the input vector |

Multiple preprocessors can be used in series. The following example shows how preprocessors are defined in the configuration file:

{
    "SVM Preprocessor Example" : {
        "_type": "SVM",
        ...
        "features" : ["src", "type", "state;4:PID Setpoint", "length"],
        "preprocessors": [
            {"method" : "mean", "features" : ["state;4:PID Setpoint", "length"]},
            {"method" : "categorical", "features" : ["type"]}
        ],
        ...
    }
}
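
To make the effect of the two preprocessors in the configuration above more concrete, here is a minimal, framework-independent sketch of what "mean" and "categorical" do to a feature vector according to their descriptions. It is not the framework's implementation: the function names, the sample values for "type" and "length", and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def fit_mean(values):
    """Learn mean and standard deviation from the training values."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        sigma = 1.0  # avoid dividing by zero for constant features
    return mu, sigma


def transform_mean(value, mu, sigma):
    """Subtract the mean and scale by the standard deviation."""
    return (value - mu) / sigma


def fit_categorical(values):
    """Learn the set of categories seen during training."""
    return sorted(set(values))


def transform_categorical(value, categories):
    """Encode a value as an array of binary indicators."""
    return [1 if value == category else 0 for category in categories]


# Hypothetical training samples with two of the features from the example.
train = [
    {"type": "request", "length": 10},
    {"type": "response", "length": 14},
    {"type": "request", "length": 12},
]

# Fitting happens once, on the training data.
mu, sigma = fit_mean([row["length"] for row in train])
categories = fit_categorical([row["type"] for row in train])

# Afterwards, each live sample is transformed with the fitted parameters.
live = {"type": "response", "length": 12}
vector = [transform_mean(live["length"], mu, sigma)]
vector += transform_categorical(live["type"], categories)
print(vector)  # [0.0, 0, 1]
```

The length 12 equals the training mean and therefore maps to 0.0, while "response" becomes the indicator pair [0, 1].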

Usage configuration files

The configuration file determines the parameters for each IIDS. A default configuration for each IIDS can be obtained with ipal-iids --default.config [IIDS name]:

ipal-iids --default.config inter-arrival-mean
{
    "inter-arrival-mean": {
        "_type": "inter-arrival-mean",
        "model-file": "./model",
        "N": 4,
        "W": 5
    }
}

The IIDS framework allows for using multiple IIDSs in parallel. Each entry in the configuration file can have a different name, e.g., one IIDS for each sensor of a physical system. Currently, the output of multiple IIDSs is combined with 'or' - meaning an alert is emitted if at least one IIDS detected an anomaly.
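
The following sketch illustrates both points: it generates a configuration with two differently named entries of the same IIDS type and shows what an 'or' combination of their verdicts amounts to. The sensor names, the separate model files per entry, and the combine_alerts helper are assumptions for illustration; only the keys and values of the inter-arrival-mean entry are taken from the default configuration above.

```python
import json

# Parameters copied from the default inter-arrival-mean configuration.
base = {"_type": "inter-arrival-mean", "N": 4, "W": 5}

# One named entry per (hypothetical) sensor, each with its own model file.
config = {
    f"inter-arrival-{sensor}": {**base, "model-file": f"./model-{sensor}"}
    for sensor in ("sensor-a", "sensor-b")
}
print(json.dumps(config, indent=4))


def combine_alerts(alerts):
    """Combine per-IIDS verdicts with a logical 'or'."""
    return any(alerts.values())


# An alert is emitted as soon as at least one IIDS reports an anomaly.
print(combine_alerts({"inter-arrival-sensor-a": False,
                      "inter-arrival-sensor-b": True}))  # True
```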

Usage ipal-visualize-model

This tool allows for visualizing the trained models for an IIDS configuration. To plot a specific model use ipal-visualize-model [path-to-config-file].

ipal-visualize-model  -h
usage: ipal-visualize-model [-h] [--log STR] [--logfile FILE] FILE

positional arguments:
  FILE            load the IDS configuration of the trained model ('*.gz' compressed).

optional arguments:
  -h, --help      show this help message and exit
  --log STR       define logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL).
                  Default is WARNING.
  --logfile FILE  File to log to. Default is stderr.

Usage ipal-extend-alarms

The IIDS framework works as an online tool, meaning IIDSs have to decide live whether they emit an alert. Therefore, alerts cannot be emitted retroactively, which is sometimes needed for evaluation. As a few IIDSs may need to emit alerts retroactively, the ipal-extend-alarms script post-processes the IIDS output afterward. IIDSs that support ipal-extend-alarms need the parameter adjust: true to be set in their configuration files.

Development

Tooling

We use different tools for development, code formatting, style checking, and testing. You can install all tools with the following command:

pip3 install -r requirements-dev.txt

All tools can be executed manually with the following commands and report errors if encountered:

black .
flake8
python3 -m pytest

You can also have black and flake8 check the code automatically before every commit by installing the pre-commit Git hooks:

pre-commit install

More information on the black and flake8 setup can be found at https://ljvmiranda921.github.io/notebook/2018/06/21/precommits-using-black-and-flake8/

Add an IIDS

The process for adding support for a new IIDS is the following:

  1. Add a new folder and IIDS module in ids/[ids name]/[ids name].py
  2. Create a new IIDS class inheriting the MetaIDS class (see ids/ids.py) or inheriting the FeatureIDS class (see ipal_iids/ids/featureids.py) for preprocessor support. The IIDS class may implement:
    • train: given some training data, the IIDS should learn its internal model
    • new_ipal_msg: given a new IPAL message, return whether the IIDS detected an anomaly
    • new_state_msg: given a new IPAL state message, return whether the IIDS detected an anomaly
    • save_trained_model: save the trained model to disc
    • load_trained_model: load a trained model from disc
    • visualize_model: create a Matplotlib visualization of the model for debugging purposes
  3. Add the new IIDS to the list in ids/utils.py
  4. Add the new IIDS to the list in tests/conftest.py
  5. Add the new IIDS to the implemented IIDSs table above
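
The methods listed above can be sketched with a toy state-based detector. This example is self-contained and does not import the framework: MetaIDSStandIn is a hypothetical stand-in for MetaIDS, and the constructor arguments, the method signatures, and the {"state": {...}} message layout are assumptions. Check ids/ids.py for the actual interface before adapting it.

```python
import json


class MetaIDSStandIn:
    """Hypothetical stand-in for the framework's MetaIDS base class."""

    def __init__(self, model_file):
        self.model_file = model_file


class ThresholdIDS(MetaIDSStandIn):
    """Toy IIDS: alerts when a value leaves the range seen in training."""

    def __init__(self, model_file, sensor, threshold=0.0):
        super().__init__(model_file)
        self.sensor = sensor
        self.threshold = threshold
        self.low = None
        self.high = None

    def train(self, states):
        # Learn the internal model: minimum and maximum of the sensor.
        values = [state["state"][self.sensor] for state in states]
        self.low, self.high = min(values), max(values)

    def new_state_msg(self, msg):
        # Return True (anomaly) if the value is outside the learned range.
        value = msg["state"][self.sensor]
        lower = self.low - self.threshold
        upper = self.high + self.threshold
        return not (lower <= value <= upper)

    def save_trained_model(self):
        with open(self.model_file, "w") as f:
            json.dump({"low": self.low, "high": self.high}, f)

    def load_trained_model(self):
        with open(self.model_file) as f:
            model = json.load(f)
        self.low, self.high = model["low"], model["high"]


# Hypothetical sensor name and values.
ids = ThresholdIDS("model.json", sensor="tank-level", threshold=0.5)
ids.train([{"state": {"tank-level": v}} for v in (2.0, 3.5, 5.0)])
print(ids.new_state_msg({"state": {"tank-level": 5.2}}))  # False
print(ids.new_state_msg({"state": {"tank-level": 9.0}}))  # True
```

A message-based IIDS would implement new_ipal_msg in the same way, and visualize_model is optional for debugging.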

Add a preprocessor

The process for adding a new preprocessor is the following:

  1. Add a new preprocessor module in preprocessors/
  2. Create a new preprocessor class inheriting the Preprocessor class (see preprocessors/preprocessor.py). The preprocessor class may implement:
    • fit: given a set of training data, train the preprocessor on it
    • transform: preprocess a given data sample based on the fitted model
    • reset: reset the preprocessor between individual datasets
    • get_fitted_model: return a representation of the fitted model, which can be saved to disc
    • from_fitted_model: return an initialized preprocessor based on a previously saved model
  3. Add the new preprocessor to the list in preprocessors/utils.py
  4. Add the new preprocessor to the preprocessor list table above
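
The interface described above can be sketched with a small min-max scaler. As before, this is a self-contained illustration: PreprocessorStandIn is a hypothetical stand-in for the Preprocessor class, and the constructor arguments and method signatures are assumptions. See preprocessors/preprocessor.py for the actual base class.

```python
import json


class PreprocessorStandIn:
    """Hypothetical stand-in for the framework's Preprocessor base class."""

    def __init__(self, features):
        self.features = features


class MinMaxSketch(PreprocessorStandIn):
    """Toy preprocessor scaling a single value to the range 0 to 1."""

    def __init__(self, features):
        super().__init__(features)
        self.minimum = None
        self.maximum = None

    def fit(self, values):
        # Learn the scaling parameters from the training data.
        self.minimum, self.maximum = min(values), max(values)

    def transform(self, value):
        # Scale a sample using the fitted minimum and maximum.
        span = self.maximum - self.minimum
        if span == 0:
            return 0.0  # constant feature, nothing to scale
        return (value - self.minimum) / span

    def reset(self):
        # This scaler keeps no per-dataset state, so there is nothing to do.
        pass

    def get_fitted_model(self):
        # Return a JSON-serializable representation of the fitted model.
        return {
            "features": self.features,
            "minimum": self.minimum,
            "maximum": self.maximum,
        }

    @classmethod
    def from_fitted_model(cls, model):
        # Rebuild an initialized preprocessor from a saved model.
        preprocessor = cls(model["features"])
        preprocessor.minimum = model["minimum"]
        preprocessor.maximum = model["maximum"]
        return preprocessor


scaler = MinMaxSketch(features=["length"])
scaler.fit([10, 20, 30])
print(scaler.transform(25))  # 0.75

# Round trip through a serialized model, as it would be stored on disc.
restored = MinMaxSketch.from_fitted_model(
    json.loads(json.dumps(scaler.get_fitted_model()))
)
print(restored.transform(25))  # 0.75
```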

Contributors

  • Antoine Saillard (RWTH Aachen University & Fraunhofer FKIE)
  • Eric Wagner (Fraunhofer FKIE & RWTH Aachen University)
  • Konrad Wolsing (Fraunhofer FKIE & RWTH Aachen University)
  • Lea Thiemt (RWTH Aachen University)
  • Sven Zemanek (Fraunhofer FKIE)
  • Dominik Kus (RWTH Aachen University)

License

MIT License. See LICENSE for details.
