This repository is part of IPAL - an Industrial Protocol Abstraction Layer. IPAL aims to establish an abstract representation of industrial network traffic for subsequent unified and protocol-independent industrial intrusion detection. IPAL consists of a transcriber to automatically translate industrial traffic into the IPAL representation, an IDS Framework implementing various industrial intrusion detection systems (IIDSs), and a collection of evaluation datasets. For details about IPAL, please refer to our publications listed down below.
Cyber-physical systems are increasingly threatened by sophisticated attackers, also attacking the physical aspect of systems. Supplementing protective measures, industrial intrusion detection systems promise to detect such attacks. However, due to industrial protocol diversity and lack of standard interfaces, great efforts are required to adapt these technologies to a large number of different protocols. To address this issue, we propose the Industrial Protocol Abstraction Layer (IPAL) - a common representation of industrial communication as input for industrial intrusion detection systems.
This software (ipal-transcriber
) implements the automatic translation of industrial network traffic into IPAL for a variety of industrial protocols. As shown in the overview figure, the transcriber reads live network captures or pcap files and converts them into the IPAL representation.
Implemented Protocols | Status | Supported Message Types |
---|---|---|
CIP | Rudimentary | Code: 76, 77 |
Goose | Moderate | |
IEC 60870-5-104 | Well | U_Format I_Format: 1-21, 30-40, 45-51, 58-64, 70, 100-106 |
IEC 61162-450 | Moderate | UdPbC, no tags |
Modbus TCP | Moderate | Function Codes: 1, 2, 3, 4, 5, 6, 8, 15, 16, 43 |
NMEA0183 | Well | DBT, DPT, GGA, GLL, GNS, GSA, GSV, HDM, HDT, RMC, ROT, RPM, TLL, TTM, VBW, VHW, VLW, VTG, ZDA, RMB, APB, RSA, DTM, Q, AIVDM |
S7 | Rudimentary | Job: 1, 3 Function Code: 4, 5 |
DNP3 | Rudimentary | Function Code: 0-2, 7,8, 13,14, 20, 24, 129, 130 objects (group-id:var): 1:2, 2:1, 20:{0,2}, 50:3, 52:2, 60:{1-4}, 80:1 |
MQTT | Rudimentary | Packet Types: 1-15 |
- Konrad Wolsing, Eric Wagner, and Martin Henze. "Poster: Facilitating Protocol-independent Industrial Intrusion Detection Systems." Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security. 2020 (https://doi.org/10.1145/3372297.3420019)
- Konrad Wolsing, Eric Wagner, Antoine Saillard, and Martin Henze. "IPAL: Breaking up Silos of Protocol-dependent and Domain-specific Industrial Intrusion Detection Systems." Under Review. 2021 (ArXiv: https://arxiv.org/abs/2111.03438)
ipal-trascriber
requires tshark
to be installed. See https://tshark.dev/setup/install/ for installation instructions for your operating system.
Use pip install .
to install system-wide.
Install it locally with misc/install.sh
or manually with:
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
Use docker build -t ipal-transcriber:latest .
to build a Docker image.
ipal-transcriber
can be used on network interfaces (--interface
) or traffic captures (--pcap
). Set --ipal.output
either to a file to write the output to, or use '-' to write to stdout. If the filename ends with '.gz' it is automatically compressed.
An optional rule file (--rule
) allows modifying the output by renaming, removing, or modifying process names and values. For some protocols, e.g. Modbus, request packets need to be cached to properly parse responses. --timeout
defines an upper bound on how long a packet is cached for. Each message can be labeled as malicious or benign, which can be used for the later evaluation of intrusion detection methods. The default value is taken from --malicious.default
. Provide a file with packet-specific or time interval labels with --malicious
.
./ipal-transcriber -h
usage: ipal-transcriber [-h] [--interface INTERFACE] [--pcap FILE] [--protocols STR [STR ...]]
[--rules FILE] [--timeout INT] [--malicious FILE]
[--malicious.default BOOL] [--crc STR] [--ipal.output FILE] [--log STR]
[--logfile FILE] [--compresslevel INT] [--state.output FILE]
[--filter LIST] [--complete-only BOOL] [--state-in-message BOOL]
{default,timeslice} ...
optional arguments:
-h, --help show this help message and exit
--interface INTERFACE
traffic input interface (Use either this or --pcap)
--pcap FILE path to pcap file (Use either this or --interface)
--protocols STR [STR ...]
specify a subset of the available transcribers ['cip', 'dnp3', 'goose',
'iec104', 'iec450', 'modbus', 'nmea0183udp', 's7']. (Default: all)
--rules FILE file containing rules to transform transcribed messages.
--timeout INT number of milliseconds a packet can be responded to. Used for response
matching (Default: 250ms)
--malicious FILE Attack json file for labeling the packets according to the attacks in a
dataset.
--malicious.default BOOL
set this option to 'true' or 'false' to define default malicious
annotation. (Default: None). Can be used in combination with --malicious
--crc STR options for CRC calculations are at 'transport', 'application', combined
with 'or', or 'and'. (Default: and)
--ipal.output FILE output location for ipal messages ('-' stdout, '*.gz' compress).
--log STR define logging level as one of DEBUG, INFO, WARNING, ERROR, or CRITICAL.
(Default: WARNING)
--logfile FILE define file to log to. (Default: stderr)
--compresslevel INT set the gzip compress level. 0 no compress, 1 fast/large, ..., 9
slow/tiny. (Default: 9)
--state.output FILE output location for state information. ('-' stdout, '*.gz' copress)
--filter LIST semicolon separated list of state names to filter for. (Default: no
filter)
--complete-only BOOL output complete states after filterinig only. (Default: True)
--state-in-message BOOL
embed state inside the messages. (Default: False)
State Extractors:
{default,timeslice} These are available state extractor methods. Use -h for further options on
each method.
default Simple last value buffer of all variables
timeslice Outputs complete state in regular time slices.
The transcriber parses each industrial protocol packet and writes one JSON line for each packet to the output. The 'id' is unique for each message. Source ('src') and destination ('dest') are strings with different address levels separated by ":". E.g. IP:Port:Device for Modbus which can address sub-devices within one connection in theory. Activity is one of the following:
- Interrogate: Active request for data
- Inform: Response to requested data or unsolicited message
- Command: Sets new values or command an action
- Action: Responds to command or (unsolicited) performed action
- Confirmation: a packet solely designed as Layer-5 confirmation of prior msg - not restricted to Commands or Interrogations
The field 'data' contains a dictionary of all transmitted industrial process value names and values. Values are set to 'null', e.g., if this value is requested.
{
"id": 0,
"timestamp": 1445465436.995232,
"protocol": "s7",
"malicious": null,
"src": "10.10.10.20:49156",
"dest": "10.10.10.10:102",
"length": 82,
"crc": true,
"type": 1,
"activity": "interrogate",
"responds to": [],
"data": {
"16": null,
"17": null,
"18": null,
"19": null,
"20": null
}
}
Rules should be specified in a file passed through --rules
to the transcriber. Rules allow renaming, removing, or modifying process names and values. More specifically, they allow the addition and removal of select data fields from IPAL messages matching certain patterns, and the renaming of message source and destination fields.
A rules file is a python module containing a variable named JS
pointing to a dictionary discribing the desired post-processing step.
A rules file may optionally also declare methods containing the post-processing logic. An example rules file can be found under misc/rules/nmea.py
an extract of which is given below:
def position_sign(vars):
if vars[1] in ["N", "E"]:
return +vars[0]
elif vars[1] in ["S", "W"]:
return -vars[0]
JS = {
"protocols": ["nmea0183udp", "iec450"],
"rules": [
{ # Position North-South
"type": "RMC",
"var": ["RMC2", "RMC3"],
"method": position_sign,
"name": "latitude",
"remove": True,
},
],
"rename": {
".*:GG": "GNSS",
},
}
JS
may contain three key-value pairs:
protocols
: a list of the protocols the packets of which transformation rules and rename operations should be applied to (required)rules
: a list of dictionaries describing transformation rules (optional). Each dictionary may contain the following keys:var
: list containing the data fields to apply the rule to (required)type
,src
,dst
: regular expression matching the type, source and destiation field respectively of messages the rule should apply to. (All optional, default to matching any value when omitted).method
: method that should be applied to the fields specified invar
, requiresname
to also be present (optional)name
: name of the new data field which will contain the result ofmethod
, requiresmethod
to be present (optional)remove
: whether to remove the fields specified invar
(optional, defaults to false if omitted)
rename
: a dictionary of key-value pairs describing renaming rules (optional). Each dictionary key-value pair should be of the form:key
: regular expression matching the message source or destinationvalue
: string that should be used as the new value for matches
In the example above, the specified rule applies to messages of type
RMC
: the data fields RMC2
and RMC3
are removed from messages, and a new field latitude
containing the return value of position_sign([x,y])
where x
and y
are the value of fields RMC2
and RMC3
is added to the messages.
The rename
key-value pair ".*:GG": "GNSS"
specifies that all destination and source fields containing the :GG
should be updated to GNSS
instead.
ipal-state-extractor
transforms the packet-wise message format into the state format used by many process-based IDSs. It can be used as standalone programs or directly within the transcriber tool passing the same arguments. The state is written to the file provided by --state.output
or to stdout. There are different methods to derive the state from packets. Each state extractor has its options which can be retrieved by ./state_extractor.py [extractor method] -h
.
./ipal-state-extractor -h
usage: ipal-state-extractor [-h] [--ipal.input FILE] [--state.output FILE] [--filter LIST]
[--complete-only BOOL] [--state-in-message BOOL] [--compresslevel INT]
[--log STR] [--logfile FILE]
{default,timeslice} ...
optional arguments:
-h, --help show this help message and exit
--ipal.input FILE input location for message information. ('-' stdin, '*gz' compressed)
--state.output FILE output location for state information. ('-' stdout, '*.gz' copress)
--filter LIST semicolon separated list of state names to filter for. (Default: no
filter)
--complete-only BOOL output complete states after filterinig only. (Default: True)
--state-in-message BOOL
embed state inside the messages. (Default: False)
--compresslevel INT set the gzip compress level. 0 no compress, 1 fast/large, ..., 9
slow/tiny. (Default: 9)
--log STR define logging level as one of DEBUG, INFO, WARNING, ERROR, or CRITICAL.
(Default: WARNING)
--logfile FILE define file to log to. (Default: stderr)
State Extractors:
{default,timeslice} These are available state extractor methods. Use -h for further options on
each method.
default Simple last value buffer of all variables
timeslice Outputs complete state in regular time slices.
Currently, the following state extraction methods are implemented:
State Extractor | Description |
---|---|
default | Output one state for each message and keep the value of each variable. |
timeslice | Keep the last value of each variable and output a state in regular intervals, e.g. every second. |
The state format represents the entire state, the values of all sensors and actuators of a physical process, for a given time. 'state' contains all observed variables and values. Each variable's name is a ':' separated list of its device and variable name. A state is labeled as 'malicious' if at least a single packet is malicious since the output of the last state.
{
"timestamp": 1445465437.00792,
"state": {
"10.10.10.10:102:16": 0,
"10.10.10.10:102:17": 0,
"10.10.10.10:102:18": 0,
"10.10.10.10:102:19": 0,
"10.10.10.10:102:20": 0
},
"malicious": null,
}
The ipal-minimize
tool clears the process information (data
and state
) from IPAL messages or state files. This may be used to safe disk space in case the actual process data is not required.
./ipal-minimize -h
usage: ipal-minimize [-h] [--jobs INT] [--log STR] [--logfile FILE] FILE [FILE ...]
positional arguments:
FILE files to minimize ('*.gz' compressed).
optional arguments:
-h, --help show this help message and exit
--jobs INT Number of parallel workers (Default: 4).
--log STR define logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL). Default is
WARNING.
--logfile FILE File to log to. Default is stderr.
The ipal-combine
tool can be used to merge different IPAL dataset files from different IIDSs. Currently, the IDS outputs are ORed.
ipal-combine -h
usage: ipal-combine [-h] --dataset FILE --output FILE [--log STR] [--logfile FILE] FILE [FILE ...]
positional arguments:
FILE files to combine ('*.gz' compressed).
optional arguments:
-h, --help show this help message and exit
--dataset FILE original dataset ('*.gz' compressed).
--output FILE path to store combined output to ('*.gz' compressed).
--log STR define logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL). Default is WARNING.
--logfile FILE File to log to. Default is stderr.
We use different tools for development, code formatting, style checking, and testing. You can install all tools with the following command:
pip3 install -r requirements-dev.txt
All tools can be executed manually with the following commands and report errors if encountered:
black .
flake8
python3 -m pytest
You can also enforce black and flake8 to check the code before any commit with Git's pre-commit.
pre-commit install
More information on the black and flake8 setup can be found at https://ljvmiranda921.github.io/notebook/2018/06/21/precommits-using-black-and-flake8/
The process for adding support for a new protocol is the following:
- Add a new module in
transcribers/
- Create a new parser class inheriting the Transcriber class (see
transcribers/transcriber.py
). The parser class may implement:matches_protocol
: given a packet, returnTrue
if the parser can handle it,False
otherwise (required)parse_packet
: given a packet the parser can handle, return a list of valid IPAL messages (required)state_identifier
: given a packet and the name of a data field, return a string identifying the corresponding field (optional, used by the state extractor)matches_response
: given a list of request messages and a response message, modify the response message'sresponds_to
field, by adding matching requestid
s, may return a list of request messages to remove from the request queue (optional)
- Add the new transcriber to the list in
transcribers/utils.py
- Add the new protocol to the implemented protocols table above
- Add test cases covering the added protocol
The process for adding a new state extraction method is the following:
- Add a new file in
state_extractors/
- Create a new state extractor class inheriting the StateExtractor class (see
state_extractors/state_extractor.py
). The state extractor class may implement:update_state
: given an IPAL message, update the current process state and if desired call_write_state
to write the state to output (required)finalize
: called by the main state extractor script after all messages have been processed, implementingfinalize
allows to execute logic on completion, e.g. outputing one final state (required)add_arguments_to_parser
: add arguments to the main state extractor script, theargs
namespace is passed to the class on initialization, allowing reading in additional user configuration and flags (optional)
- Add the new state extractor to the list in
state_extractors/utils.py
- Add the new state extractor to the implemented state extractors table above
The process for adding tests depends on the type of tests to add.
When adding support for a new protocol, add a check against validation output for the raw transcriber output, the state extractor output, and the combined transcriber and state extractor output. These checks should be added in the modules tests/test_transcriber.py
, tests/test_state_extractor.py
and tests/test_combined.py
respectively, by adding an entry to the RAW_FILES
list.
As an example, adding the three-tuple ("misc/pcaps/s7.pcap", "s7.ipal", "s7")
to RAW_FILES
in tests/test_transcriber.py
signifies that a check of the output of the transcriber executed on the packet capture file misc/pcaps/s7.pcap
, with the protocol flag set to s7
will be conducted against the reference file in tests/snapshots/validation/test_transcriber_raw_s7.ipal
.
Note that upon adding a new test, a validation IPAL
file will be created under tests/snapshots/validation/
after the first test run. Simply edit such newly created validation files to remove the == new file ==
marker on the first line and the files will be used as a validation reference for future tests.
To ensure that a transcriber's protocol implementation stays compliant, a transcriber test module can be added in tests/transcribers/
. It may contain methods testing individual features and properties of the added transcriber. Note that for it to be collected by pytest, the module and test methods must be prefixed by test_
and test methods contain an assert
. See the pytest docs for more information.
MIT License. See LICENSE for details.