Source files generated by the Kaitai Struct compiler for the supported languages are meant to be human-readable, so they rely on an extra layer of stream API. This API is implemented by the Kaitai Struct runtime libraries:
- [kaitai_struct_cpp_stl_runtime](https://github.com/kaitai-io/kaitai_struct_cpp_stl_runtime) - for C++/STL
- [kaitai_struct_csharp_runtime](https://github.com/kaitai-io/kaitai_struct_csharp_runtime) - for C#
- [kaitai_struct_java_runtime](https://github.com/kaitai-io/kaitai_struct_java_runtime) - for Java
- [kaitai_struct_javascript_runtime](https://github.com/kaitai-io/kaitai_struct_javascript_runtime) - for JavaScript
- [kaitai_struct_python_runtime](https://github.com/kaitai-io/kaitai_struct_python_runtime) - for Python
- [kaitai_struct_ruby_runtime](https://github.com/kaitai-io/kaitai_struct_ruby_runtime) - for Ruby
- [kaitai_struct_swift_runtime](https://github.com/kaitai-io/kaitai_struct_swift_runtime) - for Swift
Obviously, languages differ, so the API differs slightly between them, but in a nutshell the general idea is the same. Each runtime library provides a class (or a collection of operations) `KaitaiStream`, which is essentially a wrapper over the language’s native standard IO libraries. It features:
- opening both local file input streams (if applicable) and in-memory input streams for reading in a single API
- basic stream positioning operations (usually implemented as pass-through to stdlibs' API)
- operations to read primitive KS types
- processing operations to aid conversion of byte arrays into their unpacked / decrypted / deobfuscated forms
Names of operations below are given in the Kaitai Struct native standard, i.e. lower underscore case. Real-life runtime libraries adapt these names to suit the coding style standards of the target language, e.g. `read_u2be` becomes `readU2be` in Java, or `ReadU2be` in C#.
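
The renaming described above can be illustrated with a small sketch. The helper `to_camel` is hypothetical and not part of any runtime library; it only shows how a lower underscore name maps to the Java-style and C#-style spellings mentioned in the text.

```python
def to_camel(name: str, upper_first: bool = False) -> str:
    """Convert a lower_underscore name to camelCase (or PascalCase).

    Hypothetical helper for illustration only; assumes an all-lowercase input.
    """
    head, *rest = name.split("_")
    if upper_first:
        head = head.capitalize()
    # Capitalize the first letter of every following word and join them.
    return head + "".join(part.capitalize() for part in rest)


print(to_camel("read_u2be"))                    # readU2be
print(to_camel("read_u2be", upper_first=True))  # ReadU2be
```
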
KS always works with seekable streams, using the following 3 operations:
- `eof` - checks whether the end of the stream has been reached and returns true if it has
  - "reaching end-of-stream" is defined as being at a position where a request to read even a single byte would result in an end-of-stream error, not as in C++ `istream` semantics
- `seek(n)` - seeks to absolute byte position `n` in the stream
- `pos` - returns the current position in the stream, in bytes
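
A minimal sketch of these three operations, assuming an in-memory stream backed by Python's `io.BytesIO`. The class name `SketchStream` is hypothetical and is not the real runtime class; the point is the `eof` definition, which probes whether a one-byte read would fail instead of waiting for a failed read to set a flag.

```python
import io


class SketchStream:
    """Hypothetical stand-in for a KaitaiStream-like wrapper (illustration only)."""

    def __init__(self, data: bytes):
        self._io = io.BytesIO(data)

    def pos(self) -> int:
        # Current position in bytes, passed through to the underlying stream.
        return self._io.tell()

    def seek(self, n: int) -> None:
        # Absolute seek to byte position n.
        self._io.seek(n)

    def eof(self) -> bool:
        # Probe a single byte, then restore the position: the stream is at
        # end-of-stream exactly when that one-byte read yields nothing.
        start = self._io.tell()
        probe = self._io.read(1)
        self._io.seek(start)
        return probe == b""


s = SketchStream(b"\x01\x02\x03")
print(s.eof(), s.pos())  # False 0
s.seek(3)
print(s.eof(), s.pos())  # True 3
```
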
All reading operations are supposed to "report an error" if they are unable to read the requested piece of data. The means of "reporting an error" depend on the target language, but generally throwing a typical standard library exception (`EOFException` or something similar) is preferred. The only exception to this is when a method takes an `eos_error` parameter and it is set to `false`: in this case, the method is expected to return a "best effort" read result.
Integers are read using one of the `read_$S$L$E` operations, where:
- `$S` is either `u` for an unsigned integer or `s` for a signed one;
- `$L` is the length of the integer type in bytes; 1, 2, 4 and 8 bytes are supported;
- `$E` is the [endianness](https://en.wikipedia.org/wiki/Endianness) (order of bytes): `le` for little-endian or `be` for big-endian.
A few examples:
- `read_u8le` - reads an 8-byte (64-bit) unsigned integer, little-endian (AKA Intel, AKA VAX, etc.)
- `read_s2be` - reads a 2-byte (16-bit) signed integer, big-endian (AKA "network byte order", AKA Power, AKA Motorola, etc.)
- `read_u1` - reads a 1-byte unsigned integer; no endianness is given, as it would be pointless for a single byte
Basically, it’s the same designation as used in the `type` clause in the `.ksy` format.
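
As a rough illustration of the naming scheme, the sketch below decodes a suffix such as `u8le`, `s2be` or `u1` (as used in the examples above) into a format string for Python's standard `struct` module. The function `read_int` and the two lookup tables are hypothetical names chosen for this example; a real runtime would more likely expose one method per type.

```python
import io
import struct

# struct format codes keyed by ($S, $L): signedness and length in bytes.
_CODES = {
    ("u", 1): "B", ("s", 1): "b",
    ("u", 2): "H", ("s", 2): "h",
    ("u", 4): "I", ("s", 4): "i",
    ("u", 8): "Q", ("s", 8): "q",
}
# struct byte-order prefixes keyed by $E; "" covers the 1-byte case,
# where byte order makes no difference.
_ORDER = {"le": "<", "be": ">", "": "<"}


def read_int(stream, kind):
    """Read one integer described by a suffix such as 'u8le', 's2be' or 'u1'."""
    sign, length, endian = kind[0], int(kind[1]), kind[2:]
    raw = stream.read(length)
    if len(raw) < length:
        # Report an error when the requested data cannot be read in full.
        raise EOFError(f"requested {length} bytes, but only {len(raw)} available")
    return struct.unpack(_ORDER[endian] + _CODES[(sign, length)], raw)[0]


s = io.BytesIO(bytes([0xFF, 0xFE, 0x01, 0x02]))
print(read_int(s, "u1"))    # 255
print(read_int(s, "s1"))    # -2
print(read_int(s, "u2be"))  # 258
s.seek(2)
print(read_int(s, "u2le"))  # 513
```
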
There are 2 ways to read raw binary data as byte arrays:
- `read_bytes(n)` - reads exactly `n` bytes from the stream; if fewer than `n` bytes are available before hitting end-of-stream, it reports an error
- `read_bytes_full` - reads all remaining bytes from the stream
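
The two operations might look as follows when written as plain functions over a Python binary stream. This is a sketch rather than the actual runtime code: it assumes an `io.BytesIO`-like stream whose `read(n)` returns fewer than `n` bytes only at end-of-stream, and it uses the built-in `EOFError` as the way to "report an error".

```python
import io


def read_bytes(stream, n):
    """Read exactly n bytes, or raise EOFError if the stream ends first."""
    data = stream.read(n)
    if len(data) < n:
        raise EOFError(f"requested {n} bytes, but only {len(data)} available")
    return data


def read_bytes_full(stream):
    """Read everything from the current position to end-of-stream."""
    return stream.read()


s = io.BytesIO(b"abcdef")
print(read_bytes(s, 2))    # b'ab'
print(read_bytes_full(s))  # b'cdef'
try:
    read_bytes(s, 1)       # nothing left to read
except EOFError as err:
    print("error:", err)
```
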
Strings are read using the following operations:
- `read_str_eos(String encoding)`
- `read_str_byte_limit(long len, String encoding)`
- `read_strz(String encoding, int term, boolean includeTerm, boolean consumeTerm, boolean eosError)`
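
The text gives only the signatures of these operations, so the sketch below rests on assumptions inferred from the names and parameters: `read_str_eos` decodes everything up to end-of-stream, `read_str_byte_limit` decodes exactly `length` bytes, and `read_strz` reads up to a terminator byte, where `include_term` controls whether the terminator is kept in the result, `consume_term` controls whether the stream position moves past it, and `eos_error` selects between raising an error and returning a "best effort" result when no terminator is found.

```python
import io


def read_str_eos(stream, encoding):
    # Assumed meaning: decode all remaining bytes.
    return stream.read().decode(encoding)


def read_str_byte_limit(stream, length, encoding):
    # Assumed meaning: decode exactly `length` bytes.
    raw = stream.read(length)
    if len(raw) < length:
        raise EOFError(f"requested {length} bytes, but only {len(raw)} available")
    return raw.decode(encoding)


def read_strz(stream, encoding, term, include_term, consume_term, eos_error):
    buf = bytearray()
    while True:
        c = stream.read(1)
        if c == b"":
            if eos_error:
                raise EOFError(f"end of stream reached, terminator {term} not found")
            # Best effort: return whatever was read before end-of-stream.
            return buf.decode(encoding)
        if c[0] == term:
            if include_term:
                buf += c
            if not consume_term:
                # Step back so the terminator is still unread.
                stream.seek(stream.tell() - 1)
            return buf.decode(encoding)
        buf += c


s = io.BytesIO(b"foo\x00bar")
print(read_strz(s, "ascii", 0, False, True, True))   # foo
print(s.tell())                                      # 4
print(read_strz(s, "ascii", 0, False, True, False))  # bar
```

The last call reaches end-of-stream without finding a terminator; because `eos_error` is false, it returns the bytes read so far instead of raising.
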
The methods listed below implement the `process: …` functionality for attributes: each takes a byte array and transforms it into another byte array, performing an operation usually associated with a compression / encoding / encryption / obfuscation algorithm. Sometimes extra parameters are passed to these algorithms.
Note that these methods generally do not work with the stream but receive an in-memory buffer, so they should preferably be implemented as `static` methods (or class methods, or the closest equivalent).
- `process_xor(data, key)`
  - `key` may be a single byte or a byte array; if the language doesn’t allow 2 methods of the same name with different type signatures, it is preferred to implement 2 methods with distinct names: `process_xor_one` for a single-byte key and `process_xor_many` for a byte-array key
- `process_rotate_left(data, amount, group_size)`
- `process_zlib(data)`
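
A sketch of these processing operations as stateless functions over in-memory buffers, which is the closest Python equivalent of static methods. Several details are assumptions, not statements from the text: the multi-byte XOR key is taken to repeat cyclically over the data, the rotation is implemented only for a `group_size` of 1 (bits rotated within each byte), and `process_zlib` is taken to mean decompression, in line with converting data into its unpacked form.

```python
import zlib


def process_xor_one(data: bytes, key: int) -> bytes:
    # XOR every byte with the same single-byte key.
    return bytes(b ^ key for b in data)


def process_xor_many(data: bytes, key: bytes) -> bytes:
    # Assumption: the key repeats cyclically over the data.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def process_rotate_left(data: bytes, amount: int, group_size: int = 1) -> bytes:
    if group_size != 1:
        # Larger groups are left out of this sketch.
        raise NotImplementedError("only group_size == 1 is handled here")
    amount %= 8
    # Rotate the 8 bits of each byte left by `amount`.
    return bytes(((b << amount) | (b >> (8 - amount))) & 0xFF for b in data)


def process_zlib(data: bytes) -> bytes:
    # Assumption: "processing" means inflating zlib-compressed data.
    return zlib.decompress(data)


print(process_xor_one(b"\x01\x02\x03", 0xFF))          # b'\xfe\xfd\xfc'
print(process_xor_many(b"\x01\x02\x03", b"\x01\x02"))  # b'\x00\x00\x02'
print(process_rotate_left(b"\x81", 1))                 # b'\x03'
print(process_zlib(zlib.compress(b"kaitai")))          # b'kaitai'
```
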