Encoding version 2: binary-only memory layout #1317

Merged: 2 commits, Jun 25, 2024

@willdealtry (Collaborator) commented on Feb 9, 2024

This PR adds a second data encoding in which all of the essential data structure descriptors are stored in binary form. The aims are to make encoding and decoding faster, to have a storage structure that can be described entirely by a set of POD structures (located in memory_layout.hpp), and to pave the way for a more sophisticated approach to data encoding, which will follow in a separate (smaller) PR.
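To illustrate why describing storage with POD structures helps speed, here is a minimal sketch: a trivially copyable, standard-layout header can be encoded and decoded with a plain `memcpy`, with no serializer in between. The struct `SegmentHeaderSketch`, its fields and the version numbering are hypothetical and do not reproduce the actual contents of memory_layout.hpp; the sketch also assumes the writer and reader share the same endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical header layout for illustration only; NOT the real
// structures from memory_layout.hpp.
struct SegmentHeaderSketch {
    std::uint16_t magic;             // identifies the segment format
    std::uint16_t encoding_version;  // assumed numbering: 1 = protobuf, 2 = binary
    std::uint32_t field_count;       // number of encoded fields that follow
    std::uint64_t body_bytes;        // size of the segment body after the header
};

// These properties are what make a byte-for-byte copy a valid encoding.
static_assert(std::is_trivially_copyable_v<SegmentHeaderSketch>, "must be memcpy-able");
static_assert(std::is_standard_layout_v<SegmentHeaderSketch>, "must have a predictable layout");
static_assert(sizeof(SegmentHeaderSketch) == 16, "members are naturally aligned, so no padding");

// Encoding: append the struct's raw bytes to the output buffer.
inline void write_header(const SegmentHeaderSketch& hdr, std::vector<std::uint8_t>& out) {
    const std::size_t offset = out.size();
    out.resize(offset + sizeof(hdr));
    std::memcpy(out.data() + offset, &hdr, sizeof(hdr));
}

// Decoding: copy the bytes back into a struct. memcpy (rather than
// reinterpret_cast) avoids alignment and strict-aliasing problems.
inline SegmentHeaderSketch read_header(const std::uint8_t* data) {
    assert(data != nullptr);
    SegmentHeaderSketch hdr;
    std::memcpy(&hdr, data, sizeof(hdr));
    return hdr;
}
```

The trade-off of this style is that the byte layout becomes part of the format, so any change to such a struct effectively needs a new encoding version.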

The main work in this PR is to remove protobuf structures entirely from the internal implementation of data encoding and compression, and to provide a mapping layer that translates from standard C++ structures to either the legacy (protobuf) format or the new binary one.
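The mapping-layer idea can be sketched as follows, under stated assumptions: internal code works only with a plain C++ struct, and conversion to the legacy or binary representation happens at the boundary, selected by the encoding version. All names here (`BlockInfo`, `LegacyBlockMessage`, `EncodingVersion`, `round_trip`) are hypothetical. `LegacyBlockMessage` is a hand-written stand-in for a protobuf-generated class (so the example compiles without protoc), and the actual protobuf wire serialization is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

enum class EncodingVersion : std::uint16_t { V1 = 1, V2 = 2 };  // hypothetical

// Plain internal structure: encoding/compression code only sees this.
struct BlockInfo {
    std::uint32_t in_bytes;   // compressed size
    std::uint32_t out_bytes;  // decompressed size
};

// Stand-in for a protobuf-generated message (hypothetical); generated
// classes expose a similar set_x()/x() accessor style.
class LegacyBlockMessage {
public:
    void set_in_bytes(std::uint32_t v) { in_bytes_ = v; }
    void set_out_bytes(std::uint32_t v) { out_bytes_ = v; }
    std::uint32_t in_bytes() const { return in_bytes_; }
    std::uint32_t out_bytes() const { return out_bytes_; }
private:
    std::uint32_t in_bytes_ = 0;
    std::uint32_t out_bytes_ = 0;
};

// Mapping layer, legacy side: the only code that knows the legacy type.
inline LegacyBlockMessage to_legacy(const BlockInfo& b) {
    LegacyBlockMessage msg;
    msg.set_in_bytes(b.in_bytes);
    msg.set_out_bytes(b.out_bytes);
    return msg;
}

inline BlockInfo from_legacy(const LegacyBlockMessage& msg) {
    return BlockInfo{msg.in_bytes(), msg.out_bytes()};
}

// Mapping layer, binary side: the struct is POD, so its bytes are the encoding.
inline std::vector<std::uint8_t> to_binary(const BlockInfo& b) {
    std::vector<std::uint8_t> out(sizeof(BlockInfo));
    std::memcpy(out.data(), &b, sizeof(BlockInfo));
    return out;
}

inline BlockInfo from_binary(const std::vector<std::uint8_t>& bytes) {
    assert(bytes.size() >= sizeof(BlockInfo));
    BlockInfo b;
    std::memcpy(&b, bytes.data(), sizeof(BlockInfo));
    return b;
}

// Round-trip through whichever representation the version selects.
inline BlockInfo round_trip(const BlockInfo& b, EncodingVersion v) {
    if (v == EncodingVersion::V1) {
        return from_legacy(to_legacy(b));
    }
    return from_binary(to_binary(b));
}
```

Because only the mapping functions mention the legacy type, the rest of the encoding path stays independent of which format is written.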

The binary encoding is a much more direct representation of the in-memory structures, which fall into three main groups. The structures around EncodedFieldCollection describe the layout of a (usually compressed) field in storage; in addition, a fixed set of optional EncodedFields at the head of the segment is used to bootstrap the decoding. StreamDescriptor describes features such as the names, types and dimensionality of the data being represented (it is therefore oriented primarily towards the features of data in memory, as opposed to data in storage). Finally, TimeseriesDescriptor represents elements specific to segments that describe and reference sets of other segments, such as indexes; that is, it contains data relevant to time series and dataframes as a whole rather than to their component parts.
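The split between the three groups can be pictured with simplified, hypothetical structs: a storage-oriented collection of encoded fields, a memory-oriented stream descriptor, and a dataframe-level timeseries descriptor. None of these types or members are the real ArcticDB definitions. The sketch also assumes one encoded field per column in the same order, which lets the descriptor and the field collection be combined to locate a column's bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// (1) Storage-oriented: where each (usually compressed) field sits.
struct EncodedFieldSketch {
    std::uint64_t offset;  // byte offset within the segment body
    std::uint64_t size;    // encoded (compressed) size in bytes
};
using EncodedFieldCollectionSketch = std::vector<EncodedFieldSketch>;

// (2) Memory-oriented: what the data is (names, types, dimensionality).
enum class DataTypeSketch : std::uint8_t { Int64, Float64, Utf8 };
struct FieldSketch {
    std::string name;
    DataTypeSketch type;
    std::uint8_t dimension;  // 0 = scalar per cell, 1 = 1-D array per cell
};
struct StreamDescriptorSketch {
    std::vector<FieldSketch> fields;
};

// (3) Dataframe-oriented: facts about the whole time series, as held by a
// segment (such as an index) that references other segments. Members assumed.
struct TimeseriesDescriptorSketch {
    std::uint64_t total_rows;                 // rows across all referenced segments
    StreamDescriptorSketch dataframe_schema;  // schema of the dataframe as a whole
};

// The descriptor gives a column's position; the encoded field collection
// gives where that column lives in storage. Assumes the same ordering.
inline std::optional<EncodedFieldSketch> locate_column(
    const StreamDescriptorSketch& desc,
    const EncodedFieldCollectionSketch& encoded,
    const std::string& name) {
    assert(desc.fields.size() == encoded.size());
    for (std::size_t i = 0; i < desc.fields.size(); ++i) {
        if (desc.fields[i].name == name) {
            return encoded[i];
        }
    }
    return std::nullopt;  // no such column
}
```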

@willdealtry force-pushed the hash_descriptor_v2 branch 3 times, most recently from 398eed5 to 2789314, April 24, 2024 15:06
@willdealtry force-pushed the hash_descriptor_v2 branch from 2789314 to 388f323, May 7, 2024 14:11
@willdealtry force-pushed the hash_descriptor_v2 branch 4 times, most recently from 9d9ec12 to 38c65c7, May 22, 2024 11:18
@willdealtry changed the title from "WIP descriptor changes" to "Encoding version 2: binary-only memory layout", May 30, 2024
@willdealtry marked this pull request as ready for review, May 30, 2024 12:17
@willdealtry force-pushed the hash_descriptor_v2 branch 3 times, most recently from 95601e1 to 26c23f9, June 5, 2024 13:21
@willdealtry requested a review from IvoDD, June 11, 2024 10:49
@IvoDD (Collaborator) left a comment

I've paid the closest attention to the protobuf mappings and the new memory layout, to check that everything is compatible. It all makes sense, and most of my comments are just questions to double-check my understanding.

Resolved review threads:
cpp/arcticdb/memory_layout.hpp (outdated)
python/tests/util/mark.py (outdated)
cpp/arcticdb/memory_layout.hpp (outdated)
cpp/arcticdb/codec/encoded_field.hpp
cpp/arcticdb/memory_layout.hpp (outdated)
cpp/arcticdb/memory_layout.hpp (outdated)
cpp/arcticdb/codec/protobuf_mappings.cpp (outdated)
cpp/arcticdb/codec/protobuf_mappings.cpp
cpp/arcticdb/codec/protobuf_mappings.hpp
cpp/arcticdb/entity/timeseries_descriptor.hpp
cpp/arcticdb/CMakeLists.txt (outdated)
cpp/arcticdb/stream/stream_utils.hpp (outdated)
cpp/arcticdb/util/buffer.hpp
cpp/arcticdb/util/timer.hpp
cpp/arcticdb/version/python_bindings.cpp
cpp/arcticdb/entity/protobuf_mappings.cpp
cpp/arcticdb/entity/protobuf_mappings.cpp
cpp/arcticdb/entity/protobuf_mappings.cpp (outdated)
cpp/arcticdb/storage/memory_layout.hpp (outdated)
cpp/arcticdb/storage/memory_layout.hpp
@willdealtry force-pushed the hash_descriptor_v2 branch 4 times, most recently from dead6f5 to 1f6ae95, June 23, 2024 21:54
@willdealtry merged commit c81f8ad into master, Jun 25, 2024 (114 checks passed)
@willdealtry deleted the hash_descriptor_v2 branch, June 25, 2024 18:46
4 participants