-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More changes for segment ownership, get rid of mutable segment() #1757
Open
poodlewars
wants to merge
1
commit into
master
Choose a base branch
from
aseaton/fix-segment-ownership-minimal-more
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
poodlewars
force-pushed
the
aseaton/fix-segment-ownership-minimal-more
branch
from
August 8, 2024 14:43
15f6521
to
ff4a391
Compare
poodlewars
changed the base branch from
master
to
aseaton/fix/segment-ownership-minimal
August 8, 2024 14:45
poodlewars
added a commit
that referenced
this pull request
Aug 9, 2024
## Motivation (this section copied from previous (closed) attempt - #1746) The motivation for the change is to allow `arcticdb-enterprise` to copy blocks to NFS storages without a use-after-move. I explained this in man-group/arcticdb-enterprise#139 but to have an open record: CopyCompressedInterStoreTask has: ``` // Don't bother copying the key segment pair when writing to the final target if (it == std::prev(target_stores_.end())) { (*it)->write_compressed_sync(std::move(key_segment_pair)); } else { auto key_segment_pair_copy = key_segment_pair; (*it)->write_compressed_sync(std::move(key_segment_pair_copy)); } ``` KeySegmentPair has a shared_ptr to a KeySegmentPair, which we can think of here as just a `Segment`. Therefore the old `key_segment_pair_copy` is shallow, the underlying Segment is the same. But the segment eventually gets passed as an rvalue reference further down the stack. In `do_write_impl` we call `put_object` which calls `serialize_header`. This modifies the segment in place and passes that buffer to the AWS SDK. In the `NfsBackedStorage` we have: ``` void NfsBackedStorage::do_write(Composite<KeySegmentPair>&& kvs) { auto enc = kvs.transform([] (auto&& key_seg) { return KeySegmentPair{encode_object_id(key_seg.variant_key()), std::move(key_seg.segment())}; }); s3::detail::do_write_impl(std::move(enc), root_folder_, bucket_name_, *s3_client_, NfsBucketizer{}); } ``` where the segment gets moved from. Subsequent attempts to use the segment (eg copying on to the next store) then fail. man-group/arcticdb-enterprise#139 fixed this issue by cloning the segment, but this approach avoids the (expensive) clone. ## Logical Change Copy the `KeySegmentPair`'s pointer to the `Segment` in `nfs_backed_storage.cpp` rather than moving from the segment. ## Refactor and Testing ### Copy Task Move the CopyCompressedInterStoreTask down to ArcticDB from arcticdb-enterprise. Add a test for it on NFS storage. I've verified that the tests in this commit fail without the refactor in the HEAD~1 commit. The only changes to `CopyCompressedInterstoreTask` from enterprise are: - Pass the `KeySegmentPair` by value in to `write_compressed{_sync}`. The `KeySegmentPair` is cheap to copy (especially considering we are about to copy an object across storages, likely with a network hop). - We have adopted the new `set_key` API of `KeySegmentPair`: ``` if (key_to_write_.has_value()) { key_segment_pair.set_key(*key_to_write_); } ``` - We have namespaced the `ProcessingResult` struct in to the task ### KeySegmentPair - Replace methods returning mutable lvalue references to keys with a `set_key` method. - Remove the `release_segment` method as it dangerously leaves the `KeySegmentPair` pointing at a `Segment` object that has been moved from, and it is not actually necessary. ## Follow up work The non-const `Segment& KeySegmentPair#segment()` API is still dangerous and error prone. I have a follow up change to remove it, but that API change affects very many files and will be best raised separately so that it doesn't block this fix for replication. A draft PR showing a proposal for that change is here - #1757 .
Base automatically changed from
aseaton/fix/segment-ownership-minimal
to
master
August 9, 2024 20:50
poodlewars
force-pushed
the
aseaton/fix-segment-ownership-minimal-more
branch
4 times, most recently
from
August 10, 2024 16:33
2058aef
to
2a255d9
Compare
grusev
pushed a commit
that referenced
this pull request
Nov 25, 2024
## Motivation (this section copied from previous (closed) attempt - #1746) The motivation for the change is to allow `arcticdb-enterprise` to copy blocks to NFS storages without a use-after-move. I explained this in man-group/arcticdb-enterprise#139 but to have an open record: CopyCompressedInterStoreTask has: ``` // Don't bother copying the key segment pair when writing to the final target if (it == std::prev(target_stores_.end())) { (*it)->write_compressed_sync(std::move(key_segment_pair)); } else { auto key_segment_pair_copy = key_segment_pair; (*it)->write_compressed_sync(std::move(key_segment_pair_copy)); } ``` KeySegmentPair has a shared_ptr to a KeySegmentPair, which we can think of here as just a `Segment`. Therefore the old `key_segment_pair_copy` is shallow, the underlying Segment is the same. But the segment eventually gets passed as an rvalue reference further down the stack. In `do_write_impl` we call `put_object` which calls `serialize_header`. This modifies the segment in place and passes that buffer to the AWS SDK. In the `NfsBackedStorage` we have: ``` void NfsBackedStorage::do_write(Composite<KeySegmentPair>&& kvs) { auto enc = kvs.transform([] (auto&& key_seg) { return KeySegmentPair{encode_object_id(key_seg.variant_key()), std::move(key_seg.segment())}; }); s3::detail::do_write_impl(std::move(enc), root_folder_, bucket_name_, *s3_client_, NfsBucketizer{}); } ``` where the segment gets moved from. Subsequent attempts to use the segment (eg copying on to the next store) then fail. man-group/arcticdb-enterprise#139 fixed this issue by cloning the segment, but this approach avoids the (expensive) clone. ## Logical Change Copy the `KeySegmentPair`'s pointer to the `Segment` in `nfs_backed_storage.cpp` rather than moving from the segment. ## Refactor and Testing ### Copy Task Move the CopyCompressedInterStoreTask down to ArcticDB from arcticdb-enterprise. Add a test for it on NFS storage. I've verified that the tests in this commit fail without the refactor in the HEAD~1 commit. The only changes to `CopyCompressedInterstoreTask` from enterprise are: - Pass the `KeySegmentPair` by value in to `write_compressed{_sync}`. The `KeySegmentPair` is cheap to copy (especially considering we are about to copy an object across storages, likely with a network hop). - We have adopted the new `set_key` API of `KeySegmentPair`: ``` if (key_to_write_.has_value()) { key_segment_pair.set_key(*key_to_write_); } ``` - We have namespaced the `ProcessingResult` struct in to the task ### KeySegmentPair - Replace methods returning mutable lvalue references to keys with a `set_key` method. - Remove the `release_segment` method as it dangerously leaves the `KeySegmentPair` pointing at a `Segment` object that has been moved from, and it is not actually necessary. ## Follow up work The non-const `Segment& KeySegmentPair#segment()` API is still dangerous and error prone. I have a follow up change to remove it, but that API change affects very many files and will be best raised separately so that it doesn't block this fix for replication. A draft PR showing a proposal for that change is here - #1757 .
poodlewars
force-pushed
the
aseaton/fix-segment-ownership-minimal-more
branch
2 times, most recently
from
December 16, 2024 15:02
b94d023
to
07ebd1f
Compare
poodlewars
force-pushed
the
aseaton/fix-segment-ownership-minimal-more
branch
from
December 16, 2024 15:56
07ebd1f
to
6980ddc
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
A follow up to #1756 .
This change is not essential and we should only merge it if we think it makes it easier to use the
KeySegmentPair
API correctly.The non-const
Segment& KeySegmentPair#segment()
API onKeySegmentPair
is a bit dangerous and error prone, because it isn't obvious that what it lets you modify is actually a resource that might be shared across severalKeySegmentPair
objects.Instead we replace it with
segment_ptr()
andrelease_segment()
APIs that return shared ptrs and unique ptrs, rather than hiding the ownership model from the KeySegmentPair API.Prior discussion https://chat-man.slack.com/archives/C05L3F8ABDF/p1723450739437019