Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More changes for segment ownership, get rid of mutable segment() #1757

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

poodlewars
Copy link
Collaborator

@poodlewars poodlewars commented Aug 8, 2024

A follow up to #1756 .

This change is not essential and we should only merge it if we think it makes it easier to use the KeySegmentPair API correctly.

The non-const Segment& KeySegmentPair#segment() API on KeySegmentPair is a bit dangerous and error prone, because it isn't obvious that what it lets you modify is actually a resource that might be shared across several KeySegmentPair objects.

Instead we replace it with segment_ptr() and release_segment() APIs that return shared ptrs and unique ptrs, rather than hiding the ownership model from the KeySegmentPair API.

Prior discussion https://chat-man.slack.com/archives/C05L3F8ABDF/p1723450739437019

@poodlewars poodlewars force-pushed the aseaton/fix-segment-ownership-minimal-more branch from 15f6521 to ff4a391 Compare August 8, 2024 14:43
@poodlewars poodlewars changed the base branch from master to aseaton/fix/segment-ownership-minimal August 8, 2024 14:45
poodlewars added a commit that referenced this pull request Aug 9, 2024
## Motivation

(this section copied from previous (closed) attempt -
#1746)

The motivation for the change is to allow `arcticdb-enterprise` to copy
blocks to NFS storages without a use-after-move. I explained this in
man-group/arcticdb-enterprise#139 but to have an
open record:

CopyCompressedInterStoreTask has:

```
                        // Don't bother copying the key segment pair when writing to the final target
                        if (it == std::prev(target_stores_.end())) {
                            (*it)->write_compressed_sync(std::move(key_segment_pair));
                        } else {
                            auto key_segment_pair_copy = key_segment_pair;
                            (*it)->write_compressed_sync(std::move(key_segment_pair_copy));
                        }
```

KeySegmentPair has a shared_ptr to a KeySegmentPair, which we can think
of here as just a `Segment`.

Therefore the old `key_segment_pair_copy` is shallow, the underlying
Segment is the same. But the segment eventually gets passed as an rvalue
reference further down the stack.

In `do_write_impl` we call `put_object` which calls `serialize_header`.

This modifies the segment in place and passes that buffer to the AWS
SDK.

In the `NfsBackedStorage` we have:

```
void NfsBackedStorage::do_write(Composite<KeySegmentPair>&& kvs) {
    auto enc = kvs.transform([] (auto&& key_seg) {
      return KeySegmentPair{encode_object_id(key_seg.variant_key()), std::move(key_seg.segment())};
    });
    s3::detail::do_write_impl(std::move(enc), root_folder_, bucket_name_, *s3_client_, NfsBucketizer{});
}
```

where the segment gets moved from.

Subsequent attempts to use the segment (eg copying on to the next store)
then fail.

man-group/arcticdb-enterprise#139 fixed this
issue by cloning the segment, but this approach avoids the (expensive)
clone.

## Logical Change

Copy the `KeySegmentPair`'s pointer to the `Segment` in
`nfs_backed_storage.cpp` rather than moving from the segment.

## Refactor and Testing

### Copy Task

Move the CopyCompressedInterStoreTask down to ArcticDB from
arcticdb-enterprise. Add a test for it on NFS storage. I've verified
that the tests in this commit fail without the refactor in the HEAD~1
commit. The only changes to `CopyCompressedInterstoreTask` from
enterprise are:

- Pass the `KeySegmentPair` by value in to `write_compressed{_sync}`.
The `KeySegmentPair` is cheap to copy (especially considering we are
about to copy an object across storages, likely with a network hop).
- We have adopted the new `set_key` API of `KeySegmentPair`:
```
if (key_to_write_.has_value()) {
                key_segment_pair.set_key(*key_to_write_);
            }
```
- We have namespaced the `ProcessingResult` struct in to the task

### KeySegmentPair

- Replace methods returning mutable lvalue references to keys with a
`set_key` method.
- Remove the `release_segment` method as it dangerously leaves the
`KeySegmentPair` pointing at a `Segment` object that has been moved
from, and it is not actually necessary.

## Follow up work

The non-const `Segment& KeySegmentPair#segment()` API is still dangerous
and error prone. I have a follow up change to remove it, but that API
change affects very many files and will be best raised separately so
that it doesn't block this fix for replication.

A draft PR showing a proposal for that change is here -
#1757 .
Base automatically changed from aseaton/fix/segment-ownership-minimal to master August 9, 2024 20:50
@poodlewars poodlewars force-pushed the aseaton/fix-segment-ownership-minimal-more branch 4 times, most recently from 2058aef to 2a255d9 Compare August 10, 2024 16:33
@poodlewars poodlewars marked this pull request as ready for review August 12, 2024 08:13
grusev pushed a commit that referenced this pull request Nov 25, 2024
## Motivation

(this section copied from previous (closed) attempt -
#1746)

The motivation for the change is to allow `arcticdb-enterprise` to copy
blocks to NFS storages without a use-after-move. I explained this in
man-group/arcticdb-enterprise#139 but to have an
open record:

CopyCompressedInterStoreTask has:

```
                        // Don't bother copying the key segment pair when writing to the final target
                        if (it == std::prev(target_stores_.end())) {
                            (*it)->write_compressed_sync(std::move(key_segment_pair));
                        } else {
                            auto key_segment_pair_copy = key_segment_pair;
                            (*it)->write_compressed_sync(std::move(key_segment_pair_copy));
                        }
```

KeySegmentPair has a shared_ptr to a KeySegmentPair, which we can think
of here as just a `Segment`.

Therefore the old `key_segment_pair_copy` is shallow, the underlying
Segment is the same. But the segment eventually gets passed as an rvalue
reference further down the stack.

In `do_write_impl` we call `put_object` which calls `serialize_header`.

This modifies the segment in place and passes that buffer to the AWS
SDK.

In the `NfsBackedStorage` we have:

```
void NfsBackedStorage::do_write(Composite<KeySegmentPair>&& kvs) {
    auto enc = kvs.transform([] (auto&& key_seg) {
      return KeySegmentPair{encode_object_id(key_seg.variant_key()), std::move(key_seg.segment())};
    });
    s3::detail::do_write_impl(std::move(enc), root_folder_, bucket_name_, *s3_client_, NfsBucketizer{});
}
```

where the segment gets moved from.

Subsequent attempts to use the segment (eg copying on to the next store)
then fail.

man-group/arcticdb-enterprise#139 fixed this
issue by cloning the segment, but this approach avoids the (expensive)
clone.

## Logical Change

Copy the `KeySegmentPair`'s pointer to the `Segment` in
`nfs_backed_storage.cpp` rather than moving from the segment.

## Refactor and Testing

### Copy Task

Move the CopyCompressedInterStoreTask down to ArcticDB from
arcticdb-enterprise. Add a test for it on NFS storage. I've verified
that the tests in this commit fail without the refactor in the HEAD~1
commit. The only changes to `CopyCompressedInterstoreTask` from
enterprise are:

- Pass the `KeySegmentPair` by value in to `write_compressed{_sync}`.
The `KeySegmentPair` is cheap to copy (especially considering we are
about to copy an object across storages, likely with a network hop).
- We have adopted the new `set_key` API of `KeySegmentPair`:
```
if (key_to_write_.has_value()) {
                key_segment_pair.set_key(*key_to_write_);
            }
```
- We have namespaced the `ProcessingResult` struct in to the task

### KeySegmentPair

- Replace methods returning mutable lvalue references to keys with a
`set_key` method.
- Remove the `release_segment` method as it dangerously leaves the
`KeySegmentPair` pointing at a `Segment` object that has been moved
from, and it is not actually necessary.

## Follow up work

The non-const `Segment& KeySegmentPair#segment()` API is still dangerous
and error prone. I have a follow up change to remove it, but that API
change affects very many files and will be best raised separately so
that it doesn't block this fix for replication.

A draft PR showing a proposal for that change is here -
#1757 .
@poodlewars poodlewars force-pushed the aseaton/fix-segment-ownership-minimal-more branch 2 times, most recently from b94d023 to 07ebd1f Compare December 16, 2024 15:02
@poodlewars poodlewars force-pushed the aseaton/fix-segment-ownership-minimal-more branch from 07ebd1f to 6980ddc Compare December 16, 2024 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants