Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Symbol list #393

Merged
merged 3 commits into from
Jun 8, 2023
Merged

Symbol list #393

merged 3 commits into from
Jun 8, 2023

Conversation

qc00
Copy link
Contributor

@qc00 qc00 commented May 17, 2023

You might want to review commit by commit


Refactoring for #30

  • Make s3_store_factory reuse_name work
  • Remove inefficient SegmentInMemory Row visit_* methods
  • Hooked StorageFailureSimulator into InMemoryStore
  • Implemented prefix matching in InMemoryStore
  • Error code tidy up
  • Clarified reload_symbol_list()
  • Additional C++ interop facilities

#30 Simpler and more reliable SymbolList logic

@qc00 qc00 added the enhancement New feature or request label May 17, 2023
@qc00 qc00 self-assigned this May 17, 2023
@qc00 qc00 force-pushed the symbol_list branch 3 times, most recently from a5569f5 to a90d09e Compare May 17, 2023 21:33
@mehertz
Copy link
Collaborator

mehertz commented May 18, 2023

This is a large PR - prior to publishing as a full PR please update the description with details on the included changes.

@qc00 qc00 force-pushed the symbol_list branch 2 times, most recently from d5bf4cc to ace7567 Compare May 22, 2023 12:23
@qc00 qc00 marked this pull request as ready for review May 22, 2023 12:29
cpp/arcticdb/version/symbol_list.cpp Outdated Show resolved Hide resolved
SymbolList::CollectionType SymbolList::compact(std::shared_ptr<Store> store, const std::vector<AtomKey>& symbol_keys) {
SymbolList::CollectionType SymbolList::load(const std::shared_ptr<Store>& store, bool no_compaction) {
// Steps 0-1
const LoadResult load_result = ExponentialBackoff<std::runtime_error>(100, 2000)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually want to catch runtime_error? What error conditions is it legitimate to ignore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two classes of issues to consider:

  • Some Storage don't throw the correct exception type in all cases
  • We will need to check if symbol list code itself need this

Given this implementation has been validated internally, I am more inclined to release this first and then improve.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should fix the storages. It's absolutely essential to be able to distinguish between the situation where we have successfully determined that an object does not exist, and the one where we've failed to connect to the storage or some other error has occurred.

Copy link
Contributor Author

@qc00 qc00 May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to fix the storage when we did this change internally, but it involves many layers of storage abstraction, so requires significant effort to check and test all the downstream code.

This symbol list change has been verified internally and is known to improve things, so I want it decoupled from any potential change that can break things. I have raised #447.

folly::Future<std::vector<Store::RemoveKeyResultType>>
SymbolList::delete_keys(const std::shared_ptr<Store>& store, KeyVector&& to_remove, const AtomKey& exclude) {
std::vector<VariantKey> variant_keys;
variant_keys.reserve(to_remove.size());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we making a complete copy of all the keys except one to remove one of them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the Atom->Variant Key conversion and filtering folded into one. I've done my best with std::move below.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you just use remove_if which will move and doesn't need to allocate any more memory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure if I am missing something, but the to_remove is a KeyVector=std::vector<entity::AtomKey>. We need std::vector<VariantKey> to pass to the store->remove_keys, so my implementation combines the type conversion and the exclusion in one loop for efficiency.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to make the types consistent. VariantKey is a later addition, but generally speaking we are converging on using it everywhere to avoid this sort of copying conversion. But I guess we can leave it for a future change

cpp/arcticdb/version/symbol_list.cpp Outdated Show resolved Hide resolved
Forces the symbol list cache to be reloaded
Forces the symbol list cache to be reloaded.

This can take a long time on large libraries or certain S3 implementations, and once started, it cannot be
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once any of the __add__ or __del__ keys are deleted, the symbol list cache will be incomplete and must be re-built from the version keys.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that happens at the end, doesn't it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is scary.

If it is interrupted, is it safe to run this again?

Copy link
Contributor Author

@qc00 qc00 May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is interrupted, is it safe to run this again?

Yes, and it must be run again.

A more reliable solution is to delete the compaction key first (if any). This will invalidate the entire cache even if any of the __add__ or __del__ keys remains.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Raised #446 for a solution.

Copy link
Contributor Author

@qc00 qc00 May 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that happens at the end, doesn't it?

@willdealtry Could you clarify what "that" refers to?

If you meant the re-calculation of the symbol list at the end, the weakness starts in the deletion step before that. If the reload_symbol_list call is interrupted in the deletion step, then any use of the symbol list cache from anywhere might return incomplete results.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, it's this thing. Pretty sure I wrote it originally as two separate methods for exactly this reason.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the name, it's too bland for what the function actually does, which is nuke the whole thing and rebuild from scratch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The deletion part is exposed via PyBind, but not on any higher-level objects.

I feel once the symbol list has been made stabled, we should remove this method and expose the deletion-only method. I have added a comment to #446

cpp/arcticdb/version/test/test_symbol_list.cpp Outdated Show resolved Hide resolved
@@ -279,7 +277,8 @@ def create_test_lmdb_cfg(lib_name: str, db_dir: str, lmdb_config: Dict[str, Any]
return cfg


def create_test_s3_cfg(lib_name, credential_name, credential_key, bucket_name, endpoint):
def create_test_s3_cfg(lib_name, credential_name, credential_key, bucket_name, endpoint, *, with_prefix=True):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're adding typing, might as well add the types inline surely?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline typing will cause the line to wrap, which is less readable, IMO. In reality, all tools support both formats.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should settle on one and use it. If there are two many parameters to a function then that's a code smell that should be addressed, e.g. by a Parameter Object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix

Forces the symbol list cache to be reloaded
Forces the symbol list cache to be reloaded.

This can take a long time on large libraries or certain S3 implementations, and once started, it cannot be
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is scary.

If it is interrupted, is it safe to run this again?

Forces the symbol list cache to be reloaded
Forces the symbol list cache to be reloaded.

This can take a long time on large libraries or certain S3 implementations, and once started, it cannot be
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that happens at the end, doesn't it?

SymbolList::CollectionType SymbolList::compact(std::shared_ptr<Store> store, const std::vector<AtomKey>& symbol_keys) {
SymbolList::CollectionType SymbolList::load(const std::shared_ptr<Store>& store, bool no_compaction) {
// Steps 0-1
const LoadResult load_result = ExponentialBackoff<std::runtime_error>(100, 2000)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should fix the storages. It's absolutely essential to be able to distinguish between the situation where we have successfully determined that an object does not exist, and the one where we've failed to connect to the storage or some other error has occurred.

folly::Future<std::vector<Store::RemoveKeyResultType>>
SymbolList::delete_keys(const std::shared_ptr<Store>& store, KeyVector&& to_remove, const AtomKey& exclude) {
std::vector<VariantKey> variant_keys;
variant_keys.reserve(to_remove.size());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't you just use remove_if which will move and doesn't need to allocate any more memory

@@ -6,6 +6,7 @@
*/

#include <gtest/gtest.h>
#include <gmock/gmock.h>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the inclusion of a completely different testing framework (even on that is contained in an existing one) is somethign we should maybe talk about before code review. Does the syntactic sugar for one row justify the additional project-wide dependency?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why Google put the matcher functions in a gmock, but those two projects have been merged into one repo at least. I can narrow the include to gmock-matchers.h if it helps?

cpp/arcticdb/storage/failure_simulation.hpp Outdated Show resolved Hide resolved
@qc00 qc00 force-pushed the symbol_list branch 5 times, most recently from 0f6d2ae to 5bd1244 Compare June 7, 2023 14:56
qc00 added 3 commits June 8, 2023 00:15
* Make s3_store_factory reuse_name work
* Remove inefficient SegmentInMemory Row visit_* methods
* Hooked StorageFailureSimulator into InMemoryStore
* Implemented prefix matching in InMemoryStore
* Error code tidy up
* Clarified reload_symbol_list()
* Additional C++ interop facilities
+ Minimum fix for version map swallowing exceptions (#365)
+ Fix off-by-one error in submit_tasks_for_range() exception case
* Removed duplication in CMakePresets conda stuff
* Made the inheritance logic in CMakePresets clear
* Explicit set VCPKG_ROOT and add our vcpkg directory to PATH due to changes in Visual Studio 17.6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants