Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix #340: remove mmap buffer, require msgpack==0.5.0 #1433

Merged
merged 5 commits into from
Apr 2, 2024

Conversation

jamesmunro
Copy link
Collaborator

@jamesmunro jamesmunro commented Mar 17, 2024

  • remove mmap buffer
  • require msgpack==0.5.0 (we need strict_types)
  • consolidate msgpack usage into _msgpack_compact
  • add metadata size warning at 8MB (max is 16MB)

Reference Issues/PRs

Fixes #340

What does this implement or fix?

We've been allocating a 4GB buffer for every data pack operation (not metadata).
There are a few issues with this,

  • It's a very large overallocation in lots of cases, so inefficient
  • It can fail even for small data, this was most common on windows builds

This change also improves some other tightly coupled aspects of out msgpacking.

  • Consolidating msgpack usage into _msgpack_compact
  • Creates padded (mod 8 bytes) version of packb so that we can store in uint64 column without extra copies, and removes bugs with original padding approach.
  • Fixes ambiguity between list and tuple storage and require strict_types. Previously we where relying on calls to packb both when strict_types argument wasn't available, and when the strict_types check failed, they where both caught by TypeError.
  • Stop throwing ArcticDbNotYetImplemented when packed data is over 4GB, and instead let MemoryError propagate from msgpack when it actually hits the limit (which is similar).
  • The 16MB limit on metadata will give users a sudden hard to recover error when they hit it, so introduce a warning level at 8MB, to encourage users away from storing any more in metadata.

Checklist

Checklist for code changes...
  • Have you updated the relevant docstrings, documentation and copyright notice?
  • Is this contribution tested against all ArcticDB's features?
  • Do all exceptions introduced raise appropriate error messages?
  • Are API changes highlighted in the PR description?
  • Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

@jamesmunro jamesmunro force-pushed the remove-mmap-buffer branch 2 times, most recently from 597a28f to c1928ce Compare March 17, 2024 14:45
@jamesmunro jamesmunro marked this pull request as ready for review March 18, 2024 08:05
setup.cfg Outdated Show resolved Hide resolved
python/arcticdb/_msgpack_compat.py Outdated Show resolved Hide resolved
python/arcticdb/_msgpack_compat.py Show resolved Hide resolved
python/arcticdb/_msgpack_compat.py Outdated Show resolved Hide resolved
python/tests/unit/arcticdb/test_msgpack_compact.py Outdated Show resolved Hide resolved
python/tests/unit/arcticdb/test_msgpack_compact.py Outdated Show resolved Hide resolved
python/arcticdb/version_store/_normalization.py Outdated Show resolved Hide resolved
@jamesmunro jamesmunro self-assigned this Mar 22, 2024
@jamesmunro jamesmunro requested a review from alexowens90 March 22, 2024 17:14
@jamesmunro jamesmunro merged commit 5358a8b into master Apr 2, 2024
113 of 114 checks passed
@jamesmunro jamesmunro deleted the remove-mmap-buffer branch April 2, 2024 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Writing even very small pickled data can fail on low memory machines
3 participants