-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add size hint to serialization path #582
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have shown results from the serialize_lz4_for_iai
benchmark, but I can't find it. Could you point me to it or tell what it does?
// 1i32 for the tag, 3i32 for additional bytes. This optimizes | ||
// for the likely case that bytes will rarely be empty | ||
2 * size_of::<i32>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment mentions (1 + 3) * i32, but the implementation returns a size for (1 + 1) * i32.
@@ -372,6 +423,10 @@ impl Value for BigInt { | |||
|
|||
Ok(()) | |||
} | |||
fn size_hint() -> usize { | |||
// Internally the smallest BigInt is [u64; 2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on the choice of the size? The Rust BigInt
type represents varint
in the cql spec, and the size_of (which I guess you meant by "internal size") has nothing to do with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BigInt
as implemented by num_bigint
is internally just a [u64;2]
// Size, number of keys, assume not empty | ||
4 * size_of::<i32>() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate on this? One i32
for the serialized size, one i32
for the element count, but what about the remaining 2 * i32
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would imagine that that's optimizing for the assumption of non-emptiness.
@@ -620,6 +732,12 @@ impl Value for CqlValue { | |||
} | |||
} | |||
|
|||
// utility macro | |||
macro_rules! _count { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Underscore at the beginning of the name is usually used only for names that you are going to ignore later. Please align the name with Rust conventions.
$( | ||
result.add_value(&self.$FieldI) ?; | ||
)* | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: unnecessary whitespace change
@@ -639,6 +757,8 @@ macro_rules! impl_value_for_tuple { | |||
|
|||
Ok(()) | |||
} | |||
|
|||
fn size_hint() -> usize { size_of::<i32>() + _count!($($FieldI)*) * size_of::<i32>() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about invoking size_hint
for each variant of the tuple and returning the sum?
What is the motivation behind making As far as I understand, making it a method would allow hints to be more precise, e.g. it would be possible to take |
@colin-grapl - can you respond to the review comments? |
Hey, sorry, not long after this PR the company I worked for was dissolved. I'm no longer working on anything Scylla related. I apologize for any reviewer time that may have been wasted on this, but I'd also be happy to hand off anything related to this work. I'll do some responses here based on what I recall, at least, so that if someone does want to pick this up they have the option to do so. |
https://github.com/bheisler/iai It was an iai benchmark that I no longer have access to. |
This is a mostly straightforward change to
Value
serialization. Previously, buffers used in serialiazation were preallocating capacity based on the size of a value, not the number of bytes it will use when serialized. This is described in two issues:I've fixed this by adding a new method to
Value
, size_hint. It's documented in the code, which I'll inline here:I've then implemented this method on a bunch of Value impls. The ranges tend to be either exact or optimistic. For example,
i8
can be exact:In other cases we know a lower bound but may optimize for slightly above that lower bound:
In the case of
&str
the true lower bound issize_of::<i32>()
, but typically strings aren't empty, and once you allocate 4 bytes it's reasonable to just allocate 16 and handle what is probably the majority case.I only used this method in ValueList, basically. I'm not sure if it should be somewhere else too to preallocate.
The results are alright.
Ultimately I think that all of these individual allocations can be removed and, instead, a single buffer could be used. But this is a relatively small, non-breaking change, and it improves memory usage.
Pre-review checklist
(idk what Fixes annotations are, but)
Fixes: #579
Fixes:
annotations to PR description.