Add size hint to serialization path #582

colin-grapl · 2022-10-21T20:09:13Z

This is a mostly straightforward change to Value serialization. Previously, buffers used in serialization preallocated capacity based on the in-memory size of a value, not the number of bytes it occupies when serialized. This is described in two issues:

I've fixed this by adding a new method to Value, size_hint. It's documented in the code, which I'll inline here:

pub trait Value {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig>;
    /// A *hint* to callers indicating how much memory the serialized
    /// form of this `Value` will take. This hint is not defined as a
    /// lower bound, an upper bound, or an exact size. Every implementation
    /// is free to return the "best guess" available.
    /// The default impl returns `std::mem::size_of::<i32>()` as every Value
    /// at minimum has an i32 sized tag.
    fn size_hint() -> usize {
        std::mem::size_of::<i32>()
    }
}

I've then implemented this method on a bunch of Value impls. The hints tend to be either exact or optimistic. For example, i8 can be exact:

impl Value for i8 {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig> { /*...*/ }

    fn size_hint() -> usize {
        size_of::<i32>() + size_of::<Self>()
    }
}

In other cases we know a lower bound but may optimize for slightly above that lower bound:

impl Value for &str {
    fn serialize(&self, buf: &mut Vec<u8>) -> Result<(), ValueTooBig> { /*...*/ }
    fn size_hint() -> usize {
        // 1i32 for the tag, 3i32 for additional characters. This optimizes
        // for the likely case that strings will rarely be empty, and likely
        // be at least a few characters
        4 * size_of::<i32>()
    }
}

In the case of &str the true lower bound is size_of::<i32>(), but typically strings aren't empty, and once you allocate 4 bytes it's reasonable to just allocate 16 and handle what is probably the majority case.

So far I only use this method in ValueList. I'm not sure whether it should also be used elsewhere for preallocation.
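A minimal sketch of how a container such as ValueList could use the hint when appending a value. The trait below is a simplified stand-in for the one in this PR (error handling is dropped), and `add_value` is a hypothetical free function, not the actual ValueList method.

```rust
use std::mem::size_of;

// Simplified stand-in for the PR's `Value` trait; the real `serialize`
// returns `Result<(), ValueTooBig>`.
trait Value {
    fn serialize(&self, buf: &mut Vec<u8>);

    // Default hint: every value carries at least an i32 length prefix.
    fn size_hint() -> usize {
        size_of::<i32>()
    }
}

impl Value for i8 {
    fn serialize(&self, buf: &mut Vec<u8>) {
        // i32 length prefix followed by the single payload byte.
        buf.extend_from_slice(&1i32.to_be_bytes());
        buf.extend_from_slice(&self.to_be_bytes());
    }

    fn size_hint() -> usize {
        size_of::<i32>() + size_of::<Self>()
    }
}

// Hypothetical helper: reserve the hinted number of bytes up front so that
// serialization ideally does not need to grow the buffer again.
fn add_value<V: Value>(buf: &mut Vec<u8>, value: &V) {
    buf.reserve(V::size_hint());
    value.serialize(buf);
}
```

Because the hint is only a guess, `reserve` may under- or over-allocate; the serialized bytes are the same either way.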

The results are modest: roughly 6% fewer instructions and about 4% fewer estimated cycles.

main
serialize_lz4_for_iai
  Instructions:               10459
  L1 Accesses:                13456
  L2 Accesses:                   32
  RAM Accesses:                 156
  Estimated Cycles:           19076

fix
serialize_lz4_for_iai
  Instructions:                9838 (-5.937470%)
  L1 Accesses:                12605 (-6.324316%)
  L2 Accesses:                   32 (No change)
  RAM Accesses:                 157 (+0.641026%)
  Estimated Cycles:           18260 (-4.277626%)
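
As a sanity check on the numbers above, the sketch below recomputes the "Estimated Cycles" line and the percentage deltas. The weighting (L1 + 5 × L2 + 35 × RAM) is an assumption about how iai derives its estimate; it does reproduce both reported totals. The function names are illustrative only.

```rust
/// Cycle estimate under the assumed weighting: L1 + 5 * L2 + 35 * RAM.
fn estimated_cycles(l1: u64, l2: u64, ram: u64) -> u64 {
    l1 + 5 * l2 + 35 * ram
}

/// Relative change from `before` to `after`, in percent.
fn percent_change(before: u64, after: u64) -> f64 {
    (after as f64 - before as f64) / before as f64 * 100.0
}
```

With the counters from the two runs, `estimated_cycles(13456, 32, 156)` gives 19076 for main and `estimated_cycles(12605, 32, 157)` gives 18260 for the fix.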

Ultimately I think that all of these individual allocations can be removed and, instead, a single buffer could be used. But this is a relatively small, non-breaking change, and it improves memory usage.

Pre-review checklist

(idk what Fixes annotations are, but)

Fixes: #579

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
[?] I added appropriate Fixes: annotations to PR description.

piodul

You have shown results from the serialize_lz4_for_iai benchmark, but I can't find it. Could you point me to it or tell what it does?

piodul · 2022-10-31T09:46:06Z

scylla-cql/src/frame/value.rs

+        // 1i32 for the tag, 3i32 for additional bytes. This optimizes
+        // for the likely case that bytes will rarely be empty
+        2 * size_of::<i32>()


The comment mentions (1 + 3) * i32, but the implementation returns a size for (1 + 1) * i32.

piodul · 2022-10-31T09:59:53Z

scylla-cql/src/frame/value.rs

@@ -372,6 +423,10 @@ impl Value for BigInt {

        Ok(())
    }
+    fn size_hint() -> usize {
+        // Internally the smallest BigInt is [u64; 2]


Could you elaborate on the choice of the size? The Rust BigInt type represents varint in the cql spec, and the size_of (which I guess you meant by "internal size") has nothing to do with it.

insanitybit · Jun 26, 2023

BigInt as implemented by num_bigint is internally just a [u64;2]
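
To illustrate the reviewer's point that the serialized size depends on the value rather than on the in-memory layout, here is a rough sketch. It assumes a varint is written as an i32 length prefix followed by the minimal big-endian two's-complement bytes, and it uses `i64` as a stand-in for BigInt so the example stays dependency-free. Both function names are hypothetical.

```rust
use std::mem::size_of;

/// Minimal number of two's-complement bytes needed to represent `v`.
fn varint_body_len(v: i64) -> usize {
    // Bits needed for the magnitude; for negatives, count bits of !v.
    let magnitude_bits = if v < 0 {
        64 - (!v).leading_zeros()
    } else {
        64 - v.leading_zeros()
    };
    // One extra bit for the sign, then round up to whole bytes.
    ((magnitude_bits + 1 + 7) / 8) as usize
}

/// Assumed serialized size: i32 length prefix plus the body bytes.
fn varint_serialized_len(v: i64) -> usize {
    size_of::<i32>() + varint_body_len(v)
}
```

Under these assumptions a small value such as 0 or -1 takes 5 bytes in total, well below what a hint based on `[u64; 2]` would suggest.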

piodul · 2022-10-31T11:01:32Z

scylla-cql/src/frame/value.rs

+        // Size, number of keys, assume not empty
+        4 * size_of::<i32>()


Could you elaborate on this? One i32 for the serialized size, one i32 for the element count, but what about the remaining 2 * i32?

insanitybit · Jun 26, 2023

I would imagine that it's optimizing for the assumption of non-emptiness.

piodul · 2022-10-31T11:06:01Z

scylla-cql/src/frame/value.rs

@@ -620,6 +732,12 @@ impl Value for CqlValue {
    }
 }

+// utility macro
+macro_rules! _count {


Underscore at the beginning of the name is usually used only for names that you are going to ignore later. Please align the name with Rust conventions.

piodul · 2022-10-31T11:14:28Z

scylla-cql/src/frame/value.rs

                $(
                    result.add_value(&self.$FieldI) ?;
                )*
+


Nit: unnecessary whitespace change

piodul · 2022-10-31T11:15:25Z

scylla-cql/src/frame/value.rs

@@ -639,6 +757,8 @@ macro_rules! impl_value_for_tuple {

                Ok(())
            }
+
+            fn size_hint() -> usize { size_of::<i32>() + _count!($($FieldI)*) * size_of::<i32>() }


How about invoking size_hint for each field type of the tuple and returning the sum?
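
One possible shape for that suggestion, as a sketch only: the trait is reduced to `size_hint`, and this `impl_value_for_tuple!` is a simplified, hypothetical version of the macro in value.rs that takes type parameters instead of field indices. It also removes the need for a counting macro.

```rust
use std::mem::size_of;

// Reduced stand-in for the PR's `Value` trait (serialization omitted).
trait Value {
    fn size_hint() -> usize {
        size_of::<i32>()
    }
}

impl Value for i8 {
    fn size_hint() -> usize {
        size_of::<i32>() + size_of::<Self>()
    }
}

impl Value for i64 {
    fn size_hint() -> usize {
        size_of::<i32>() + size_of::<Self>()
    }
}

// Simplified tuple macro: the hint is one i32 for the tuple's own length
// prefix plus the sum of the hints of all field types.
macro_rules! impl_value_for_tuple {
    ( $($T:ident),+ ) => {
        impl<$($T: Value),+> Value for ($($T,)+) {
            fn size_hint() -> usize {
                size_of::<i32>() $(+ <$T as Value>::size_hint())+
            }
        }
    };
}

impl_value_for_tuple!(A);
impl_value_for_tuple!(A, B);
impl_value_for_tuple!(A, B, C);
```

For `(i8, i64)` this yields 4 + 5 + 12 = 21 bytes, whereas counting one i32 per field would give 12.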

havaker · 2022-11-02T14:44:09Z

What is the motivation behind making size_hint an associated function instead of a method?

As far as I understand, making it a method would allow hints to be more precise, e.g. it would be possible to take the &str length into account when computing its size hint.
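
A sketch of what the method form could look like, assuming the trait were changed to take `&self`. This is not the signature in the PR; the trait here is a simplified stand-in, and the length cast ignores the ValueTooBig case.

```rust
use std::mem::size_of;

// Hypothetical variant of the trait where the hint is a method.
trait Value {
    fn serialize(&self, buf: &mut Vec<u8>);

    // Default hint: just the i32 length prefix.
    fn size_hint(&self) -> usize {
        size_of::<i32>()
    }
}

impl Value for &str {
    fn serialize(&self, buf: &mut Vec<u8>) {
        // i32 length prefix followed by the UTF-8 bytes.
        buf.extend_from_slice(&(self.len() as i32).to_be_bytes());
        buf.extend_from_slice(self.as_bytes());
    }

    // With access to `self`, the hint can be exact for strings.
    fn size_hint(&self) -> usize {
        size_of::<i32>() + self.len()
    }
}
```

With this shape, `"hello".size_hint()` returns 9, which matches the number of bytes `serialize` writes. The trade-off is that a caller needs the value in hand before it can ask for a hint.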

mykaul · 2023-01-31T11:52:30Z

@colin-grapl - can you respond to the review comments?

insanitybit · 2023-06-26T16:54:42Z

Hey, sorry, not long after this PR the company I worked for was dissolved. I'm no longer working on anything Scylla related. I apologize for any reviewer time that may have been wasted on this, but I'd also be happy to hand off anything related to this work. I'll do some responses here based on what I recall, at least, so that if someone does want to pick this up they have the option to do so.

insanitybit · 2023-06-26T17:01:16Z

piodul wrote: "You have shown results from the serialize_lz4_for_iai benchmark, but I can't find it. Could you point me to it or tell what it does?"

https://github.com/bheisler/iai

It was an iai benchmark that I no longer have access to.

colin-grapl and others added 3 commits October 21, 2022 12:15

Merge pull request #1 from scylladb/main

21e079b

Rebase

Implement and leverage size_hint

635cf4b

Preallocate data buffer

7fbefe9

piodul reviewed Oct 31, 2022

piodul mentioned this pull request Aug 25, 2023

Serialization refactor: add new serialization traits #801

Closed

wprzytula added this to the 1.0.0 milestone Jun 20, 2024

wprzytula added the performance Improves performance of existing features label Jun 20, 2024

Lorak-mmk modified the milestones: 1.0.0, 1.x.0 Dec 1, 2024
