-
Notifications
You must be signed in to change notification settings - Fork 193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
serde: add default deserializers implementations #233
serde: add default deserializers implementations #233
Conversation
@@ -113,17 +197,16 @@ impl<T: Serializable> Serializable for &Vec<T> { | |||
} | |||
} | |||
|
|||
impl<T: Serializable, const N: usize> Serializable for Vec<[T; N]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a blanket implementation for impl<T, C> Serializable for [T; C]
. Together with the existing blanked implementation for Vec<T>
, &Vec<T>
, and &[T]
, it made these 3 implementations obsolete.
494c444
to
5971f6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Looks good! I left a few small comments inline.
utils/core/src/serde/mod.rs
Outdated
impl Serializable for String { | ||
fn write_into<W: ByteWriter>(&self, target: &mut W) { | ||
let source = flatten_slice_elements(self); | ||
T::write_batch_into(source, target); | ||
target.write_u64(self.len() as u64); | ||
target.write_bytes(self.as_bytes()); | ||
} | ||
} | ||
|
||
impl<T: Serializable, const N: usize> Serializable for &Vec<[T; N]> { | ||
impl Serializable for &String { | ||
fn write_into<W: ByteWriter>(&self, target: &mut W) { | ||
let source = flatten_slice_elements(self); | ||
T::write_batch_into(source, target); | ||
(*self).write_into(target) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This deviates somewhat from how we serialize lists in that we try not to make assumptions data types used to serialize the number of elements. I think we can leave it up to the callers to figure out how exactly they want to serialize a given string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the problem. You don't want these implementations? You don't want to include the string length?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the issue is with the use of as_bytes
? You don't want to encode the data as utf8
?
The biggest issue here is that the user can not implement this trait, because of the orphan rules. So here we either have to make the choice or not give an implementation to the user.
When the user is writing their own Serializable
, they can also deviate from this by just not calling it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to remove these implementations as serializing strings without length is probably not a good idea, and serializing them with length is not consistent with the overall approach (e.g., different from how we serialize slices, vectors etc.).
I don't think this affects the usability too much as the user can serialize strings manually pretty easily - e.g.,:
let x = "hello world".to_string();
target.write_u64(x.len() as u64);
target.write_bytes(x.as_bytes());
On the deserialization front, I think we can improve the ergonomics by adding read_string()
method to ByteReader
interface. It could look something like:
fn read_string(&mut self, len: usize) -> Result<String, DeserializationError> {
let data = source.read_vec(len)?;
String::from_utf8(data).map_err(|err| DeserializationError::InvalidValue(format!("{err}")))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to remove these implementations as serializing strings without length is probably not a good idea, and serializing them with length is not consistent with the overall approach (e.g., different from how we serialize slices, vectors etc.).
The question is, does it make sense to serialize slices and vectors without length too? And if this is done for consistency sake, why can't the String be serialized without the length too?
I don't think this affects the usability too much as the user can serialize strings manually pretty easily - e.g.,:
Your call
On the deserialization front, I think we can improve the ergonomics by adding read_string() method to ByteReader interface. It could look something like:
This can be done in another PR.
Removed the default implementation
utils/core/src/serde/mod.rs
Outdated
impl Deserializable for String { | ||
fn read_from<R: ByteReader>(source: &mut R) -> Result<Self, DeserializationError> { | ||
let length = source | ||
.read_u64()? | ||
.try_into() | ||
.map_err(|err| DeserializationError::InvalidValue(format!("{err}",)))?; | ||
let data = source.read_vec(length)?; | ||
|
||
String::from_utf8(data).map_err(|err| DeserializationError::InvalidValue(format!("{err}"))) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment to one of the previous comments re strings: I'd rather not make opinionated choices about which data type to use for serializing string lengths.
let data: Vec<T> = T::read_batch_from(source, C)?; | ||
|
||
// SAFETY: the call above only returns a Vec if there are `C` elements, this conversion | ||
// always succeeds | ||
let res = data.try_into().unwrap_or_else(|v: Vec<T>| { | ||
panic!("Expected a Vec of length {} but it was {}", C, v.len()) | ||
}); | ||
|
||
Ok(res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a bit unfortunate that we have to allocate a vector here to deserialize the data - but I wonder if the compiler is smart enough to figure out that this is not needed.
5971f6a
to
696cb07
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Thank you!
The
Deserializer
trait does not have an associated type, all methods returnSelf
. That means it is not possible to have a blanket implementation for slices&[T]
, because the signature is required to befn read_from<R: ByteReader>(source: &mut R) -> Result<&[T], DeserializationError>;
. I can't add a associated type without making a breaking change because default associated types are unstable https://github.com/rust-lang/rfcs/blob/master/text/2532-associated-type-defaults.md. I also can't define a blanket implementation forVec<T>
because the default serializer does not include the length of the vector.edit: also note:
usize
intou8
there is nothing to do besides panicking, this could be an attack vector.write_batch_into
is the same as the vec implementation, should probably be removed since it doesn't have a clear receiver and there is equivalent functionality