Skip to content

Commit

Permalink
add initial rkyv support (#1135)
Browse files Browse the repository at this point in the history
This PR adds initial support for [rkyv] to puffin. In particular,
the main aim here is to make puffin-client's `SimpleMetadata` type
possible to deserialize from a `&[u8]` without doing any copies. This
PR **stops short of actuallying doing that zero-copy deserialization**.
Instead, this PR is about adding the necessary trait impls to a variety
of types, along with a smattering of small refactorings to make rkyv
possible to use.

For those unfamiliar, rkyv works via the interplay of three traits:
`Archive`, `Serialize` and `Deserialize`. The usual flow of things is
this:

* Make a type `T` implement `Archive`, `Serialize` and `Deserialize`.
rkyv
helpfully provides `derive` macros to make this pretty painless in most
  cases.
* The process of implementing `Archive` for `T` *usually* creates an
entirely
new distinct type within the same namespace. One can refer to this type
without naming it explicitly via `Archived<T>` (where `Archived` is a
clever
  type alias defined by rkyv).
* Serialization happens from `T` to (conceptually) a `Vec<u8>`. The
serialization format is specifically designed to reflect the in-memory
layout
  of `Archived<T>`. Notably, *not* `T`. But `Archived<T>`.
* One can then get an `Archived<T>` with no copying (albeit, we will
likely
need to incur some cost for validation) from the previously created
`&[u8]`.
This is quite literally [implemented as a pointer cast][rkyv-ptr-cast].
* The problem with an `Archived<T>` is that it isn't your `T`. It's
something
  else. And while there is limited interoperability between a `T` and an
`Archived<T>`, the main issue is that the surrounding code generally
demands
a `T` and not an `Archived<T>`. **This is at the heart of the tension
for
  introducing zero-copy deserialization, and this is mostly an intrinsic
problem to the technique and not an rkyv-specific issue.** For this
reason,
  given an `Archived<T>`, one can get a `T` back via an explicit
deserialization step. This step is like any other kind of
deserialization,
although generally faster since no real "parsing" is required. But it
will
  allocate and create all necessary objects.

This PR largely proceeds by deriving the three aforementioned traits
for `SimpleMetadata`. And, of course, all of its type dependencies. But
we stop there for now.

The main issue with carrying this work forward so that rkyv is actually
used to deserialize a `SimpleMetadata` is figuring out how to deal
with `DataWithCachePolicy` inside of the cached client. Ideally, this
type would itself have rkyv support, but adding it is difficult. The
main difficulty lay in the fact that its `CachePolicy` type is opaque,
not easily constructable and is internally the tip of the iceberg of
a rat's nest of types found in more crates such as `http`. While one
"dumb"-but-annoying approach would be to fork both of those crates
and add rkyv trait impls to all necessary types, it is my belief that
this is the wrong approach. What we'd *like* to do is not just use
rkyv to deserialize a `DataWithCachePolicy`, but we'd actually like to
get an `Archived<DataWithCachePolicy>` and make actual decisions used
the archived type directly. Doing that will require some work to make
`Archived<DataWithCachePolicy>` directly useful.

My suspicion is that, after doing the above, we may want to mush
forward with a similar approach for `SimpleMetadata`. That is, we want
`Archived<SimpleMetadata>` to be as useful as possible. But right
now, the structure of the code demands an eager conversion (and thus
deserialization) into a `SimpleMetadata` and then into a `VersionMap`.
Getting rid of that eagerness is, I think, the next step after dealing
with `DataWithCachePolicy` to unlock bigger wins here.

There are many commits in this PR, but most are tiny. I still encourage
review to happen commit-by-commit.

[rkyv]: https://rkyv.org/
[rkyv-ptr-cast]:
https://docs.rs/rkyv/latest/src/rkyv/util/mod.rs.html#63-68
  • Loading branch information
BurntSushi authored Jan 28, 2024
1 parent c0e7668 commit 5219d37
Show file tree
Hide file tree
Showing 33 changed files with 782 additions and 204 deletions.
155 changes: 153 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -73,6 +73,7 @@ regex = { version = "1.10.2" }
reqwest = { version = "0.11.23", default-features = false, features = ["json", "gzip", "brotli", "stream", "rustls-tls"] }
reqwest-middleware = { version = "0.2.4" }
reqwest-retry = { version = "0.3.0" }
rkyv = { version = "0.7.43", features = ["strict", "validation"] }
rmp-serde = { version = "1.1.2" }
rustc-hash = { version = "1.1.0" }
same-file = { version = "1.0.6" }
Expand Down
4 changes: 4 additions & 0 deletions crates/distribution-filename/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -12,11 +12,15 @@ license = { workspace = true }
[lints]
workspace = true

[features]
rkyv = ["dep:rkyv", "pep440_rs/rkyv"]

[dependencies]
pep440_rs = { path = "../pep440-rs" }
platform-tags = { path = "../platform-tags" }
puffin-normalize = { path = "../puffin-normalize" }

rkyv = { workspace = true, features = ["strict", "validation"], optional = true }
serde = { workspace = true, optional = true }
thiserror = { workspace = true }
url = { workspace = true }
Expand Down
12 changes: 12 additions & 0 deletions crates/distribution-filename/src/source_dist.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,12 @@ use puffin_normalize::{InvalidNameError, PackageName};

#[derive(Clone, Debug, PartialEq, Eq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(
feature = "rkyv",
derive(rkyv::Archive, rkyv::Deserialize, rkyv::Serialize)
)]
#[cfg_attr(feature = "rkyv", archive(check_bytes))]
#[cfg_attr(feature = "rkyv", archive_attr(derive(Debug)))]
pub enum SourceDistExtension {
Zip,
TarGz,
Expand Down Expand Up @@ -52,6 +58,12 @@ impl SourceDistExtension {
/// need the latter.
#[derive(Clone, Debug, PartialEq, Eq)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(
feature = "rkyv",
derive(rkyv::Archive, rkyv::Deserialize, rkyv::Serialize)
)]
#[cfg_attr(feature = "rkyv", archive(check_bytes))]
#[cfg_attr(feature = "rkyv", archive_attr(derive(Debug)))]
pub struct SourceDistFilename {
pub name: PackageName,
pub version: Version,
Expand Down
6 changes: 6 additions & 0 deletions crates/distribution-filename/src/wheel.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,12 @@ use platform_tags::{TagPriority, Tags};
use puffin_normalize::{InvalidNameError, PackageName};

#[derive(Debug, Clone, Eq, PartialEq, Hash)]
#[cfg_attr(
feature = "rkyv",
derive(rkyv::Archive, rkyv::Deserialize, rkyv::Serialize)
)]
#[cfg_attr(feature = "rkyv", archive(check_bytes))]
#[cfg_attr(feature = "rkyv", archive_attr(derive(Debug)))]
pub struct WheelFilename {
pub name: PackageName,
pub version: Version,
Expand Down
2 changes: 1 addition & 1 deletion crates/distribution-types/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -24,10 +24,10 @@ puffin-normalize = { path = "../puffin-normalize" }
pypi-types = { path = "../pypi-types" }

anyhow = { workspace = true }
chrono = { workspace = true, features = ["serde"] }
data-encoding = { workspace = true }
fs-err = { workspace = true }
once_cell = { workspace = true }
rkyv = { workspace = true, features = ["strict", "validation"] }
rustc-hash = { workspace = true }
serde = { workspace = true, features = ["derive"] }
serde_json = { workspace = true }
Expand Down
Loading

0 comments on commit 5219d37

Please sign in to comment.