Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: allow use of UUIDs as component IDs in instance specs #816

Merged
merged 7 commits into from
Dec 18, 2024

Conversation

gjcolombo
Copy link
Contributor

Stacked on #813. Related to #804.

Reviewer note: Most of this change is mechanical conversion of types, but there are some slightly more substantial functional changes to Crucible backend management in server.rs, initializer.rs, and state_driver.rs.


Today instance specs use Strings to identify VM components. In many cases, components are associated with control plane objects (like Disk and NetworkInterface records) that have UUIDs. Allow components to be identified by UUID by defining a SpecKey type that deserializes as a Uuid when possible and as a String when not, then plumb it throughout Propolis. See the comments on the SpecKey type for more details, including notes on why the type is a UUID/String union and not just a UUID.

The main functional change in this PR is that propolis-server's component maps now use SpecKeys to identify components. The idea is to say that if a component has key "Foo" in a VM's spec, then subsequent API calls that want to act on that component should pass "Foo" as the component ID. This (hopefully) simplifies how Crucible-related APIs should designate the disk they're targeting: Omicron should pick an ID for each Crucible backend it requests (which ID can just be the control plane disk ID), then pass the same ID to subsequent snapshot and VCR replacement operations. Control plane disk names no longer appear in the Propolis API at all.

Tests: cargo test, PHD, manual creation/migration of VMs with propolis-cli, manually hit the snapshot and VCR replacement endpoints and verified these dispatched their operations to the correct Crucible backend.

Fixes #776. Fixes #772.

@gjcolombo gjcolombo requested review from hawkw and iximeow November 19, 2024 22:56
@gjcolombo
Copy link
Contributor Author

@leftwo You may want to take a look at the Crucible bits of this one (in server.rs, initializer.rs, and state_driver.rs). Instead of calling Volume::get_uuid to get a disk ID from the VCR to use in the CrucibleBackendMap, Propolis now expects that

  • Omicron will choose a Crucible backend component ID for each Crucible disk when a VM starts
  • This ID will be used as the key to the Crucible backend map
  • Omicron will specify the same ID when it needs to request a snapshot or replace a VCR

I reckon Omicron will just use the disk ID as the Crucible backend ID to make this as smooth as possible in Nexus and sled agent. As far as I can tell, the disk ID is the ID that we were getting from get_uuid before, because that's how Omicron constructs its volume construction requests; this just makes the IDs at the Propolis layer a little more explicit/transparent.

LMK whether this seems like a reasonable approach.

@gjcolombo gjcolombo requested a review from leftwo November 19, 2024 23:01
@hawkw
Copy link
Member

hawkw commented Nov 20, 2024

This seems reasonable to me overall, but I'll defer to the Crucible folks

Comment on lines 671 to 674
// If metrics are enabled and this Crucible backend was
// identified with a UUID-format spec key, register this disk
// for metrics collection, using the key as the disk ID.
if let (Some(registry), SpecKey::Uuid(disk_id)) =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nexus will never tell propolis-server about a Crucible disk with non-UUID ID, right? if so, the SpecKey::Name path here, where there just won't be metrics collected for the disk, shouldn't be reachable except if we've crafted a weird instance request by hand? if this shouldn't ever be reachable, it seems worth rejecting the spec /more precisely typing IDs we know need to be UUID.

i do remember disk names ending up in here somewhere, rather than IDs.. gonna go reread how that plumbing works unless you remember more immediately.

my real concern here is that this is where Nexus had been giving what are now SpecKey::Name, and that we'll lose Crucible metrics here. do you know if we have a test that exercises that?

Copy link
Contributor Author

@gjcolombo gjcolombo Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nexus will never tell propolis-server about a Crucible disk with non-UUID ID, right? if so, the SpecKey::Name path here, where there just won't be metrics collected for the disk, shouldn't be reachable except if we've crafted a weird instance request by hand? if this shouldn't ever be reachable, it seems worth rejecting the spec /more precisely typing IDs we know need to be UUID.

This makes sense. I'm a little leery of it because I think it makes the API a little weird: the ID for a given component can be either a UUID a Name, unless it's this one specific kind of component, in which case it has to be a UUID. This isn't horribly broken or anything, it's just a slightly sharp edge (and we'd also have to account for it in the non-Omicron tools/processes that let users attach ad-hoc Crucible volumes to Propolis servers).

We could address this in a few other ways:

  1. warn! if the spec key isn't a UUID but there's a producer registry available
  2. put a #[cfg(feature = "omicron-build")] guard in that enforces the "Crucible keys must be UUIDs" requirement
  3. go back to calling Volume::get_uuid in this path to get the UUID to use for stats reporting, and warn! if it differs from the spec key

I kind of like option 3 now that I've written it up: it limits our regression risk (at least where metrics are concerned) but keeps the API flexible. WDYT?

i do remember disk names ending up in here somewhere, rather than IDs.. gonna go reread how that plumbing works unless you remember more immediately.

IIRC the disk name wasn't used in this specific path (EDIT: this is incorrect, see below; the disk name was used as a spec key previously, which then used it to construct serial numbers for NVMe devices), but it was used in VCR replacement, which worked like this:

  • Take the VM objects lock
  • Look up the CrucibleStorageBackend component in the instance spec
    • the key for this was derived from the disk name
  • Look up the Crucible backend object in the VM's crucible_backends map
    • the key for this turns out to be the disk ID; the initialization code path used to read it from the Crucible backend's get_uuid function, which would read it out of the disk's volume construction request; Omicron guaranteed that this ID matched the disk ID (by creating the VCR that way)
  • send the replace operation to Crucible
  • on success, write the updated VCR back to the instance spec

my real concern here is that this is where Nexus had been giving what are now SpecKey::Name, and that we'll lose Crucible metrics here. do you know if we have a test that exercises that?

We don't have an end-to-end test for disk metrics, at least AFAIK.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, after re-reading a bit and taking a second look, i'm also a fan of Volume::get_uuid here. i generally like the change to make crucible just Option<Arc<...>> though, so to double-check: you're thinking literally calling crucible.get_uuid() somewhere around this if, yeah? i think that should be pretty clean.

FWIW the get_uuid() call before was context'd and ?, so it would bubble an error up rather than warn. that still seems like an appropriate error behavior here too.

more UUID tracing i haven't done: could backend_id be different from crucible.get_uuid()? i assume this is "technically possible but not really desired"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed in d6602c1.

more UUID tracing i haven't done: could backend_id be different from crucible.get_uuid()? i assume this is "technically possible but not really desired"

That's correct, at least as I understand it:

  • the serialized VolumeConstructionRequest that gets passed to the API is expected to be a VolumeConstructionRequest::Volume (which will have subvolumes of type Region)
  • Volume::get_uuid returns the ID field in that top-level Volume
  • when Nexus constructs the VCR for a new disk, it sets that ID to the disk ID

When I update Omicron to pick up this change, I expect that Nexus/sled-agent will use disk IDs as the spec keys for any CrucibleStorageBackends they specify. That will preserve this relationship so long as Nexus keeps inserting the disk ID into its VolumeConstructionRequest::Volumes the way it does today.

Comment on lines 319 to 338
// TODO(#790): Propolis's NVMe controller used to pad
// serial numbers with 0 bytes by default. This causes
// disk serial numbers to appear in the guest to be
// null-terminated strings. This, in turn, affects (or
// at least may affect) how guest firmware images
// identify disks when determining a VM's boot order: a
// disk whose serial number is padded to 20 bytes with
// spaces may not match a disk whose serial number was
// padded with \0 bytes.
//
// Unfortunately for us, guest firmware may persist disk
// identifiers to a VM's nonvolatile EFI variable store.
// This means that changing from zero-padded to
// space-padded serial numbers may break an existing
// VM's saved boot order.
//
// Until we decide whether and how to preserve
// compatibility for existing VMs that may have
// preserved a zero-padded disk ID, continue to zero-pad
// disk IDs in PHD to match previous behavior.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i initially wondered if you expected to copy this comment to Omicron later, but the corresponding Omicron change will probably be a migration to persist the null-padded serials and switch to space-padded for new disks huh? then at least any old wrong data will be made static, rather than "fixed" without us expecting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned up in d6dd7c7. The main thing I wanted to emphasize in this comment is that PHD behaves this way because Omicron will when I create the initial changes to sled-agent to get it to use the spec-based ensure API), and I don't want their behaviors to drift too far apart.

the corresponding Omicron change will probably be a migration to persist the null-padded serials and switch to space-padded for new disks huh?

This is definitely the most straightforward approach, but I don't really like it, because it entails having Nexus have to keep track of the (hopefully small-and-getting-smaller) set of existing disks that need the special "pad with zeroes not spaces" compat quirk. I would be willing to consider some possibly-outlandish options (e.g. trying to do something clever in the boot order management logic in our EDK2 fork) to avoid adding a quirk column/table just for this.

It probably goes without saying, but I intend not to try to solve that problem in the Omicron change that will ingest these Propolis changes--the first order of business is to convert to the new API without actually changing or breaking anything :)

lib/propolis-client/src/support.rs Show resolved Hide resolved
lib/propolis-client/src/support.rs Show resolved Hide resolved
Copy link
Member

@iximeow iximeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for working through this!

Copy link
Contributor

@leftwo leftwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any obvious problems with crucible and this change. It would be good to take it for a test drive and replace a downstairs just to be sure we connected the dots correctly.

bin/propolis-server/src/lib/initializer.rs Show resolved Hide resolved
bin/propolis-server/src/lib/initializer.rs Show resolved Hide resolved
@gjcolombo
Copy link
Contributor Author

Thanks @leftwo for taking a look! Before I merge I'll make sure to spin up a Propolis with this change and recheck that I can properly dispatch operations to the snapshot and VCR replacement APIs. (We'll want to check that that works on the Omicron side as well, but I'll ask about how to do that when the Omicron changes are ready.)

@gjcolombo gjcolombo force-pushed the gjcolombo/one-ensure-api branch from a92ffd7 to 7a9bbc0 Compare December 18, 2024 00:13
Base automatically changed from gjcolombo/one-ensure-api to master December 18, 2024 00:50
Instead of using Strings as keys in instance specs, define a SpecKey
type that is a union of a UUID and a String. This type serializes as a
string, and its `FromStr` and `From<String>` impls will try to parse
strings as UUIDs before storing keys in string format. This allows the
control plane to use UUIDs wherever it makes sense while allowing it
(and other users) to continue to use strings where a UUID component ID
is unavailable or is being used by some other spec element (as might be
the case for a disk device and disk backend).

Index most objects in propolis-server using SpecKeys instead of Strings.
This is mostly just a rote change-the-types, make-it-compile exercise,
but it extends to Crucible backends, which deserve some additional
commentary. In the old ensure API, sending a `DiskRequest` would
register disk components with several different identifiers:

- The disk's name, given by the `DiskRequest`'s `name` field, is used
  - as a key in the VM's `DeviceMap`,
  - as the disk device component's key in the VM's instance spec, and
  - to generate the disk backend component's key in the VM's instance
    spec.
- The disk's ID, given as the `id` in the `Volume`-variant
  `VolumeConstructionRequest` in the `DiskRequest`, is reported by the
  Crucible backend's `get_uuid` function and used
  - to register a metrics producer for each disk, and
  - as the key in the VM's `CrucibleBackendMap`.

Now that all new VMs are started using instance specs, use component
keys for everything:

- Entities in the `DeviceMap` are identified by their corresponding
  components' `SpecKey`s.
- Crucible backends in the `CrucibleBackendMap` are also identified by
  their `SpecKey`s.
- APIs that take a Crucible backend ID on their path now take a String
  and not a UUID. The server converts these to a SpecKey before looking
  up backends in the map.
- When a Crucible backend is created, the machine initializer checks to see
  if its `SpecKey` was a `Uuid`. If so, it will register the backend to
  produce metrics.

The intention is that Omicron will use disk IDs as the keys for its
`CrucibleStorageBackend` components in the specs it generates. It can
then also use these IDs as parameters when requesting snapshots or VCR
replacements. The friendly names users give their Omicron disks no
longer appear in the Propolis API at all.
@gjcolombo
Copy link
Contributor Author

Before I merge I'll make sure to spin up a Propolis with this change and recheck that I can properly dispatch operations to the snapshot and VCR replacement APIs.

I've checked this for both of these endpoints: the snapshot and replace operations indeed get directed to the relevant Crucible downstairs processes (which then summarily reject them because I didn't build with the right features, didn't specify a VCR in the correct format, etc., but at least they're making it to the right places!).

@gjcolombo gjcolombo merged commit d4529fd into master Dec 18, 2024
10 of 11 checks passed
@gjcolombo gjcolombo deleted the gjcolombo/spec-key branch December 18, 2024 18:46
gjcolombo added a commit to oxidecomputer/omicron that referenced this pull request Dec 20, 2024
Update Omicron to use the new Propolis VM creation API defined in
oxidecomputer/propolis#813 and oxidecomputer/propolis#816. This API
requires clients to pass instance specs to create new VMs and component
replacement lists to migrate existing VMs. Construct these in sled agent
for now; in the future this logic can move to Nexus and become part of a
robust virtual platform abstraction. For now the goal is just to keep
everything working for existing VMs while adapting to the new Propolis
API.

Slightly adjust the sled agent instance APIs so that Nexus specifies
disks and boot orderings using sled-agent-defined types and not
re-exported Propolis types.

Finally, adapt Nexus to the fact that Crucible's
`VolumeConstructionRequest` and `CrucibleOpts` types no longer appear in
Propolis's generated client (and so don't appear in sled agent's client
either). Instead, the `propolis-client` crate re-exports these types
directly from its `crucible-client-types` dependency. For the most part,
this involves updating `use` directives and storing `SocketAddr`s in
their natively typed form instead of converting them to and from
strings.

Tests: cargo nextest, plus ad hoc testing in a dev cluster as described
in the PR comments.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

server: machine init is inconsistent about the names it puts in its DeviceMap Prefer component IDs to names
4 participants