Nexus would like XDE not to automatically create/delete V2P mappings on port creation/deletion #368

Open · gjcolombo opened this issue May 12, 2023

Currently, XDE creates and deletes OPTE virtual-to-physical (V2P) mapping information for a guest IP when a caller asks to create or destroy an XDE port with that IP. In an Omicron-managed environment, this happens whenever sled agent creates a VMM on a sled (VMM initialization creates a port for each of the instance's NICs so that the ports can be passed to Propolis to use as network backends) and whenever a Propolis VMM shuts down (shutdown triggers a sled agent destruction sequence that destroys all of the VMM's associated ports).
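To make the coupling concrete, here is a minimal Rust sketch of the behavior described above. The names (`XdeState`, `PortCfg`, and their fields) are hypothetical and do not come from the real opte/xde code; the point is only that installing and removing the V2P entry happens as a side effect of port creation and deletion.

```rust
use std::collections::HashMap;
use std::net::Ipv6Addr;

/// Hypothetical port configuration: a guest virtual IP plus the underlay
/// address of the sled hosting it.
struct PortCfg {
    guest_vip: Ipv6Addr,
    sled_underlay_addr: Ipv6Addr,
}

/// Hypothetical stand-in for XDE's per-driver state.
struct XdeState {
    v2p: HashMap<Ipv6Addr, Ipv6Addr>,
}

impl XdeState {
    fn create_port(&mut self, cfg: &PortCfg) {
        // ... allocate and wire up the port itself ...
        // Side effect: the V2P mapping is installed implicitly.
        self.v2p.insert(cfg.guest_vip, cfg.sled_underlay_addr);
    }

    fn delete_port(&mut self, cfg: &PortCfg) {
        // ... tear down the port ...
        // Side effect: the mapping is removed, even if the control plane has
        // since pushed a newer mapping for the same VIP.
        self.v2p.remove(&cfg.guest_vip);
    }
}
```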

This behavior combines with the (WIP) Nexus live migration (LM) protocol in ways that make it somewhat challenging for Nexus to ensure that all sleds have the latest mappings for a migrating instance. Some of the possible races are described in the fold below.

It would be lovely, from the perspective of keeping our control plane synchronization as simple as possible, if Nexus and sled agent could assume that (or ask to be able to assume that) they fully control the V2P mappings on each sled and that mappings won't be created or destroyed without an explicit command from the control plane.
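For contrast, here is a sketch of the arrangement this issue asks for, extending the hypothetical `XdeState` above: port lifecycle operations leave the V2P table alone, and mappings change only on an explicit control-plane request. Again, `set_v2p`/`clear_v2p` are made-up names for illustration, not the actual opte API.

```rust
impl XdeState {
    /// Install a mapping only on an explicit request from sled agent / Nexus.
    fn set_v2p(&mut self, vip: Ipv6Addr, sled_underlay_addr: Ipv6Addr) {
        self.v2p.insert(vip, sled_underlay_addr);
    }

    /// Remove a mapping only on an explicit request; never as a side effect
    /// of deleting a port.
    fn clear_v2p(&mut self, vip: Ipv6Addr) {
        self.v2p.remove(&vip);
    }

    fn create_port_decoupled(&mut self, _cfg: &PortCfg) {
        // Port setup only; no implicit insert into self.v2p.
    }

    fn delete_port_decoupled(&mut self, _cfg: &PortCfg) {
        // Port teardown only; any mapping pushed by the control plane (which
        // may already point at a migration target) is left untouched.
    }
}
```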

The gory details

Nexus's LM protocol generally tries to avoid monitoring or tightly synchronizing with ongoing migration work on any particular sled. That is, the migration saga does just enough to initiate migration and then exits; when the migration succeeds or fails, the sled agents involved push instance runtime state updates that indicate which sled the instance ended up running on, with no additional coordination with Nexus (or the other sled agent!) required. This is nice for robustness (it minimizes the number of parties who have to send messages successfully in order for migration to succeed), but it meshes with XDE's automatic creation/deletion of V2P mappings on port creation/deletion in interesting ways:

  • If an instance starts migrating from sled S to sled T and the migration fails, Nexus has to make sure to push "instance is on S" mappings to sled T even though the instance didn't ultimately move (because merely creating a VMM on T created "instance is on T" mappings for its VIPs).
  • Nexus would like to propagate "instance is on T" mappings as soon as it learns that an instance has successfully migrated, because until it does, other instances in the same VPC might not be able to reach it. This creates a race, though:
    • Instance migrates from S to T
    • Nexus pushes "instance is on T" mappings to S
    • The source VMM on S shuts down, destroying all the XDE ports
    • Destroying those ports causes S to delete the mappings for the instance's VIPs, clobbering the updated "instance is on T" mappings it should have kept

This latter problem is particularly thorny because Nexus would really like to avoid waiting for an extra message from the source (that might be delayed, or indeed never arrive at all) before starting to propagate mappings for an instance that it knows has successfully migrated.

There's a case to be made for having Nexus propagate V2P mappings using a reliable persistent workflow (RPW) that knows how to correct itself if it applies stale mappings. (That is: the RPW task reads that an instance is on sled A at generation 4; it starts propagating mappings; meanwhile the instance moves to sled B at generation 5; before exiting, the task notices the generation/location change and does another lap to make sure the right mappings are propagated.) The main downside is that, depending on when the task runs and what state it observes at that moment, it can temporarily destroy otherwise-correct mappings and damage connectivity to an instance. (The LM protocol otherwise avoids this problem by ensuring that a migrating instance can't migrate again until any configuration affected by the migration has been updated.)
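A rough sketch of that generation-checked loop, using hypothetical stand-ins (`InstanceRecord`, `read_instance_record`, `push_v2p_to_all_sleds`) rather than real Omicron types; only the read / propagate / re-check structure is meant to illustrate the idea.

```rust
#[derive(Clone, PartialEq, Eq, Debug)]
struct InstanceRecord {
    sled: String,     // sled currently hosting the instance
    generation: u64,  // bumps whenever the instance moves
}

fn read_instance_record() -> InstanceRecord {
    // In the real system this would be a database read; stubbed here.
    InstanceRecord { sled: "sled-a".to_string(), generation: 4 }
}

fn push_v2p_to_all_sleds(record: &InstanceRecord) {
    // Stand-in for "push 'instance is on <sled>' mappings to every sled".
    println!("propagating: instance is on {} (gen {})", record.sled, record.generation);
}

fn propagate_v2p_task() {
    // Keep doing laps until the record we just propagated is still the latest
    // one. If the instance migrated mid-lap (generation bumped), propagate
    // again rather than exiting with stale mappings applied.
    let mut observed = read_instance_record();
    loop {
        push_v2p_to_all_sleds(&observed);
        let latest = read_instance_record();
        if latest == observed {
            break;
        }
        observed = latest;
    }
}

fn main() {
    propagate_v2p_task();
}
```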
