Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update links to API code about hard-coded allowed transitions #2387

Merged
merged 4 commits into from
Aug 20, 2024

Conversation

david-crespo
Copy link
Collaborator

Closes #2334

Copy link

vercel bot commented Aug 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
console ✅ Ready (Inspect) Visit Preview Aug 20, 2024 4:59pm

app/api/util.ts Outdated

// TODO: ask greg about this
// reboot and stop are kind of weird! in theory you can do it from a bunch of different states
// https://github.com/oxidecomputer/omicron/blob/6dd9802/nexus/src/app/instance.rs#L790-L798
reboot: ['running'],
stop: ['running', 'starting'],
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @gjcolombo — Charlie and I are double-checking the set of instance states where reboot and stop are allowed and were wondering how to interpret this function.

https://github.com/oxidecomputer/omicron/blob/6dd9802/nexus/src/app/instance.rs#L790-L798

I see there's different logic depending on whether a VMM is found, but I don't know how to think about that from a user point of view. I see that there a bunch of states where you should never fail to find a VMM.

Copy link

@gjcolombo gjcolombo Aug 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That match is pretty spooky, especially in light of some recent improvements to Nexus (omicron#5749 and omicron#5854). I filed oxidecomputer/omicron#6390 to see if we can clean it up a bit.

For purposes of this PR I almost think you want to link both to Nexus and to this bit of Propolis: https://github.com/oxidecomputer/propolis/blob/b278193/bin/propolis-server/src/lib/vm/request_queue.rs

The idea underlying the code you linked is that when an instance has a running Propolis--roughly between when you successfully ask to start the instance and the time that Propolis comes back and says it's Destroyed--the Propolis process is the source of truth as to what state changes can and can't be accepted. The match is saying, "OK, I have this request to change the instance's state, and I have a Propolis to talk to; given what I know about the state that Propolis last reported, should I even bother forwarding it this request to get its disposition?"

For requests to run, stop, and reboot, the answer is basically always "let Propolis decide what to do." (The main edge case is Migrating: which Propolis gets the request then? I think the simplest answer is "neither, the request should be rejected until we know where to send it"; omicron#6390 tracks this.)

The migration-request side of the match could also use some tweaking, but the external API doesn't allow explicit requests to migrate anymore (omicron#6311), so that's probably not as much of a concern here.

On the console side, I think the logic you have for these states (right side lines 102/103) is right, though we could also allow a request to stop a rebooting instance (Propolis will queue it), or even reboot a rebooting instance (Propolis will queue the second reboot to execute when the first is done; note though that once you've queued one request to reboot, Propolis will silently drop new ones until the queued request begins to be processed).

Hope this helps a bit--LMK what else I can clarify or fill you in on.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's perfect, thanks. Generally I'm ok with being slightly more restrictive than what's technically possible, though in the UI we sometimes say "an instance can only be rebooted in X states", and in that case I'd prefer to be accurate since we're educating the user there about the system. So I think I'll add rebooting to the lists.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, nope, nope. Too damn weird. Will leave rebooting out, but thanks for the link, and I'm linking to this comment as well.

Screenshot 2024-08-20 at 11 49 13 AM

Screenshot 2024-08-20 at 11 49 21 AM

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically correct: the best kind of correct!

(Yeah, that looks pretty silly. Leaving rebooting off the lists is the right move.)

@david-crespo david-crespo marked this pull request as ready for review August 20, 2024 16:59
@david-crespo david-crespo enabled auto-merge (squash) August 20, 2024 16:59
@david-crespo david-crespo merged commit f53bb38 into main Aug 20, 2024
8 checks passed
@david-crespo david-crespo deleted the instance-can-citations branch August 20, 2024 17:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update instanceCan logic for API changes
2 participants