refine error messages about not having a propolis address #7263
base: main
return Err(Error::invalid_request(
    "Exactly one of 'from_start' \
     or 'most_recent' must be specified.",
));
don't super love this, and maybe @hawkw's new dropshot error type stuff can help? this reproduces a check that's done in both Propolis and propolis-sim. when they see invalid parameters, they respond with a 400 whose message is the same as here. my thinking is, the connection to Propolis could error for other reasons, and for any other reason it would be correctly categorized as a 500. and i didn't want to just forward any 400 error from Propolis out to users here because Propolis errors aren't necessarily intended for direct end-user consumption...
if parameter validation gets more strict on the Propolis side, though, things checked there and not here will be 500s..
Yeah, hm. I think once #7196 lands we could have Propolis return a structured error here. But, since we are constructing the request in Nexus, I think it's reasonable-ish to say that any invalid request sent to Propolis is arguably an "internal server error" and we should be validating it here beforehand. On the other hand, duplicating the validation also feels bad. I dunno...
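For reference, a minimal sketch of what "validating it here beforehand" could look like. The `HistoryParams` and `InvalidRequest` types are hypothetical stand-ins (not the real Nexus or dropshot types), and the "exactly one" rule is taken from the error message in the diff above rather than from Propolis itself:

```rust
/// Hypothetical stand-in for the serial console history query parameters.
#[derive(Debug, Default, Clone, Copy)]
pub struct HistoryParams {
    pub from_start: Option<u64>,
    pub most_recent: Option<u64>,
}

/// Hypothetical stand-in for a client-facing 400 error.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidRequest(pub String);

/// Reject the request up front unless exactly one of the two offsets is
/// set, so that a bad request never reaches the downstream service and
/// any failure talking to it can be treated as an internal error.
pub fn validate_history_params(
    params: &HistoryParams,
) -> Result<(), InvalidRequest> {
    match (params.from_start, params.most_recent) {
        (Some(_), None) | (None, Some(_)) => Ok(()),
        _ => Err(InvalidRequest(
            "Exactly one of 'from_start' or 'most_recent' must be specified."
                .to_string(),
        )),
    }
}
```

The tradeoff discussed in this thread still applies: any check that exists downstream but not in a function like this one would still surface as a 500.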
Err(Error::invalid_request(format!(
    "cannot connect to serial console of instance in state \"{}\"",
i tried to make this a stronger error type than `Error` here, so we could have nice "serial console" errors for the serial console path, and "propolis" errors for the `region{, snapshot}_replacement` saga path. but this runs into a bunch of new wrinkles like, `instance_lookup.lookup_for()` can fail in an `Error`-y way, `db_datastore.instance_fetch_with_vmm()` can fail in an `Error`-y way.
so after spinning a little too long on error plumbing here, i decided to take the different approach of picking the lowest common denominator message that makes sense for any caller of `propolis_addr_for_instance`. maybe "cannot connect to admin socket" or something would be more accurate, but i don't want to be confusing in a user-facing error message - there's no secret backdoor socket in your instance! we're just talking to the VMM!
yeah, I think just making it generic is fine. 's fair. a slightly different wording is something like "this operation cannot be performed on instances in state ...", which is also generic but suggests that the specific thing you tried to do can't be done in that state. I worry a little that "administer an instance" is broad enough that it does include operations you can perform in these states --- i.e., is deleting an instance "administering" it?
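A rough sketch of the generic wording suggested above, so it is easier to see what callers would get back. The `VmmState` enum and `require_reachable_vmm` helper are made up for illustration and do not mirror the actual Nexus state types or the real `propolis_addr_for_instance` signature:

```rust
use std::fmt;

/// Hypothetical, simplified set of instance states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum VmmState {
    Running,
    Stopped,
    Destroyed,
}

impl fmt::Display for VmmState {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let name = match self {
            VmmState::Running => "running",
            VmmState::Stopped => "stopped",
            VmmState::Destroyed => "destroyed",
        };
        f.write_str(name)
    }
}

/// Return Ok only when there is a VMM to talk to. The message names the
/// state but not the caller, so it reads sensibly whether the request
/// came from the serial console path or from a replacement saga.
pub fn require_reachable_vmm(state: VmmState) -> Result<(), String> {
    match state {
        VmmState::Running => Ok(()),
        other => Err(format!(
            "this operation cannot be performed on instances in state \"{}\"",
            other
        )),
    }
}
```

Because the message says "this operation" rather than "administer", it does not claim anything about other operations (such as deletion) that may still be allowed in that state.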
the chronology here seems to be: Nexus learned to talk directly to Propolis to communicate with an instance's serial console, that "instance to Propolis address" translation got outlined into `propolis_addr_for_instance`, then for region and region snapshot replacement sagas we needed to also talk to the relevant Propolis. at that point, `propolis_client_for_instance` used `propolis_addr_for_instance` too.

this is all fine, but `propolis_addr_for_instance` kept a few serial console-specific error messages, which would show up in places like region replacement if an instance is stopped or shut down at an inopportune time.

in the process, i discovered that a request for an instance's serial console history with incorrect parameters manifests as a 500 rather than a 400. Propolis even returns a 400, but the failure to establish a websocket console is presumed an internal error in Nexus, and so it's wrapped as a 500 whose contents talk about a 400! fix that to just return a 400.