-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cleanup partially imported disks on CTRL+C #760
Conversation
29f03c8
to
4891399
Compare
Recording: disk_cleanup_demo.mov |
Function `test_get_disk_size` is tagged with the `#[test]` attribute, but not in a module configured for tests, so it is not as part of our test suite. Move it into a new `tests` submodule.
4891399
to
e81bfe6
Compare
If a `disk import` request is interrupted then the disk is likely to be left in a `ImportReady` or `ImportingFromBulkWrites` state, which prevents the user from deleting it. The commonly occurs when users interrupt the request using ^C, and is a frustrating experience. Add a new `GracefulShutdown` struct to listen for SIGINTs. This will cancel the executing `Future` on interrupt and run a cleanup task which will return an error, even if cleanup succeeds. A second SIGINT causes the process to exit immediately. Hook calls make in the critical section of the import into the new system to ensure we delete the partially imported disk. This introduces a race where the user may cancel the import operation while the disk creation saga is still executing and the disk is in `Creating` state. Attempting to finalize or delete the disk in this state will fail, so we are forced to wait until it has transitioned to `ImportReady` before proceeding. To avoid spamming the API too heavily, we poll the disk state every 500ms for no more than ten tries. The new polling creates a problem for our testing, though. We need the initial disk API query to return a 404, but subsequent ones to find the disk. `httpmock` does not provide a way to remove a mock after N hits, and we have no graceful way of replacing the mock while running the `oxide` binary. There is an open issue[0] to add an option like this to `httpmock`, but for right now we have no option but to delete the tests that check for failure. Closes #651 [0] alexliesenfeld/httpmock#96
Opened alexliesenfeld/httpmock#108 to add a |
e81bfe6
to
1ebaa1f
Compare
cli/src/cmd_disk.rs
Outdated
|
||
if shutdown.shutdown_requested()? { | ||
self.remove_failed_disk(client).await?; | ||
bail!("user canceled request"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to say something more concrete like "User canceled image upload"
I had been moving the disk upload functionality to the SDK. This seems to complicate that. I had imagined we would do cleanup from a custom From a scan through your changes, it seems like there may be obstacles in the API such as needing to wait for particular state transitions. There may be situations where we want to change the API or state machine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes look good to me, though Adam's review should supersede this one. One comment from me:
|
||
if force_shutdown { | ||
eprintln_nopipe!("Shutting down immediately.\n{exit_msg}."); | ||
std::process::exit(130); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did this value come from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is matching the exit code bash
would use had the standard SIGINT
handler been invoked. It's not exactly the same behavior as WIFSIGNALED
won't be aware of the signal, but I think it's close enough for our purposes.
@ahl I'm not sure where that leaves this PR. Should I close it, or do we expect the work to update the API to take long enough it's worthwhile merging this as a stopgap. |
@wfchandler I was hoping we might change the approach here to do cleanup when we drop the future. Do you think that's viable? |
@ahl I think the latest revision of this PR should be compatible with your SDK changes. We can replace self.upload_disk with the equivalent SDK command when that's available. WDYT? |
Ran into this today. Would be nice to figure out how to get this in. In the meantime I think I’m going to add stop and and finalize to the web console. |
I have almost finished a rewrite of this that adds a convenience wrapper to the SDK, hopefully ready to go later today. |
Closing in favor of #840 |
If a
disk import
request is interrupted then the disk is likely to beleft in a
ImportReady
orImportingFromBulkWrites
state, whichprevents the user from deleting it. The commonly occurs when users
interrupt the request using ^C, and is a frustrating experience.
Add a new
GracefulShutdown
struct to listen for SIGINTs. This willcancel the executing
Future
on interrupt and run a cleanup task whichwill return an error, even if cleanup succeeds. A second SIGINT causes
the process to exit immediately.
Hook calls make in the critical section of the import into the new
system to ensure we delete the partially imported disk.
This introduces a race where the user may cancel the import operation
while the disk creation saga is still executing and the disk is in
Creating
state. Attempting to finalize or delete the disk in thisstate will fail, so we are forced to wait until it has transitioned to
ImportReady
before proceeding. To avoid spamming the API too heavily,we poll the disk state every 500ms for no more than ten tries.
The new polling creates a problem for our testing, though. We need the
initial disk API query to return a 404, but subsequent ones to find the
disk.
httpmock
does not provide a way to remove a mock after N hits,and we have no graceful way of replacing the mock while running the
oxide
binary. There is an open issue to add an option like this tohttpmock
, but for right now we have no option but to delete the teststhat check for failure.
Closes #651