Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurartion for a graceful shutdown period for tests manually aborted with Ctrl+C or other signals #931

Closed
Veetaha opened this issue Aug 13, 2023 · 7 comments

Comments

@Veetaha
Copy link

Veetaha commented Aug 13, 2023

I have a use case where my tests have a really heavy setup and teardown logic. This logic happens within the test functions themselves. Each test function individually runs a terraform apply at the begging of the tests and terraform destroy deploying various resources to the AWS cloud temporarily.

terraform and AWS cloud are quite heavy-weight. It may take the from 1 to 10 minutes to perform the setup and maybe 1-5 minutes to teardown and cleanup the resources.

Right now cargo-nextest sends a SIGKILL to the test process leaving it no chance to gracefully terminate. It would solve this problem if cargo-nextest could allow configuring the behavior of graceful shutdown for tests. For example, if I could say in the config file that "these tests" selected by the DSL (in my case I don't need it, I need to select them all, but I think it may be useful for other use cases) need to be shutdown gracefully. Then configure the UNIX signal that will be sent to the tests, and the timeout for the tests to exit by themselves, after which cargo nextests will send a SIGKILL.

@sunshowers
Copy link
Member

Hi there -- please see https://nexte.st/book/slow-tests.html#how-nextest-terminates-tests. On encountering a timeout, nextest sends a SIGTERM, waits 10 seconds, and then sends a SIGKILL. You can customize the timeout with the grace-period setting.

The initial signal is not configurable, but I hope SIGTERM would work for you. If it doesn't work for you could you explain why? If there's a solid use case for a non-SIGTERM signal I'd be happy to accept patch contributions.

@Veetaha
Copy link
Author

Veetaha commented Aug 14, 2023

I think I missed to mention the main thing in the issue. The problem here is not with the tests that time out. The problem is with the following scenario.

Given I already have tests in the process of running.
I suddenly want to abort the tests.
I press Ctrl+C on the terminal.
I expect that my tests receive a SIGINT and they gracefully shut down themselves and perform cleanup of the resources they created for themselves.

Right now cargo-nextest always sends SIGKILL to the tests when I abort it with Ctrl+C, but I want cargo-nextest to send SIGINT to my tests instead.

Why SIGINT? It's because this is the signal that tokio listens for in its tokio::signal::ctrl_c() function, which is a cross-platform tool to listen for interrupt signals. SIGTERM on the other hand requires writing some unix-specific code to handle this type of signal.

Anyway, I don't think it's critical to be able to configure the signal which is sent, it's still possible to write unix-only code for handling the SIGTERM.

My tests don't need to run on windows, but if the process were killed in a way that tokio::signal::ctrl_c() could also catch that on windows, it would be cool as well.

@Veetaha Veetaha changed the title Make it possible to configure a graceful shutdown period for tests Configurartion for a graceful shutdown period for tests manually aborted with Ctrl+C or other signals Aug 14, 2023
@sunshowers
Copy link
Member

sunshowers commented Aug 14, 2023

Hmm that's weird -- no, tests should get a SIGINT, not a SIGKILL, if you ctrl-c them. Do you have a reproduction?

@Veetaha
Copy link
Author

Veetaha commented Aug 14, 2023

I figured out the following.


If I send SIGINT by pressing Ctrl+C on the terminal, then cargo-nextest gives exactly 10 seconds to my test process before killing it. I couldn't find a way to configure a value greater than 10 seconds for this.

Here is the rust code I used:

Details
#[tokio::test]
async fn sigabox() {
    let teardown_duration = std::time::Duration::from_secs(15);

    eprintln!("Waiting for SIGINT (Ctrl-C)...");

    tokio::signal::ctrl_c().await.unwrap();

    eprintln!("SIGINT received, tearing down...");

    tokio::time::sleep(teardown_duration).await;

    eprintln!("Teardown succesful, bye bye!");
}
[package]
edition = "2021"
name    = "nextest-sigint-repro"
version = "0.1.0"

[dependencies]
tokio = { version = "1.22", features = ["macros", "rt-multi-thread", "signal", "time"] }

I started the tests and immediately (within 1 second) pressed a Ctrl+C.
Here is the output I saw in the terminal.

$ cargo nextest run --no-capture
    Finished test [unoptimized + debuginfo] target(s) in 0.04s
    Starting 1 test across 1 binary
       START             nextest-sigint-repro sigabox

running 1 test
Waiting for SIGINT (Ctrl-C)...
^C   Canceling due to interrupt: 1 tests still running
SIGINT received, tearing down...
     SIGKILL [  10.377s] nextest-sigint-repro sigabox
------------
     Summary [  10.377s] 1 test run: 0 passed, 1 failed, 0 skipped
error: test run failed  

I also tried this config, but it didn't influence the period that nextest gave the tests to tear down, the grace period was still 10 seconds for Ctrl+C

[profile.default]
slow-timeout = { period = "60s", terminate-after = 1, grace-period = "20s" }

If I send SIGTERM to cargo-nextest it sends SIGKILL to the test and immediately exists.

I didn't expect that SIGTERM is handled in a different way than SIGINT by cargo-nextest.

@sunshowers
Copy link
Member

OK:

  • I think a custom grace period should probably also apply to SIGINT/ctrl-c. This suggests that grace-period should live outside slow-timeout and be a top-level option.
  • Sending a SIGTERM to nextest should send SIGTERM to test processes and then apply the grace period, not SIGKILL.

I'm currently sick and otherwise very busy, so probably won't have time to fix either of these any time soon. If you're interested in working on it I'd love it if you submitted a PR or two to fix it. Thanks!

@erratic-pattern
Copy link
Contributor

+1 on this feature request. I have some slow cleanup actions that get performed on SIGINT, where the hard-coded 10 second grace period is too quick. I was hoping slow-timeout.grace-period would also configure the SIGINT grace period, but no such luck.

I looked at the source code and think I could probably make this change here in the next few weeks when I have some capacity.

One question remains for me: should slow-timeout.grace-period be kept for compatibility? or are we okay with moving forward with a breaking change to the config (introducing grace-period at the top-level and removing it from slow-timeout)

@erratic-pattern
Copy link
Contributor

Opened a PR for this #1039 .

I went with shutdown-grace-period as the name for the new top-level config, but I'm open to suggestions on what to call it instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants