
Inspect timing-out aead tests #548

Closed
th4s opened this issue Jul 29, 2024 · 11 comments · Fixed by #575

th4s (Member) commented Jul 29, 2024

PR #547 comments out the aead tests, which are currently timing out. This needs further investigation, as the issue is not reproducible locally.

th4s (Member, Author) commented Aug 13, 2024

PR #558 is being used for the investigation.

th4s (Member, Author) commented Aug 21, 2024

This deadlock is hard to fix. Things I tried so far:

  • I could not reproduce the bug on my local machine, even when using act and running it in Docker.
  • The deadlock seems to vanish when executing a single AEAD test. Strangely, either all 4 of the started aead tests hang or all 4 pass; I never observed one test passing while another hangs.
  • Looking for static variables, i.e. shared state between unit tests -> nothing found.
  • Manual code inspection, especially of mpz-garble/deap -> nothing found so far.
  • Replacing all .lock().unwrap() calls with try_lock().unwrap() (see the sketch after this list) -> the deadlock vanishes: Deadlock vanish due to trylock th4s/tlsn#5
  • Adding tracing.
    • There seems to be a problem with set_default in that it omits some logging statements. I couldn't make any sense of it and will probably create an issue for this. However, sometimes I was able to capture a little bit of output, which can be found in the failed runs in these PRs.
    • When trying to add proper tracing via set_global_default, the deadlock vanishes.
  • Removing tracing and using println! statements instead -> the deadlock vanishes: Deadlock vanish due to prints th4s/tlsn#4
  • Trying to use Tokio's task dump feature -> does not work because the running future does not support it; you get an empty log.
  • Using shuttle testing locally to trigger the deadlock -> all tests pass.

Saying "the deadlock vanishes" means that I ran the tests several times (4-5) and did not encounter it.
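
For reference, the try_lock swap mentioned in the list is roughly this shape (a minimal sketch assuming a plain std::sync::Mutex at the lock sites; the actual types in mpz-garble may differ):

```rust
use std::sync::Mutex;

// Hypothetical lock site: swapping a blocking `lock()` for a panicking
// `try_lock()` makes contention visible immediately instead of hanging.
fn with_state(shared: &Mutex<Vec<u8>>) -> usize {
    // Original: let guard = shared.lock().unwrap();
    let guard = shared
        .try_lock()
        .expect("mutex already held: would have blocked here");
    guard.len()
}
```

Note that this changes behaviour (it panics instead of waiting), so it is a diagnostic, not a fix.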

th4s (Member, Author) commented Aug 22, 2024

Making some progress by carefully adding println! statements. This run is interesting: https://github.com/th4s/tlsn/actions/runs/10506339441/job/29106915720
It hangs in decrypt_private -> verify_tag -> compute_tag_share -> share_keystream_block -> compute -> execute

It does not seem to hang in setup_assigned_values.
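
A small helper in the spirit of this println! bracketing (my own sketch, not code from the investigation branch): wrap each await in the suspected chain so the CI log shows which step was entered but never finished.

```rust
use std::future::Future;

// Wrap a future with enter/done prints; the deepest "enter" without a
// matching "done" in the hanging run points at the stuck call.
async fn traced<F: Future>(name: &str, fut: F) -> F::Output {
    println!("enter {name}");
    let out = fut.await;
    println!("done  {name}");
    out
}

// Hypothetical call sites, following the chain above:
// let tag_share = traced("compute_tag_share", compute_tag_share(...)).await;
// let block = traced("share_keystream_block", share_keystream_block(...)).await;
```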

th4s (Member, Author) commented Aug 22, 2024

New hanging run with more logging: https://github.com/th4s/tlsn/actions/runs/10507218244/job/29108495136
It hangs inside generate (also in evaluate, but that is probably just a consequence of waiting for generate).

My current bet is that it has to do with feed, flush, and the limited backpressure of the TestSTExecutor.
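
To make that hypothesis concrete, here is a toy model (my own sketch, not the actual TestSTExecutor, whose internals may differ) of how limited backpressure alone can produce this hang shape: two parties over bounded duplex channels, each sending a burst larger than the channel capacity before draining its inbox.

```rust
use futures::{channel::mpsc, executor::LocalPool, task::SpawnExt, SinkExt, StreamExt};

fn main() {
    // Bounded duplex channels between party A and party B.
    let (mut a_tx, mut b_rx) = mpsc::channel::<u32>(1);
    let (mut b_tx, mut a_rx) = mpsc::channel::<u32>(1);

    let mut pool = LocalPool::new();
    let spawner = pool.spawner();

    // Party A: send a burst, only then read.
    spawner
        .spawn(async move {
            for i in 0..10 {
                a_tx.send(i).await.unwrap(); // blocks once the buffer is full
            }
            while a_rx.next().await.is_some() {}
        })
        .unwrap();

    // Party B: mirror image, same ordering.
    spawner
        .spawn(async move {
            for i in 0..10 {
                b_tx.send(i).await.unwrap();
            }
            while b_rx.next().await.is_some() {}
        })
        .unwrap();

    // `run_until_stalled` returns instead of hanging forever: neither task can
    // make progress because both are stuck in `send` waiting for the other to
    // read -- the classic backpressure deadlock shape.
    pool.run_until_stalled();
    println!("stalled: both tasks are stuck in `send`");
}
```

A real executor driving the same futures would simply hang instead of printing the final line.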

th4s (Member, Author) commented Aug 22, 2024

th4s (Member, Author) commented Aug 23, 2024

th4s (Member, Author) commented Aug 23, 2024

When running tests from the workspace root, the rayon feature, although not specified by tlsn-aead, is activated by tlsn-verifier and tlsn-prover through Cargo's feature unification. This means the aead tests are run with the rayon feature enabled.
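
For context on why the unified feature changes behaviour at all, here is a hypothetical sketch (not the actual mpz-common source) of the kind of compile-time switch a rayon feature typically guards:

```rust
// Hypothetical shape of a feature-gated code path: once any workspace member
// enables "rayon", every crate sharing the dependency in the same build gets
// the rayon branch, including the tlsn-aead test binary.
#[cfg(feature = "rayon")]
fn spawn_work(f: impl FnOnce() + Send + 'static) {
    rayon::spawn(f);
}

#[cfg(not(feature = "rayon"))]
fn spawn_work(f: impl FnOnce() + Send + 'static) {
    let _ = std::thread::spawn(f);
}
```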

th4s (Member, Author) commented Aug 26, 2024

What happens is the following:

  1. When running our test suite from the workspace root, tlsn-verifier/tlsn-prover activate the rayon feature of mpz-common. This means that, although not specified, the tlsn-aead unit tests are run with this feature enabled.
  2. When running unit tests, Rust uses 4 threads to parallelize test execution, and in each aead test the leader AND the follower make use of rayon::spawn.
  3. Rayon uses std::thread::available_parallelism to estimate the amount of parallelism when the rayon threadpool is left unconfigured (as we do), i.e. how many tasks run in parallel in the rayon threadpool. I determined this to be 4 for our GitHub CI.
  4. All 4 rayon-spawned evaluator tasks use expect_next in evaluate to get garbling batches from the generator. So when all 4 evaluator tasks start before a single generator task can run, the rayon threadpool is saturated. The generators can only queue their rayon tasks to send garbling batches, but those tasks are never executed because the evaluator tasks never finish (they are waiting for garbling batches) -> deadlock. A minimal reproduction sketch follows below.
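
The sketch (my own toy model under the assumptions above; "evaluator" and "generator" mirror the description, not the tlsn code):

```rust
use std::sync::mpsc;

fn main() {
    // Match the CI conditions: a rayon pool with 4 threads.
    let threads = 4;
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(threads)
        .build()
        .unwrap();

    let mut generators = Vec::new();

    for i in 0..threads {
        let (tx, rx) = mpsc::channel::<u32>();
        // "Evaluator": occupies a pool thread and blocks, like `expect_next`
        // waiting for a garbling batch that never arrives.
        pool.spawn(move || {
            let batch = rx.recv().unwrap();
            println!("evaluator {i} got batch {batch}");
        });
        generators.push(tx);
    }

    // "Generators": queued only after the pool is already saturated, so these
    // closures never run and the receivers above wait forever.
    for (i, tx) in generators.into_iter().enumerate() {
        pool.spawn(move || {
            tx.send(i as u32).unwrap();
        });
    }

    std::thread::sleep(std::time::Duration::from_secs(2));
    println!("no evaluator ever finished: the pool is deadlocked");
}
```

The hang disappears as soon as the pool has more threads than blocked evaluator tasks, which matches why the deadlock is so sensitive to available_parallelism on the CI runner.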

sinui0 (Member) commented Aug 26, 2024

🫠

Good work. Short-term fix is to increase the thread count, or decrease parallelism?

th4s (Member, Author) commented Aug 27, 2024

Yeah I think I am leaning towards increasing the thread count for the aead unit tests.
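
One possible way to do that (a sketch under my own assumptions, not necessarily what the fix in #575 ended up doing) is to initialize rayon's global pool with a larger thread count before any aead test spawns work:

```rust
use std::sync::Once;

static INIT: Once = Once::new();

// Hypothetical test helper: raise the rayon thread count once per test binary,
// before any test calls rayon::spawn. `build_global` errors if the global pool
// was already initialized, which is fine to ignore here.
fn init_rayon_for_tests() {
    INIT.call_once(|| {
        let _ = rayon::ThreadPoolBuilder::new()
            .num_threads(8)
            .build_global();
    });
}
```

Alternatively, rayon honours the RAYON_NUM_THREADS environment variable, so the count could also be raised for the CI job without touching the test code.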

th4s linked a pull request on Aug 27, 2024 that will close this issue
heeckhau (Member) commented

Fixed in #575
