Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mirage-crypto-rng.unix thread safety? #249

Open
edwintorok opened this issue Oct 25, 2024 · 13 comments · May be fixed by #250
Open

mirage-crypto-rng.unix thread safety? #249

edwintorok opened this issue Oct 25, 2024 · 13 comments · May be fixed by #250

Comments

@edwintorok
Copy link
Contributor

I don't see any warnings about thread safety in mirage-crypto-rng, but looking at the implementation:

rng/fortuna.ml
27:(* XXX Locking!! *)

And there various changes to mutable fields without locks, so I think there is a chance that running Mirage_crypto_rng.generate would generate the same random numbers in 2 different threads, which would be quite bad (or at least I can't prove that it wouldn't).

If the library is not meant to be thread safe, please add a warning to the .mli

This may not be a problem for mirage (which uses Lwt, so typically single-OCaml-threaded, and not thread safe), but is a problem for multi-threaded OCaml code that would want to use Mirage_crypto_rng.

@dinosaure
Copy link
Member

Hi, currently, only the miou layer of mirage-crypto{,-rng} is domain safe. We made the Pfortuna module which uses a lock mechanism to be sure the exclusivity to get values from the rng (which, in the case of miou, run on one and unique domain).

You can find (interesting) discussions on this subject in this PR: #227

I agree with you that we should improve the documentation regarding mirage-crypto-rng and OCaml 5 - there is a question but we have not, at this stage, found any really satisfactory solutions between OCaml 4, OCaml 5, unikernels (where threads.cmxa cannot be included for OCaml 4) and applications.

@dinosaure
Copy link
Member

I agree with you that we should improve the documentation regarding mirage-crypto-rng and OCaml 5 - there is a question but we have not, at this stage, found any really satisfactory solutions between OCaml 4, OCaml 5, unikernels (where threads.cmxa cannot be included for OCaml 4) and applications.

A possible solution, a bit more ‘handy’, would be to rewrite Pfortuna but with the Mutex/Condition that OCaml 5 offers and to initialise the RNG with such a module. In this case, I suspect that mirage-crypto-rng could be domain-safe but we'd have to confirm with TSan.

@edwintorok
Copy link
Contributor Author

mirage-crypto-rng and OCaml 5 -

thread safety is a concern on OCaml 4 too, not just OCaml 5 (on OCaml 5 it'll be just a lot more obvious, because if something wasn't thread safe on OCaml 4, it likely won't be domain safe on OCaml 5 either).

@hannesm
Copy link
Member

hannesm commented Oct 25, 2024

Another possibility for -rng and -lwt is to ditch the fortuna, and use getrandom/getentropy directly.

That can be done for async and eio and miou as well.

Fortuna, and why we use it, is mainly for MirageOS where we don't have getrandom. And earlier (when mirage-crypto/nocrypto started) getrandom wasn't available. WDYT?

The only issue I can think of is that code used in fewer environments breaks sooner (or a breakage is discovered only late).

@edwintorok
Copy link
Contributor Author

edwintorok commented Oct 25, 2024

d use getrandom/getentropy directly.

That will likely be quite slow if we release the runtime lock every time. Although since 'getrandom' is not mean to perform IO I think we could get away with not releasing the runtime lock, and then the syscall should be quite fast.

But isn't getrandom Linux specific, does it exist on other platforms like the BSDs, I don't think it is POSIX?

@hannesm
Copy link
Member

hannesm commented Oct 25, 2024

hmm, I see what you mean. Indeed, getentropy is POSIX, getrandom not AFAIR. I don't think we need to release the runtime lock since it doesn't do IO neither allocation.

@hannesm
Copy link
Member

hannesm commented Oct 28, 2024

Dear @edwintorok, would you mind to run some tests on the hardware you have in mind?

If the answer is yes, there's a branch named use-getrandom-into on my fork (https://github.com/hannesm/mirage-crypto).

If you check that out, and do a dune bu --release bench/speed.exe, and then execute _build/default/bench/speed.exe fortuna getrandom, and report the numbers here - we'll figure what the performance impact is.

Please note that on GNU/Linux, getrandom is used, while on *BSD (including macOS) getentropy is used (which definition is that only <= 256 bytes may be requested at once, so this may lead to multiple getentropy calls), and on windows "BCryptGenRandom" is called.

Performance on my FreeBSD 14 laptop (i7-5600U CPU @ 2.60GHz):

  • [fortuna]
    16: 69.354873 MB/s (8288815 iters in 1.824 s)
    64: 236.048295 MB/s (7451786 iters in 1.927 s)
    256: 776.474054 MB/s (6519627 iters in 2.050 s)
    1024: 1599.364356 MB/s (2809837 iters in 1.716 s)
    8192: 1635.329988 MB/s (410209 iters in 1.960 s)

  • [getrandom]
    16: 19.557593 MB/s (2578226 iters in 2.012 s)
    64: 79.210606 MB/s (2593071 iters in 1.998 s)
    256: 185.337362 MB/s (1547749 iters in 2.039 s)
    1024: 186.917322 MB/s (382323 iters in 1.997 s)
    8192: 188.341105 MB/s (47396 iters in 1.966 s)

Note the cap on the 256 bytes in respect to performance.

Now, a separate question is whether this performance difference (I expect it to behave differently on GNU/Linux) is acceptable or not?

@edwintorok
Copy link
Contributor Author

edwintorok commented Oct 28, 2024

would you mind to run some tests on the hardware you have in mind?

On AMD Ryzen 9 7950X (I can test on Intel hardware tomorrow):

dune exec --profile=release bench/speed.exe urandom-channel fortuna getrandom                
accel: XOR AES GHASH                 

* [urandom-channel]
       16:  199.185918 MB/s  (26128387 iters in 2.002 s)
       64:  363.951984 MB/s  (11936229 iters in 2.002 s)
      256:  458.584879 MB/s  (3752563 iters in 1.998 s)
     1024:  488.361338 MB/s  (1000044 iters in 2.000 s)
     8192:  498.784842 MB/s  (127669 iters in 2.000 s)

* [fortuna]
       16:  151.816768 MB/s  (19898022 iters in 2.000 s)
       64:  524.286053 MB/s  (17251239 iters in 2.008 s)
      256:  1930.869136 MB/s  (15773517 iters in 1.994 s)
     1024:  3928.095011 MB/s  (8038520 iters in 1.998 s)
     8192:  2788.066799 MB/s  (696526 iters in 1.952 s)

* [getrandom]
       16:  51.922043 MB/s  (6795971 iters in 1.997 s)
       64:  155.359142 MB/s  (5063598 iters in 1.989 s)
      256:  323.873028 MB/s  (2654054 iters in 2.001 s)
     1024:  446.317914 MB/s  (912571 iters in 1.997 s)
     8192:  501.244080 MB/s  (128307 iters in 2.000 s)

I've added another method for comparison by reading from /dev/urandom using a mutex protected input channel,
and the results are quite interesting: reading from the channel is faster than the syscall, probably because the buffering from the channel hides the overhead of the syscall.

diff --git a/bench/dune b/bench/dune
index dec1e4f..0a071e1 100644
--- a/bench/dune
+++ b/bench/dune
@@ -2,7 +2,7 @@
  (names speed)
  (modules speed)
  (libraries mirage-crypto mirage-crypto-rng mirage-crypto-rng.unix
-   mirage-crypto-pk mirage-crypto-ec))
+   mirage-crypto-pk mirage-crypto-ec threads.posix))
 
 ; marking as "(optional)" leads to OCaml-CI failures
 ; marking with "(package mirage-crypto-rng-miou-unix)" only has an effect with a "public_name"
diff --git a/bench/speed.ml b/bench/speed.ml
index c9db764..672715f 100644
--- a/bench/speed.ml
+++ b/bench/speed.ml
@@ -491,6 +491,15 @@ let benchmarks = [
     throughput name (fun buf ->
         let buf = Bytes.unsafe_of_string buf in
         Mirage_crypto_rng_unix.getrandom_into buf ~off:0 ~len:(Bytes.length buf))) ;
+  bm "urandom-channel" (fun name ->
+    In_channel.with_open_bin "/dev/urandom" @@ fun ic ->
+    let m = Mutex.create () in
+    let finally () = Mutex.unlock m in
+    throughput name (fun buf ->
+        let buf = Bytes.unsafe_of_string buf in
+        Mutex.lock m;
+        Fun.protect ~finally (fun () -> really_input ic buf 0 (Bytes.length buf))));
 ]
 
 let help () =

@hannesm
Copy link
Member

hannesm commented Oct 29, 2024

Thanks @edwintorok. I added the urandom-channel to the branch. I also added a pfortuna, which is the miou-fortuna that is safe to use concurrently.

Results (again, FreeBSD, on my laptop):

* [fortuna]
       16:  60.722660 MB/s  (8987269 iters in 2.258 s)
       64:  217.621068 MB/s  (7461294 iters in 2.093 s)
      256:  600.704463 MB/s  (6305806 iters in 2.563 s)
     1024:  1634.418396 MB/s  (3198050 iters in 1.911 s)
     8192:  1581.369707 MB/s  (405028 iters in 2.001 s)

* [getrandom]
       16:  19.514555 MB/s  (2564714 iters in 2.005 s)
       64:  78.424413 MB/s  (2583471 iters in 2.011 s)
      256:  187.963579 MB/s  (1539415 iters in 2.000 s)
     1024:  189.897093 MB/s  (388427 iters in 1.998 s)
     8192:  190.822405 MB/s  (48545 iters in 1.987 s)

* [urandom-channel]
       16:  103.301462 MB/s  (13545115 iters in 2.001 s)
       64:  217.365394 MB/s  (6749870 iters in 1.895 s)
      256:  300.808246 MB/s  (2470654 iters in 2.005 s)
     1024:  332.137952 MB/s  (668748 iters in 1.966 s)
     8192:  345.330283 MB/s  (88116 iters in 1.993 s)

* [pfortuna]
       16:  27.817679 MB/s  (3706092 iters in 2.033 s)
       64:  107.858512 MB/s  (3467580 iters in 1.962 s)
      256:  333.037342 MB/s  (2708027 iters in 1.985 s)
     1024:  442.379866 MB/s  (933244 iters in 2.060 s)
     8192:  596.983811 MB/s  (154561 iters in 2.023 s)

What is there to conclude? Use /dev/urandom? Or we should work hard to have sufficient locking in fortuna to avoid races.

@edwintorok
Copy link
Contributor Author

edwintorok commented Oct 29, 2024

What is there to conclude? Use /dev/urandom? Or we should work hard to have sufficient locking in fortuna to avoid races.

/dev/urandom needs locking too, but is relatively simple to do as shown in the benchmark.
There are some corner cases to take into account (see xapi-project/xen-api@2c99f2d#diff-7c050f569ba05e809d81ff8d8f193bc1b698908a75684a9064c863c462a3337e):

  • open /dev/urandom on first use instead of as a module initializer, so that we don't immediately fail in situations where /dev/urandom is unavailable (e.g. a program may link mirage-crypto, or a uuid library but doesn't necessarily need to actually generate them). getrandom doesn't have this problem, because it works even if the device is absent (e.g. /dev/ is not yet mounted)
  • close it at_exit, but allow it to be reopened (e.g. Crowbar runs the entire test at_exit)

For my purposes that is actually fast enough (as shown above), and can be implemented without depending on Mirage_crypto (although other parts of our application indirectly depends on mirage-crypto I think), but might be nice to have a Mirage_crypto generator that just wraps a /dev/urandom channel. OTOH all those workarounds make the code more complicated, getrandom and getentropy are much simpler to maintain (and already present).
getentropy is indeed part of POSIX 2024 now (https://pubs.opengroup.org/onlinepubs/9799919799/functions/getentropy.html, but please use GETENTROPY_MAX instead of 256).

I don't think that Fortuna needs to be made thread safe, but please add a warning to its docs that it isn't and its use should be avoided outside of unikernels (correctness, and security i.e. not generating the same random number twice is more important than performance IMHO, unless you're very sure your application will only ever have 1 thread or 1 domain, which you can only guarantee with unikernels. Even with Lwt you could have regular threads if you use Lwt_preemptive.detach).

To make this more robust Fortuna could be moved to a mirage specific opam package, then applications can't accidentally link it (although this would be a breaking change).

@hannesm
Copy link
Member

hannesm commented Nov 19, 2024

Thanks for your comment. I'm still undecided:

  • opening a file descriptor is likely fine, we need to ensure to have CLOEXEC flag set. But the uncertainty is the at_exit handler (i.e. crowbar) -- also, indeed there are systems (FreeBSD jails) where usually I don't mount /dev...
  • getentropy would be ideal, but the performance is rather poor

One of these two should be "the default generator for mirage-crypto-rng-async/mirage-crypto-rng-unix/mirage-crypto-rng-eio/mirage-crypto-rng-lwt". We leave fortuna for mirage-crypto-rng-mirage (and of course, others can explicitly request it). Also, mirage-crypto-rng-miou stays as is with its Pfortuna.

Now, (a) still unclear whether /dev/urandom or getentropy should be the default and (b) indeed it is a breaking API change... but neither getentropy nor /dev/urandom requires cooperation from the scheduler, so we can just put all of it into mirage-crypto-rng (.unix), and deprecate the old initialize code paths -- so API clients can simply drop the mirage-crypto-rng-{lwt,eio,async} dependency and adapt their code!? WDYT?

We can provide both devurandom and getentropy based RNGs, and the clients can pick/set their default. Still the question is which to have as default (since 99.9999% of clients won't change the default).

@edwintorok
Copy link
Contributor Author

  • but the performance is rather poor

If performance is the only concern, then a fallback mechanism could be used.
Try urandom, and if not available, then fall back to getentropy. If both fail then give up with an exception.
(I'd avoid falling back further to weaker mechanisms like time based or cpu instruction timing based)

This way the default would be thread-safe, and if you've got urandom then also reasonably fast.

And perhaps mark Fortuna as deprecated with a warning/alert that it is not thread safe.

We can provide both devurandom and getentropy based RNGs, and the clients can pick/set their default. Still the question is which to have as default (since 99.9999% of clients won't change the default).

Although looking at how mirage-crypto-rng ended up being used in our own application removing Fortuna would be a breaking change:

let () = Mirage_crypto_rng_unix.initialize (module Mirage_crypto_rng.Fortuna)

Also because currently there isn't a default generator, then even if mirage-crypto-rng is made safe by default, applications would still need to be changed to be thread-safe, but thats fine if a deprecation mechanism is used that warns when a thread-unsafe generator is used with unix.

Also, mirage-crypto-rng-miou stays as is with its Pfortuna.

Yes

hannesm added a commit to hannesm/mirage-crypto that referenced this issue Nov 27, 2024
Provide guidance to use these by default, document that Fortuna is not
thread-safe. As suggested in mirage#249
@hannesm hannesm linked a pull request Nov 27, 2024 that will close this issue
@hannesm
Copy link
Member

hannesm commented Nov 27, 2024

Dear @edwintorok, thanks again for your discussion here. I opened #250 with your suggestions. I'd be happy if you could review that PR and tell whether that is fine?

The Mirage_crypto_rng_unix.use_default () should be advertised, since it works on Unix systems (either using /dev/urandom or getentropy/getrandom/BCryptGenRandom).

hannesm added a commit to hannesm/mirage-crypto that referenced this issue Nov 27, 2024
Provide guidance to use these by default, document that Fortuna is not
thread-safe. As suggested in mirage#249
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants