Domain: document clearly that creating more domains than recommended is a terrible idea #11921

Open
gasche opened this issue Jan 19, 2023 · 5 comments

Comments

@gasche
Member

gasche commented Jan 19, 2023

We have successfully communicated that domains are heavy-weight abstractions: there should be only a few of them, about as many as there are hardware threads on the machine. What most users do not realize is that having even slightly more domains than hardware threads is generally a terrible idea for performance. (I only learned this myself a couple of months ago.) Users may instead think of domains as "pthreads with some extra runtime overhead", the kind you can have hundreds of but not tens of thousands.

Currently the OCaml manual says:

Domains are heavy-weight entities. Each domain maps 1:1 to an operating system thread. Each domain also has its own runtime state, which includes domain-local structures for allocating memory. Hence, they are relatively expensive to create and tear down.

It is recommended that the programs do not spawn more domains than cores available.

and domain.mli says:

val recommended_domain_count : unit -> int
(** The recommended maximum number of domains which should be running
    simultaneously (including domains already running).
    The value returned is at least [1]. *)
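To make the "including domains already running" clause concrete, here is a small sketch (not from the manual) of sizing a one-shot parallel computation. It assumes the calling domain is the only domain already running, so it spawns `recommended_domain_count () - 1` extra domains and does one chunk of the work itself. `worker_count` and `parallel_sum` are illustrative names, not stdlib functions.

```ocaml
(* Extra domains to spawn: the calling domain already counts toward
   Domain.recommended_domain_count (), which is always at least 1. *)
let worker_count () = Domain.recommended_domain_count () - 1

(* Sum an array using the calling domain plus [worker_count ()] spawned
   domains, each handling one contiguous chunk of the array. *)
let parallel_sum (a : int array) : int =
  let n = Array.length a in
  let k = worker_count () + 1 in
  let chunk_sum i =
    let lo = i * n / k and hi = (i + 1) * n / k in
    let s = ref 0 in
    for j = lo to hi - 1 do
      s := !s + a.(j)
    done;
    !s
  in
  (* Chunks 1..k-1 run on fresh domains; chunk 0 runs on this domain. *)
  let workers =
    List.init (k - 1) (fun i -> Domain.spawn (fun () -> chunk_sum (i + 1)))
  in
  let mine = chunk_sum 0 in
  List.fold_left (fun acc d -> acc + Domain.join d) mine workers
```

If other domains are already alive (for example, a library's pool), this sketch would still oversubscribe, which is exactly the situation this issue is about.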

What we probably want to say is more like "never ever spawn more domains than recommended; always always use a domain pool". Otherwise people just get confused and write bad code, and the documentation is partly to blame.
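For readers unsure what "use a domain pool" means in practice, the following is a minimal, hypothetical sketch using only the OCaml 5 stdlib (`Domain`, `Mutex`, `Condition`, `Queue`). Worker domains are spawned once, sized from `recommended_domain_count`, and reused for many small tasks. It is not a production pool: there is a single shared queue and no work stealing, and `create`, `submit`, `shutdown` and `default_size` are made-up names.

```ocaml
(* A minimal fixed-size domain pool: workers are spawned once and reused
   for many tasks, instead of calling Domain.spawn per task. *)
type pool = {
  tasks : (unit -> unit) option Queue.t; (* None tells a worker to exit *)
  mutex : Mutex.t;
  nonempty : Condition.t;
  workers : unit Domain.t array;
}

(* Each worker repeatedly takes the next task, or exits on None. *)
let rec worker_loop tasks mutex nonempty =
  Mutex.lock mutex;
  while Queue.is_empty tasks do
    Condition.wait nonempty mutex
  done;
  let task = Queue.pop tasks in
  Mutex.unlock mutex;
  match task with
  | None -> ()
  | Some job ->
      job ();
      worker_loop tasks mutex nonempty

let create n =
  let tasks = Queue.create () in
  let mutex = Mutex.create () in
  let nonempty = Condition.create () in
  let workers =
    Array.init n (fun _ ->
        Domain.spawn (fun () -> worker_loop tasks mutex nonempty))
  in
  { tasks; mutex; nonempty; workers }

(* Enqueue [f]; the returned function blocks until the result is ready
   and re-raises any exception that [f] raised. *)
let submit pool (f : unit -> 'a) : unit -> 'a =
  let result = ref None in
  let m = Mutex.create () in
  let c = Condition.create () in
  let job () =
    let r = try Ok (f ()) with e -> Error e in
    Mutex.lock m;
    result := Some r;
    Condition.broadcast c;
    Mutex.unlock m
  in
  Mutex.lock pool.mutex;
  Queue.push (Some job) pool.tasks;
  Condition.signal pool.nonempty;
  Mutex.unlock pool.mutex;
  fun () ->
    Mutex.lock m;
    while Option.is_none !result do
      Condition.wait c m
    done;
    let r = Option.get !result in
    Mutex.unlock m;
    match r with Ok v -> v | Error e -> raise e

(* Tell every worker to exit, then wait for all of them. *)
let shutdown pool =
  Mutex.lock pool.mutex;
  Array.iter (fun _ -> Queue.push None pool.tasks) pool.workers;
  Condition.broadcast pool.nonempty;
  Mutex.unlock pool.mutex;
  Array.iter Domain.join pool.workers

(* Recommended count minus the calling domain, but at least one worker:
   with zero workers, awaiting a submitted task would block forever. *)
let default_size () = max 1 (Domain.recommended_domain_count () - 1)
```

Submitting 100 short tasks to such a pool costs 100 queue pushes rather than 100 `Domain.spawn` calls, and the number of running domains stays at the pool size plus the caller.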

Related issues:

@dra27
Member

dra27 commented Jan 20, 2023

As part of that, it's probably worth mentioning that it's because of the GC that you never want to do this (i.e., we strengthen the advice and also explain why it's different from the user's preconceptions about how OS threads normally work).

@yawaramin
Contributor

it's because of the GC that you never want to do this (i.e. we strengthen the advice and also explain why it's different from the user's preconceptions about how OS threads normally work)

If we never want to do it, then shouldn't we just make it the max_domain_count and raise an exception if more domains are created?
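The proposal can be sketched in user code as a wrapper around `Domain.spawn`. This is purely hypothetical: `spawn_checked`, `live` and `Too_many_domains` do not exist in the stdlib, and the wrapper only counts domains spawned through it. Such a check is also only as trustworthy as the value `recommended_domain_count` returns.

```ocaml
(* Hypothetical wrapper, NOT part of the Domain API: refuse to spawn a
   domain that would push the count above the recommended value. *)
exception Too_many_domains of int

(* Domains alive via [spawn_checked], plus the main domain. Domains
   spawned directly with Domain.spawn (e.g. by libraries) are not seen. *)
let live = Atomic.make 1

let spawn_checked (f : unit -> 'a) : 'a Domain.t =
  let limit = Domain.recommended_domain_count () in
  (* Reserve a slot first; give it back if we went over the limit. *)
  if Atomic.fetch_and_add live 1 + 1 > limit then begin
    Atomic.decr live;
    raise (Too_many_domains limit)
  end;
  match
    Domain.spawn (fun () ->
        (* Release the slot when [f] finishes, even if it raises. *)
        Fun.protect ~finally:(fun () -> Atomic.decr live) f)
  with
  | d -> d
  | exception e ->
      (* Domain.spawn itself failed: release the reserved slot. *)
      Atomic.decr live;
      raise e
```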

@gasche
Member Author

gasche commented Feb 5, 2023

No, for example because we don't fully trust recommended_domain_count to actually return a truthful/correct/accurate value on all systems.

Remember that Domain is intended to be a first building block for parallel code, not the user-facing abstraction. We should document its assumptions carefully, but the module is not the right place to reduce flexibility in exchange for convenience.

@dclements

dclements commented Oct 4, 2023

I stumbled onto this while investigating a performance problem, so my apologies for resurrecting it, but here are some relevant notes from a person who is new to OCaml:

  1. That domains are a heavyweight abstraction is stated in the manual, but nowhere in the Domain library docs. If you read only the library docs (or something like the Eio docs, which talk about how to create a pool), you would walk away with a very different impression of how domains fit into the ecosystem than you get from reading the manual.
  2. I would argue that the language in the manual indicates that domains are expensive, but not how expensive. It is "recommended you don't spawn more" than the number of cores, and @gasche has stated that Domain is "not intended to be a user-facing abstraction", but even the examples in the docs (cf. section 2.1 of the guide) treat it as a user-facing component by taking the number of domains from the command line (and the example is very close to a thread model, which this thread indicates is an incorrect way to think of domains).
  3. Possibly beyond the scope of this issue, and tying into what @polytypic said above: there needs to be an example of how to actually manage this in practice. Going above recommended_domain_count is a terrible idea and you should use a pool (which is not as clearly documented as I'd like, as mentioned here), but I can find no patterns for what happens when multiple parts of your system all want domain pools. For example, I have a processing component that wants a domain pool and cohttp-eio wants a domain pool, but they can't actually share one, so I either end up with a bunch of idle domains or make potentially serious performance compromises. If there is a recommended best practice here, especially one that doesn't cost performance in other ways, I cannot find it.
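One possible pattern for the situation in point 3, offered as an assumption rather than an established best practice, is a single process-wide budget of extra domains from which each component reserves what it needs before sizing its own pool. The sketch below (hypothetical `budget`, `reserve` and `release`) keeps the combined total within `recommended_domain_count`, but it does not remove the trade-off described above: domains reserved by an idle component stay idle.

```ocaml
(* Hypothetical process-wide budget of *extra* domains (the main domain
   is excluded), shared by every component that wants a pool. *)
let budget = Atomic.make (Domain.recommended_domain_count () - 1)

(* Try to reserve up to [want] domains; returns how many were granted,
   possibly 0. Retries with compare-and-set if another domain raced us. *)
let rec reserve want =
  let avail = Atomic.get budget in
  let grant = min (max 0 want) (max 0 avail) in
  if grant = 0 then 0
  else if Atomic.compare_and_set budget avail (avail - grant) then grant
  else reserve want

(* Return previously reserved domains to the budget. *)
let release n = ignore (Atomic.fetch_and_add budget n)
```

A component that is granted 0 domains would need to fall back to running its work on the calling domain.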

@gasche
Member Author

gasche commented Oct 4, 2023

@dclements I agree. Would you like to submit PRs to improve the documentation in various places?

There is no recommended pooling library because the early 5.x releases were designed as "early adopter" releases, with the understanding that the ecosystem of concurrency libraries is incomplete. The Domain module is a low-level abstraction, and we are waiting for userland to provide better abstractions, hopefully faster than users see our manual/documentation updates (which only come every six months, with new releases).

To my knowledge the current pooling implementations are:

but I fully expect most large programs adopting 5.0 to come up with their own, bespoke solutions to this problem.


4 participants