Domain: document clearly that creating more domains than recommended is a terrible idea #11921

gasche · 2023-01-19T20:41:40Z

We have successfully communicated that domains are heavy-weight abstractions: there should be only a few of them, about as many as there are hardware threads on the machine. What most users do not realize is that having even slightly more domains than hardware threads is generally a terrible idea for performance. (I only learned this myself a couple of months ago.) They may instead be thinking of domains as "pthreads with some more runtime overhead", the kind of thing you can have 100s of but not 10000s.

Currently the OCaml manual says:

Domains are heavy-weight entities. Each domain maps 1:1 to an operating system thread. Each domain also has its own runtime state, which includes domain-local structures for allocating memory. Hence, they are relatively expensive to create and tear down.

It is recommended that the programs do not spawn more domains than cores available.

and domain.mli says:

val recommended_domain_count : unit -> int
(** The recommended maximum number of domains which should be running
    simultaneously (including domains already running).
    The value returned is at least [1]. *)

What we probably want to say is more like "never ever spawn more domains than recommended; always always use a domain pool". Otherwise people just get confused and write bad code, and the documentation is partly to blame.
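To make the advice concrete, here is a minimal sketch of code that never exceeds the recommended count. It assumes OCaml 5 and uses only Domain.recommended_domain_count, Domain.spawn and Domain.join from the standard library; the function name parallel_sum and the chunking scheme are illustrative choices, not something taken from the manual.

```ocaml
(* Sum an array in parallel using at most
   Domain.recommended_domain_count () domains in total,
   counting the domain that is already running. *)
let parallel_sum (a : int array) : int =
  let n = Array.length a in
  (* Total number of domains doing work, including the current one. *)
  let total = max 1 (min n (Domain.recommended_domain_count ())) in
  (* Chunk size rounded up, so that total * chunk >= n. *)
  let chunk = (n + total - 1) / total in
  let sum_range lo hi =
    let s = ref 0 in
    for i = lo to hi - 1 do s := !s + a.(i) done;
    !s
  in
  (* Spawn only total - 1 extra domains; the current domain takes chunk 0. *)
  let spawned =
    List.init (total - 1) (fun k ->
      let lo = min n ((k + 1) * chunk) in
      let hi = min n (lo + chunk) in
      Domain.spawn (fun () -> sum_range lo hi))
  in
  let mine = sum_range 0 (min n chunk) in
  List.fold_left (fun acc d -> acc + Domain.join d) mine spawned
```

The point of the sketch is the "total - 1": the calling domain already counts towards the recommended maximum, so it does a share of the work instead of idling while one extra domain runs.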

Related issues:

Clarify docs for Domain.recommended_domain_count #11649 (comment), Clarify docs for Domain.recommended_domain_count #11649 (comment)
Do new pools "own" the current domain? ocaml-multicore/domainslib#77 (comment)
multicore: massive slowdown on spectralnorm when domains > cores (slower than single domain) #11818. (This has an example program where running on 24 domains using a 24-cores machine is 9x faster than running on a single domain, but running on 25 domains is 1.5x slower than running on a single domain.)

dra27 · 2023-01-20T09:12:56Z

As part of that, it's probably worth mentioning that it's because of the GC that you never want to do this (i.e. we strengthen the advice and also explain why it's different from the user's preconceptions about how OS threads normally work)

yawaramin · 2023-02-05T20:08:22Z

> it's because of the GC that you never want to do this (i.e. we strengthen the advice and also explain why it's different from the user's preconceptions about how OS threads normally work)

If we never want to do it then shouldn't we just make it the max_domain_count and raise an exception if more domains are created?

gasche · 2023-02-05T21:16:48Z

No, for example because we don't fully trust recommended_domain_count to actually return a truthful/correct/accurate value on all systems.

Remember that Domain is intended to be a first building block for parallel code, not the user-facing abstraction. We should document its assumptions carefully, but the module is not the right place to reduce flexibility in exchange for convenience.
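As an illustration of where such a limit could live instead, here is a sketch of a guard written in user code on top of Domain. Everything named here (live, Too_many_domains, spawn_checked, the limit parameter) is hypothetical and not part of the standard library. The limit defaults to Domain.recommended_domain_count () but can be overridden by a caller who knows better on a given system.

```ocaml
(* Hypothetical user-level guard; not part of the standard library. *)

(* Number of domains currently accounted for.
   The initial domain counts as one. *)
let live = Atomic.make 1

exception Too_many_domains

let spawn_checked ?(limit = Domain.recommended_domain_count ()) f =
  (* Reserve a slot first; roll back if that put us over the limit. *)
  if Atomic.fetch_and_add live 1 >= limit then begin
    Atomic.decr live;
    raise Too_many_domains
  end;
  match
    Domain.spawn (fun () ->
      (* Release the slot when f returns or raises. *)
      Fun.protect ~finally:(fun () -> Atomic.decr live) f)
  with
  | d -> d
  | exception e ->
    (* Domain.spawn itself failed: give the slot back. *)
    Atomic.decr live;
    raise e
```

Note that the slot is released when f finishes, slightly before the domain has fully terminated, so the count is an approximation. A library built on Domain can make this kind of trade-off for its users; the module itself stays flexible.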

dclements · 2023-10-04T16:08:42Z

I stumbled into this while investigating a performance problem, so my apologies for resurrecting this, but some relevant notes from a person who is new to OCaml:

That domains are a heavyweight abstraction is found in the manual, but it is nowhere in the Domain library doc. If you just read the library doc—or something like the Eio docs, which talk about how to create a pool—you would walk away with a very different impression of how these play with the ecosystem than you get from reading the manual.
I would argue that the language in the manual indicates that domains are expensive, but not how expensive. It is "recommended you don't spawn more" than the number of cores, and @gasche states that Domain is "not intended to be a user-facing abstraction", but even the examples in the docs (see section 2.1 of the guide) treat it as a user-facing component by taking the number of domains from the command line. That example is also very close to a thread model, which has been indicated here to be an incorrect way to think of domains.
Possibly beyond the scope of this, tying into what @polytypic said above: there needs to be an example of how to actually manage this in practice. Going above recommended_domain_count is a terrible idea and you should use a pool (not as clearly documented as I'd prefer, as mentioned here), but I can find no patterns for what to do when multiple parts of your system all want domain pools. For example, I have a processing component that wants a domain pool and cohttp-eio wants a domain pool, but they can't actually be the same domain pool, so either I have a bunch of idle domains or I make potentially serious performance compromises. If there is a recommended best practice, especially one that doesn't eat performance in other ways, I cannot find it here.
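For the simpler case where every component can be handed a pool rather than creating its own, the pattern might be sketched as follows. This is a deliberately minimal task queue built only on the OCaml 5 standard library (Domain, Mutex, Condition, Queue); the type pool and the functions create, submit and shutdown are hypothetical names, and this is not how domainslib, moonpool or Eio implement their pools. It does not address the harder case described above, where a library insists on owning its pool.

```ocaml
(* A minimal shared task pool; all names are illustrative. *)
type pool = {
  tasks : (unit -> unit) Queue.t;
  lock : Mutex.t;
  nonempty : Condition.t;
  mutable closed : bool;
  mutable workers : unit Domain.t list;
}

(* Worker loop: run tasks until the pool is closed and the queue is empty. *)
let rec worker p =
  Mutex.lock p.lock;
  while Queue.is_empty p.tasks && not p.closed do
    Condition.wait p.nonempty p.lock
  done;
  match Queue.take_opt p.tasks with
  | Some task ->
    Mutex.unlock p.lock;
    (* Exceptions from tasks are swallowed in this sketch. *)
    (try task () with _ -> ());
    worker p
  | None -> Mutex.unlock p.lock (* closed and drained *)

(* By default, leave one slot of the recommended count for the caller. *)
let create ?(extra = Domain.recommended_domain_count () - 1) () =
  let p = { tasks = Queue.create (); lock = Mutex.create ();
            nonempty = Condition.create (); closed = false; workers = [] } in
  p.workers <-
    List.init (max 0 extra) (fun _ -> Domain.spawn (fun () -> worker p));
  p

let submit p task =
  Mutex.lock p.lock;
  Queue.add task p.tasks;
  Condition.signal p.nonempty;
  Mutex.unlock p.lock

(* Stop accepting idle waits, help drain the queue from the calling
   domain, then wait for the worker domains to finish. *)
let shutdown p =
  Mutex.lock p.lock;
  p.closed <- true;
  Condition.broadcast p.nonempty;
  Mutex.unlock p.lock;
  worker p;
  List.iter Domain.join p.workers
```

With this shape, a program would call create once at startup and pass the same pool value to both the processing component and the server component, each calling submit on it, so the total number of domains stays within the recommended count regardless of how many components exist.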

gasche · 2023-10-04T18:56:02Z

@dclements I agree. Would you like to submit PRs to improve the documentation in various places?

There is no recommended pooling library because early 5.x releases were designed as "early adopter" releases, with an understanding that the concurrency library ecosystem is incomplete. The Domain module is a low-level abstraction, and we are waiting for userland to provide better abstractions, hopefully faster than users observe our manual/documentation updates (every six months, on new releases).

To my knowledge the current pooling implementations are:

idle-domains by @polytypic, mentioned above
moonpool by c-cube

but I fully expect most large programs adopting 5.0 to come up with their own, bespoke solutions to this problem.

gasche added the documentation label Jan 19, 2023

gasche mentioned this issue Aug 3, 2024

Idle domains slow down minor GC #13358

Open
