Idea: Concurrency manager #1534

Open · joshring opened this issue Oct 9, 2024 · 3 comments

joshring (Contributor) commented Oct 9, 2024

Inspired by the article: https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/

But if you have threads (green- or OS-level), you don’t need to do that. You can just suspend the entire thread and hop straight back to the OS or event loop without having to return from all of those functions.

Go is the language that does this most beautifully in my opinion. As soon as you do any IO operation, it just parks that goroutine and resumes any other ones that aren’t blocked on IO.

Inspired by scoped allocators:

DynamicArenaAllocator dynamic_arena;
dynamic_arena.init(1024);
mem::@scoped(&dynamic_arena)
{
    // This allocation uses the dynamic arena
    Foo* f = malloc(Foo);
};
// Release any dynamic arena memory.
dynamic_arena.destroy();

An opt-in scope with a job scheduler that can spin up and pause a "thread-like" on IO or other interrupts (?) would be really interesting:

  • Similar to an allocator like the DynamicArenaAllocator, the job scheduler's behaviour could be configurable to suit the application**
  • It only exists within a scope
  • e.g. how aggressively it pauses jobs would improve latency but reduce throughput
  • how many "thread-like"s to create

Some pseudocode of network-based concurrency:

concurrency::AsyncIO asyncio;
concurrency::SchedulerWorkStealing work_stealing;
asyncio.init(thread_type: os::GreenThread, timeout: 500, clock_cycle_time: 22, scheduler: work_stealing, num_threads: 4 * NUM_CPU);
concurrency::@scoped(&asyncio)
{
    // Network requests here
};
asyncio.destroy();

Some pseudocode of concurrency based on long-running jobs. The same abstraction should also support workloads of a very different nature, e.g. high-performance computing job scheduling:

concurrency::JobQueue job_queue;
concurrency::SchedulerFIFO fifo_scheduler;
job_queue.init(thread_type: os::Thread, timeout: 100_000, scheduler: fifo_scheduler, num_threads: 10_000);
concurrency::@scoped(&job_queue)
{
    // HPC compute tasks here
};
job_queue.destroy();

Variables **

latency, throughput, cross-job synchronisation, number of jobs, job length, job scheduling

Platform constraints

memory amount, memory bandwidth, memory access times (if NUMA), CPU/GPU, network

Task constraints

  • CPU time limit
  • Memory usage limits
  • Time before timeout
  • Access to a particular subsystem, e.g. the network, may be restricted completely
  • Access to a particular subsystem may need to wait for other tasks to complete (see the semaphore sketch after this list)
  • Cross-task communication
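
Since the C3 API sketched above is hypothetical, a runnable illustration has to borrow another language. Here is a minimal Go sketch of the "wait for other tasks" constraint: a counting semaphore built from a buffered channel gates access to a subsystem. All names are illustrative.

package main

import (
	"fmt"
	"sync"
)

func main() {
	// At most 2 tasks may use the network subsystem at once;
	// the rest block until a slot frees up.
	netSlots := make(chan struct{}, 2)

	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			netSlots <- struct{}{}        // acquire a slot (may wait)
			defer func() { <-netSlots }() // release it on exit
			fmt.Printf("task %d using the network\n", id)
		}(i)
	}
	wg.Wait()
}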

Cross-task communication

  • Barriers force tasks to sync and then exchange information, e.g. when a task needs a dependency's result to proceed.
  • Accumulate work from faster tasks into an in-memory or disk-based buffer, to be processed by the slower task when ready, e.g. emails to be sent get queued into a list by an API (see the sketch after this list).
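
A minimal Go sketch of the buffering pattern in the second bullet: fast producers queue work into a buffered channel and a slower consumer drains it at its own pace. The email wording just follows the example above.

package main

import (
	"fmt"
	"time"
)

func main() {
	// The buffer absorbs bursts from the faster producers.
	emailQueue := make(chan string, 100)

	// Fast producers, e.g. API handlers queueing emails to send.
	for p := 0; p < 3; p++ {
		go func(p int) {
			for i := 0; i < 5; i++ {
				emailQueue <- fmt.Sprintf("email %d from producer %d", i, p)
			}
		}(p)
	}

	// Slow consumer processes the backlog when it is ready.
	for n := 0; n < 15; n++ {
		msg := <-emailQueue
		time.Sleep(10 * time.Millisecond) // simulate a slow send
		fmt.Println("sent:", msg)
	}
}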

Task groups (just a kind of task)

  • Groups of tasks where one task depends on another,
    e.g. fetching details from a database to use in an API request (sketched below).
    Expression blocks or functions might be a nice way to express this.
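
A Go sketch of one task depending on another, with hypothetical stand-ins for the database and API calls: the second task simply blocks until the first task's result arrives on a channel.

package main

import "fmt"

// Stand-ins for real work; both are hypothetical helpers.
func fetchUser(id int) string    { return fmt.Sprintf("user-%d", id) } // "database" fetch
func callAPI(user string) string { return "profile for " + user }     // "API" request

func main() {
	userCh := make(chan string)
	resultCh := make(chan string)

	// Task A: fetch details from the database.
	go func() { userCh <- fetchUser(42) }()

	// Task B: waits for A's result, then makes the API request.
	go func() { resultCh <- callAPI(<-userCh) }()

	fmt.Println(<-resultCh)
}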

Task priority

  • Task priority: some tasks are critical within a deadline, others are optional within a deadline, e.g. cancel the optional ones to get the critical ones done in time.
  • Cancelable and non-cancelable tasks: some tasks should not be cancelled, while others are OK to cancel, based on a message sent to the task or decided centrally? (See the sketch after this list.)
  • Policy on cancellation: restart or do not restart, postpone for X time, exponential backoff, etc.
  • Interrupts
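
A Go sketch of the critical-versus-optional idea using contexts: the critical task gets a deadline, while the optional one is cancelled early to free resources. The task bodies are placeholders.

package main

import (
	"context"
	"fmt"
	"time"
)

// runTask loops until its context is cancelled.
func runTask(ctx context.Context, name string) {
	for {
		select {
		case <-ctx.Done():
			fmt.Println(name, "stopped:", ctx.Err())
			return
		case <-time.After(20 * time.Millisecond):
			fmt.Println(name, "working")
		}
	}
}

func main() {
	// Critical task: runs until its hard deadline.
	critical, cancelCritical := context.WithTimeout(context.Background(), 200*time.Millisecond)
	defer cancelCritical()

	// Optional task: cancelable at any time.
	optional, cancelOptional := context.WithCancel(context.Background())

	go runTask(critical, "critical")
	go runTask(optional, "optional")

	// Under time pressure, drop the optional task so the critical one finishes.
	time.Sleep(50 * time.Millisecond)
	cancelOptional()

	<-critical.Done()
}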

Fault tolerance - how to handle failed tasks

  • Tasks must have Excuse-handling code.
  • Common strategies: report the Excuse, repeat the task, fail the task, or reach consensus in distributed systems (a retry sketch follows this list).
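
A Go sketch of the "repeat task" strategy combined with the exponential backoff mentioned under task priority; in C3 the Excuse would play the role Go's error plays here. Attempt counts and delays are made up.

package main

import (
	"errors"
	"fmt"
	"time"
)

// retry re-runs task with exponentially growing delays between failures.
func retry(attempts int, base time.Duration, task func() error) error {
	var err error
	delay := base
	for i := 0; i < attempts; i++ {
		if err = task(); err == nil {
			return nil
		}
		fmt.Println("task failed, retrying in", delay)
		time.Sleep(delay)
		delay *= 2 // exponential backoff
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(4, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errors.New("transient failure") // fails twice, then succeeds
		}
		return nil
	})
	fmt.Println("result:", err)
}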

Scheduler types

  • Work stealing (levelling out the workload across threads; see the sketch after this list)
  • Realtime deterministic scheduler (deadline focused)
  • FIFO
  • LIFO
  • Distributed systems
  • NUMA system
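
To make the first entry concrete, a deliberately simplified Go sketch of work stealing: each worker prefers its own queue and steals from the others when it runs dry. Production schedulers use lock-free deques and steal from the opposite end; a mutex keeps this illustration short.

package main

import (
	"fmt"
	"sync"
)

// queue is one worker's private job list, guarded by a mutex.
type queue struct {
	mu   sync.Mutex
	jobs []int
}

func (q *queue) pop() (int, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.jobs) == 0 {
		return 0, false
	}
	job := q.jobs[len(q.jobs)-1]
	q.jobs = q.jobs[:len(q.jobs)-1]
	return job, true
}

func main() {
	// Worker 0 starts overloaded; worker 1 will steal from it.
	queues := []*queue{
		{jobs: []int{1, 2, 3, 4, 5, 6}},
		{jobs: []int{7}},
	}
	var wg sync.WaitGroup
	for id := range queues {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for {
				// Prefer the local queue, then try to steal elsewhere.
				job, ok := queues[id].pop()
				for victim := 0; !ok && victim < len(queues); victim++ {
					job, ok = queues[victim].pop()
				}
				if !ok {
					return // no work left anywhere
				}
				fmt.Printf("worker %d ran job %d\n", id, job)
			}
		}(id)
	}
	wg.Wait()
}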

So this abstraction should work across the range of use cases: from an embedded system to a video game, a web server, a NUMA system, or a distributed system.

ygorpontelo (Contributor) commented:
Nice to see a discussion on this. My main concern about just having the usual threads is that we can have connections that last for quite a while and, even worse, just hang there. I've dealt with production servers that had hundreds of websocket connections just sitting there, and other connections that were long polling. Creating a thread for each blocking connection gets very heavy.

I would also like to add another article to the discussion: https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful/.

Now I don't think the solution needs to be the async/await that we see in other languages, but having a good way to approach this problem would be fantastic. The default seems to be just callbacks, but I'm not a fan of that.

lerno (Collaborator) commented Oct 16, 2024

I've worked with different solutions. Note that a blocking thread per connection is only viable if you only either read or write. If you need both, you end up with one or two extra threads(!) (where the last thread is there to control the other two...)

In that case, I almost always just write a single thread that does a select or similar and then dispatches to event queues backed by a thread pool.

Now it's common to just use lambdas instead of proper events. Indeed, for something general, that's probably the only really flexible solution in a lot of languages. However, this just spreads code EVERYWHERE, on top of the problem of having to use callbacks.

The design I keep coming back to, because it just works so well, is to pass events to event handlers, where the event handler keeps its implementation internal to itself and the event message becomes the async "call". An event is then an enum value + a payload (a sketch follows below). This allows very nice features such as:

  1. Event filtering (discard events that are only valid for an earlier state)
  2. Delegating (the event can be passed to a delegate which handles the event, the delegate can be swappable)
  3. Introspection (see what events are queued + their contents)
  4. Record and replay (storing events, then replaying particular scenarios, e.g. to reproduce bugs)
  5. Shallow call stack
  6. The actual code running on event queue XYZ is known (as opposed to queuing a closure, where the closure merely borrows the event queue's thread)

All of these are very beneficial, even though it looks a bit primitive compared to closures. It's a much more solid approach.

However, this is for applications where there are simultaneous reads and writes, for example in a game.
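
A rough Go sketch of the enum-plus-payload design described above (all names invented): a dispatcher goroutine drains the event queue and hands each event to a swappable handler, which keeps its state internal and can filter events that are only valid in an earlier state.

package main

import "fmt"

// EventType is the enum value; Payload carries the data.
type EventType int

const (
	Connected EventType = iota
	DataReceived
	Disconnected
)

type Event struct {
	Type    EventType
	Payload any
}

// Handler keeps its implementation internal; the Event is the async "call".
type Handler interface {
	Handle(Event)
}

type ConnHandler struct{ connected bool }

func (h *ConnHandler) Handle(ev Event) {
	switch ev.Type {
	case Connected:
		h.connected = true
		fmt.Println("connected")
	case DataReceived:
		if !h.connected {
			return // event filtering: drop events valid only in an earlier state
		}
		fmt.Println("data:", ev.Payload)
	case Disconnected:
		h.connected = false
		fmt.Println("disconnected")
	}
}

func main() {
	events := make(chan Event, 16) // the event queue
	done := make(chan struct{})

	var h Handler = &ConnHandler{} // delegation: the handler is swappable

	// Dispatcher: the only code that ever runs on this queue's thread.
	go func() {
		for ev := range events {
			h.Handle(ev)
		}
		close(done)
	}()

	events <- Event{Type: DataReceived, Payload: "dropped (not connected yet)"}
	events <- Event{Type: Connected}
	events <- Event{Type: DataReceived, Payload: "hello"}
	events <- Event{Type: Disconnected}
	close(events)
	<-done
}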

ygorpontelo (Contributor) commented Oct 17, 2024

It's still a bit unclear to me how that would look. In a simpler version of just sending requests, how would you make it available for use?

I know you said it's often not the best solution, but having something would already help. Searching for examples of these is not easy either.
