Allow using an app-provided thread #1923

DemiMarie · 2021-08-18T02:50:16Z

Describe the feature you'd like supported

It would be nice if MsQuic allowed apps to provide their own threads, and perform event polling themselves.

Proposed solution

See above.

Additional context

In some environments, such as Lua and Node.js, all callbacks must eventually be run on a single thread. This currently requires marshaling them back to the main thread, which is less efficient than if MsQuic could integrate into the built-in event loop. Other environments, such as Rust with Tokio, already provide their own high-performance event loops, and having to use a separate thread for QUIC would require additional locking.

nibanks · 2021-08-18T11:39:44Z

We've discussed possibly supporting this, but never came to any hard conclusion. How would you actually use this, if we added support? There is significant work involved and we wouldn't want to do this unless something would definitely use it.

DemiMarie · 2021-08-18T17:09:01Z

> We've discussed possibly supporting this, but never came to any hard conclusion. How would you actually use this, if we added support? There is significant work involved and we wouldn't want to do this unless something would definitely use it.

I don’t have any particular plans myself, and am not in a position where I am likely to use MsQuic in the near future. That said, I would not be surprised if some people have ruled out MsQuic as an option because of this without filing an issue.

nibanks · 2021-08-19T12:06:30Z

> That said, I would not be surprised if some people have ruled out MsQuic as an option because of this without filing an issue.

Perhaps. There is a reason we went with this model of owning the threads in MsQuic though. There is a lot of complexity involved in implementing a performant parallelized networking layer, and by owning the threads in MsQuic we can do all the hard work internally and the apps get it for free. If we add support for this Issue, I do expect apps that use this model to have a significant performance decrease from those that do not.

DemiMarie · 2021-08-19T13:54:12Z

> That said, I would not be surprised if some people have ruled out MsQuic as an option because of this without filing an issue.

> Perhaps. There is a reason we went with this model of owning the threads in MsQuic though. There is a lot of complexity involved in implementing a performant parallelized networking layer, and by owning the threads in MsQuic we can do all the hard work internally and the apps get it for free. If we add support for this Issue, I do expect apps that use this model to have a significant performance decrease from those that do not.

Is the current model compatible with Node.js, for example, or would that require marshalling?

nibanks · 2021-08-19T13:57:33Z

> Is the current model compatible with Node.js, for example, or would that require marshalling?

I have no experience with Node.js so I cannot answer that.

thhous-msft · 2021-08-19T14:17:09Z

I would actually be slightly scared to put the QUIC workers on the UI thread in Node.js. Unlike the current TCP and UDP implementations, which do a TINY amount of work at the user mode level, QUIC is very computationally expensive, including encryption, synchronous DNS lookup, and all the timing requirements for the protocol. I highly suspect running QUIC directly on the UI thread would start to cause the UI to lag. And because you'd be limited to a single thread, you'd lose a lot of perf there as well.

Very little in Node currently is computationally expensive, and anything that is usually is marshalled to a separate thread in some way.

DemiMarie · 2021-08-19T15:02:52Z

> I would actually be slightly scared to put the QUIC workers on the UI thread in Node.js. Unlike the current TCP and UDP implementations, which do a TINY amount of work at the user mode level, QUIC is very computationally expensive, including encryption, synchronous DNS lookup, and all the timing requirements for the protocol. I highly suspect running QUIC directly on the UI thread would start to cause the UI to lag. And because you'd be limited to a single thread, you'd lose a lot of perf there as well.

How does that compare to the current TLS client and server? Also, my understanding is that Node.js usually scales by having multiple instances of the server running, or by using multiple contexts. So using 2x the CPU for less than 2x performance is not guaranteed to be a win.

nibanks · 2021-08-19T15:26:54Z

> How does that compare to the current TLS client and server?

Again, I don't know how that's currently done in Node, but TLS is very expensive, so I'd assume it's never done on a blocking thread.

> Also, my understanding is that Node.js usually scales by having multiple instances of the server running, or by using multiple contexts. So using 2x the CPU for less than 2x performance is not guaranteed to be a win.

MsQuic scales its thread count with the processor count. Additionally, with RSS (receive side scaling) it uses dedicated threads per processor to match the processors on which the NIC indicates receives, so it scales very well, especially on multi-NUMA-node machines.
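A minimal sketch of the thread-per-processor idea described above, assuming POSIX threads and a glibc/BSD-style `sysconf(_SC_NPROCESSORS_ONLN)`. The `worker` type and `run_one_worker_per_processor` are hypothetical names, not MsQuic code; real MsQuic workers also handle CPU affinity and RSS queue alignment, which this sketch leaves out.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical per-processor worker; not MsQuic's actual type. */
typedef struct worker {
    pthread_t thread;
    long processor_index;  /* processor this worker is meant to serve */
    uint64_t iterations;   /* stand-in for work done */
} worker;

static void *worker_main(void *arg)
{
    worker *w = (worker *)arg;
    /* A real worker would pin itself to processor_index and drain that
       processor's receive queue; this sketch only records that it ran. */
    w->iterations = 1;
    return NULL;
}

/* Starts one worker per online processor (capped at max_workers), then
   joins them. Returns how many workers were actually started. */
static long run_one_worker_per_processor(worker *workers, long max_workers)
{
    assert(max_workers > 0);
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    if (n < 1) {
        n = 1;  /* sysconf can fail; fall back to a single worker */
    }
    if (n > max_workers) {
        n = max_workers;
    }
    long started = 0;
    for (long i = 0; i < n; i++) {
        workers[i].processor_index = i;
        workers[i].iterations = 0;
        if (pthread_create(&workers[i].thread, NULL, worker_main, &workers[i]) != 0) {
            break;  /* stop at the first failure and join what was started */
        }
        started++;
    }
    for (long i = 0; i < started; i++) {
        pthread_join(workers[i].thread, NULL);
    }
    return started;
}
```

The point of the model is that the worker count follows the hardware rather than the app, which is exactly what an app-provided thread would give up.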

DemiMarie · 2021-12-18T11:23:08Z

> That said, I would not be surprised if some people have ruled out MsQuic as an option because of this without filing an issue.

> Perhaps. There is a reason we went with this model of owning the threads in MsQuic though. There is a lot of complexity involved in implementing a performant parallelized networking layer, and by owning the threads in MsQuic we can do all the hard work internally and the apps get it for free. If we add support for this Issue, I do expect apps that use this model to have a significant performance decrease from those that do not.

At a minimum, I would like to be able to integrate my own code into MsQuic’s event loop somehow. I might need to handle HTTP/1.1 and HTTP/2 traffic as well, for instance.

nibanks · 2021-12-18T15:52:49Z

> At a minimum, I would like to be able to integrate my own code into MsQuic’s event loop somehow. I might need to handle HTTP/1.1 and HTTP/2 traffic as well, for instance.

@DemiMarie we're doing work on refactoring how scheduling works, and would be happy to take inputs and suggestions. We refactored the QUIC worker thread so that it can be run by another thread:

```c
//
// General purpose execution context abstraction layer. Used for driving worker
// loops.
//

typedef struct CXPLAT_EXECUTION_CONTEXT CXPLAT_EXECUTION_CONTEXT;

//
// Returns FALSE when it's time to clean up.
//
typedef
_IRQL_requires_max_(PASSIVE_LEVEL)
BOOLEAN
(*CXPLAT_EXECUTION_FN)(
    _Inout_ void* Context,          // The app-specific Context field below.
    _Inout_ uint64_t* TimeNowUs,    // The current time, in microseconds.
    _In_ CXPLAT_THREAD_ID ThreadID  // The current thread ID.
    );

typedef struct CXPLAT_EXECUTION_CONTEXT {

    void* Context;                  // Passed to Callback.
    CXPLAT_EXECUTION_FN Callback;
    uint64_t NextTimeUs;            // Earliest time the context wants to run.
    BOOLEAN Ready;                  // Set when work is queued.

} CXPLAT_EXECUTION_CONTEXT;
```

And usage:

```c
// TODO - Add synchronization around this stuff.
uint32_t ExecutionContextCount = 0;
CXPLAT_EXECUTION_CONTEXT* ExecutionContexts[8];

void CxPlatAddExecutionContext(CXPLAT_EXECUTION_CONTEXT* Context)
{
    CXPLAT_FRE_ASSERT(ExecutionContextCount < ARRAYSIZE(ExecutionContexts));
    ExecutionContexts[ExecutionContextCount] = Context;
    ExecutionContextCount++;
}

//
// Runs every context that is ready or whose timer has expired. Returns FALSE
// if there are no contexts left to run.
//
BOOLEAN CxPlatRunExecutionContexts(_In_ CXPLAT_THREAD_ID ThreadID)
{
    if (ExecutionContextCount == 0) {
        return FALSE;
    }

    uint64_t TimeNow = CxPlatTimeUs64();
    uint32_t i = 0;
    while (i < ExecutionContextCount) {
        CXPLAT_EXECUTION_CONTEXT* Context = ExecutionContexts[i];
        if ((Context->Ready || Context->NextTimeUs <= TimeNow) &&
            !Context->Callback(Context->Context, &TimeNow, ThreadID)) {
            // The context is done: swap-remove it, then re-check slot i,
            // which now holds the element moved from the end (otherwise
            // that element would be skipped for this pass).
            ExecutionContexts[i] = ExecutionContexts[--ExecutionContextCount];
        } else {
            i++;
        }
    }

    return TRUE;
}
```

With this model exposed in the API, the app's thread could drive the execution contexts. The complexity lies in keeping things like RSS and CID-based routing working effectively.

DemiMarie · 2022-03-18T09:12:08Z

@nibanks so one thought I had is to allow the MsQuic event loop to handle other things as well, such as pollable file descriptors on Unix and I/O completion ports and waitable events on Windows. The latter will require using undocumented NT kernel APIs, but I imagine it would not be too hard for you to work around that problem.
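To make the fd-integration idea concrete, here is a sketch of one wait step of an app-owned POSIX loop: a single `poll()` call waits on the app's file descriptors (which could include the QUIC UDP socket) and on a QUIC-style next-timer deadline. `NO_TIMER`, `timeout_ms_until`, and `wait_for_events` are hypothetical names; Windows IOCP and kqueue would need different code.

```c
#include <assert.h>
#include <limits.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>

#define NO_TIMER UINT64_MAX  /* hypothetical "no timer armed" sentinel */

/* Converts an absolute deadline (microseconds) into a poll() timeout in
   milliseconds, rounding up so the loop never wakes before the deadline. */
static int timeout_ms_until(uint64_t next_time_us, uint64_t now_us)
{
    if (next_time_us == NO_TIMER) {
        return -1;  /* no timer: block until an fd is ready */
    }
    if (next_time_us <= now_us) {
        return 0;   /* already due: check fds without blocking */
    }
    uint64_t diff = next_time_us - now_us;
    uint64_t ms = diff / 1000 + (diff % 1000 != 0);
    return ms > (uint64_t)INT_MAX ? INT_MAX : (int)ms;
}

/* Blocks until any fd is readable or the deadline passes. Returns the
   number of ready fds, 0 if the timeout expired first, or -1 on error. */
static int wait_for_events(struct pollfd *fds, nfds_t nfds,
                           uint64_t next_time_us, uint64_t now_us)
{
    for (nfds_t i = 0; i < nfds; i++) {
        fds[i].events = POLLIN;
        fds[i].revents = 0;
    }
    return poll(fds, nfds, timeout_ms_until(next_time_us, now_us));
}
```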

As far as RSS and CID-based routing, what are the tricky parts? Would it be possible to decouple the networking code from the state machine, as Quinn does? Would there be a performance penalty in doing so?

nibanks · 2022-03-18T12:34:50Z

> @nibanks so one thought I had is to allow the MsQuic event loop to handle other things as well, such as pollable file descriptors on Unix and I/O completion ports and waitable events on Windows. The latter will require using undocumented NT kernel APIs, but I imagine it would not be too hard for you to work around that problem.

@DemiMarie yes we've thought about designs both where msquic handles everything and where we expose interfaces such that the app can handle everything. Both have complexities, mostly originating from the fact that there is no single, easy pattern that works cross-platform. Just for the datapath layer, epoll, kqueue, iocp, etc. all have slight differences that complicate things.

> As far as RSS and CID-based routing, what are the tricky parts? Would it be possible to decouple the networking code from the state machine, as Quinn does? Would there be a performance penalty in doing so?

Anything is possible, but we have to balance complexity and performance. Unlike any other QUIC stack I know of, MsQuic is designed to align RSS all the way up from the NIC into the application thread, all on the same CPU (if everything is used properly). This is very complicated and difficult to achieve, and providing a generic interface that other threads could control will make it more difficult.

That isn't to say we don't want to go there. We want to figure out a good way to do this, but still haven't quite achieved it yet.

bwoebi · 2023-09-24T16:39:15Z

I just stumbled across this issue; it might be worth adding my 2 cents:
I really like the API surface of MsQuic; it feels more complete and usable than any other QUIC implementation I've encountered.

However, I'm at a loss as to how I would integrate it with PHP (via FFI). The PHP model generally requires PHP code to be invoked only from a single thread (and then uses multi-processing if needed to scale horizontally).
Then, in addition, one generally wants to schedule timers and other I/O on the same thread.
But ultimately, all these event loops are doing is "notify me when there's some event waiting for this file descriptor". The easiest way to integrate would thus be the ability to choose an executor model where I can just give it my UDP socket handle; it then tells me via callbacks when I should start and stop polling for readability/writability, and I notify the MsQuic executor when that happens.

I would prefer that MsQuic not fully decouple the networking, as I definitely appreciate it trying to optimize the networking, setting socket options, etc. Only the small task of socket I/O readiness would need to be abstracted away for me.

nibanks · 2023-09-24T19:07:46Z

It's definitely a goal to be able to allow the app thread to drive the execution. It's still a work in progress though. Thanks for the feedback!

redbaron · 2024-10-23T13:48:32Z

What is being described/requested here is the sans-io model.

By externalising all I/O and timers, the library effectively becomes just a state machine. One benefit this brings is ease of porting the library to other platforms: it simply doesn't contain any platform-specific code anymore. We have currently ruled out msquic for one of our projects because plugging in I/O for console platforms requires maintaining a fork.

nibanks · 2024-10-23T14:04:40Z

As you can see from the recently linked draft PR, we're actively working on exposing a way for external control of the execution.

redbaron · 2024-10-23T14:25:39Z

I had a look; it is very early work and it is hard to see how it will shape up. It might allow better control over threading, but it looks like it retains a lot of responsibility for I/O in msquic, so porting would still be a hassle.

An ideal sans-io interface would accept the time delta since the last poll and a vector of (socket_handle, buffer) tuples, with the library completely unaware of how each one was received; all it knows is that a given socket handle sent it a given buffer. It then returns a vector of (socket_handle, buffer) tuples that it would like the app to send, plus the minimal time delta after which it expects to be polled again to process timeouts.

Realistic payloads passed in and out of a sans-io library are likely to be more complicated, with enums for connection created, connection closed, reported I/O errors, multiple buffers per socket for scatter/gather I/O, etc.
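A sketch of the shape of such an interface in C, assuming hypothetical types (`io_handle`, `datagram`, `poll_result`, `sans_io_poll`). The "state machine" is a toy that echoes datagrams back so the data flow is visible; a real QUIC state machine would decrypt packets, process frames, and emit many more event kinds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque to the library; its meaning belongs to the app's I/O layer. */
typedef uint64_t io_handle;

typedef struct {
    io_handle handle;
    const uint8_t *data;
    size_t len;
} datagram;

/* EV_CONNECTION_CLOSED is listed for shape only; the toy never emits it. */
typedef enum { EV_NONE, EV_CONNECTION_CREATED, EV_CONNECTION_CLOSED } event_kind;

typedef struct {
    datagram *out;                /* caller-owned array for datagrams to send */
    size_t out_cap;
    size_t out_count;
    event_kind event;
    uint64_t next_poll_delta_us;  /* poll again no later than this */
} poll_result;

typedef struct {
    uint64_t elapsed_us;
    size_t datagrams_seen;
} sans_io_state;

/* Toy state machine: echoes each input datagram (up to out_cap) back to the
   handle it arrived on, without copying the bytes, reports "connection
   created" on the first datagram, and asks to be polled again within 25 ms. */
static void sans_io_poll(sans_io_state *st, uint64_t delta_us,
                         const datagram *in, size_t in_count, poll_result *res)
{
    assert(res->out != NULL || res->out_cap == 0);
    st->elapsed_us += delta_us;
    res->out_count = 0;
    res->event = (st->datagrams_seen == 0 && in_count > 0) ? EV_CONNECTION_CREATED : EV_NONE;
    for (size_t i = 0; i < in_count && res->out_count < res->out_cap; i++) {
        res->out[res->out_count++] = in[i];
        st->datagrams_seen++;
    }
    res->next_poll_delta_us = 25000;
}
```

Because the function touches no sockets or clocks, the same code would run unchanged on any platform; the app owns how datagrams arrive and when the next poll happens.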

nibanks · 2024-10-23T15:29:21Z

That model assumes you have socket handles, which is not always correct in terms of XDP and DPDK.

DemiMarie · 2024-10-23T19:28:08Z

Indeed so. RSS alignment (which really helps performance) is another factor.

redbaron · 2024-10-24T08:32:16Z

I don't follow. The socket handle doesn't have to point to an actual kernel socket; call it io_handle, just an identifier that the I/O layer outside of msquic can use to know where to send bytes, and that msquic uses to identify which QUIC endpoint the bytes belong to. I am not familiar with XDP or DPDK, but surely they have a notion of src:port, dst:port even if they craft raw packets including all the IP headers; io_handle can be mapped to these network tuples.
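A sketch of the mapping described above, assuming a hypothetical app-side table (`four_tuple`, `handle_table`, `handle_for_tuple`, `tuple_for_handle`). The library would only ever see the opaque integer handle, while the raw-packet layer keeps the addresses and ports it needs to build headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Addressing a raw-packet I/O layer needs (IPv6 or IPv4-mapped addresses). */
typedef struct {
    uint8_t src_ip[16];
    uint8_t dst_ip[16];
    uint16_t src_port;
    uint16_t dst_port;
} four_tuple;

#define MAX_HANDLES 64

/* Hypothetical app-side table: handle h refers to tuples[h - 1]. */
typedef struct {
    four_tuple tuples[MAX_HANDLES];
    uint32_t count;
} handle_table;

static int tuple_equal(const four_tuple *a, const four_tuple *b)
{
    return a->src_port == b->src_port && a->dst_port == b->dst_port &&
           memcmp(a->src_ip, b->src_ip, sizeof a->src_ip) == 0 &&
           memcmp(a->dst_ip, b->dst_ip, sizeof a->dst_ip) == 0;
}

/* Returns the handle for a tuple, assigning a new one on first sight.
   Handles start at 1 so that 0 can signal "table full". */
static uint64_t handle_for_tuple(handle_table *t, const four_tuple *ft)
{
    assert(t != NULL && ft != NULL);
    for (uint32_t i = 0; i < t->count; i++) {
        if (tuple_equal(&t->tuples[i], ft)) {
            return (uint64_t)i + 1;
        }
    }
    if (t->count == MAX_HANDLES) {
        return 0;
    }
    t->tuples[t->count++] = *ft;
    return t->count;
}

/* Resolves a handle the library emitted back into addressing, or NULL. */
static const four_tuple *tuple_for_handle(const handle_table *t, uint64_t h)
{
    return (h >= 1 && h <= t->count) ? &t->tuples[h - 1] : NULL;
}
```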

Same for RSS: because all I/O is externalised, msquic gives up control over it, and it is up to the I/O layer to choose the CPU to run I/O on. If required, there can be an msquic instance per CPU for a share-nothing architecture. With msquic acting just as a state machine, the app has full flexibility in how and when to drive it.
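A sketch of share-nothing dispatch under the assumptions in this comment, using a hypothetical `instance_for_tuple` helper that hashes an IPv4 4-tuple to pick a per-CPU instance. FNV-1a stands in for the Toeplitz hash NICs typically use for RSS. A pure tuple hash sends a connection to a different instance if its address changes (QUIC connection migration), which is one reason connection-ID-based routing matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a: a simple deterministic stand-in for an RSS-style hash. */
static uint32_t fnv1a(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static void put_be16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

/* Picks the per-CPU instance that owns an IPv4 4-tuple: the same tuple always
   maps to the same instance, so instances never share connection state. */
static uint32_t instance_for_tuple(uint32_t src_ip, uint16_t src_port,
                                   uint32_t dst_ip, uint16_t dst_port,
                                   uint32_t instance_count)
{
    assert(instance_count > 0);
    uint8_t key[12];  /* serialized explicitly: no padding or endianness surprises */
    put_be32(key, src_ip);
    put_be16(key + 4, src_port);
    put_be32(key + 6, dst_ip);
    put_be16(key + 10, dst_port);
    return fnv1a(key, sizeof key) % instance_count;
}
```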

nibanks added the Area: API, Area: Core, external, and feature request labels Aug 18, 2021

nibanks added this to the Future milestone Aug 18, 2021

nibanks mentioned this issue Aug 18, 2021 in Refactor Worker Loop #1924 (Merged)

nibanks added this to MsQuic Walkthroughs May 8, 2023

nibanks moved this to Should be written in MsQuic Walkthroughs May 8, 2023

nibanks pinned this issue Aug 18, 2023

nibanks linked a pull request Oct 16, 2024 that will close this issue: External Execution Interface #4616 (Draft)
