Multithreading support on F-Stack #834

Open
RaduNichita opened this issue Aug 13, 2024 · 16 comments

@RaduNichita

RaduNichita commented Aug 13, 2024

Hey,
I recently came across the F-Stack project and I am interested in finding out more about the multithreading support for it.

Is this something you would consider supporting in the future if somebody comes with a PR for it, or is there a limitation in the current codebase that prevents the F-Stack API from running on multiple threads?

@AcTarjan

I have the same requirement for multithreading support.

@zhaozihanzzh
Contributor

I think the limitation may come from how the FreeBSD network stack was ported. Many FreeBSD "syscalls" are called with the same struct thread *td, and many return values are written to the same location. For example, the file lib/ff_syscall_wrapper.c contains many instances of rc = curthread->td_retval[0].
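
The pattern described in this comment can be modelled in a few lines. This is only a sketch: the struct below is a heavily simplified stand-in for FreeBSD's struct thread, and sys_fake_socket, wrapper_socket and simulate_interleaving are hypothetical names, not F-Stack functions. It replays on a single thread what would happen if two threads went through the same curthread.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for FreeBSD's struct thread: only the return slots. */
struct thread {
    long td_retval[2];
};

/* One global thread context that every wrapper uses. */
static struct thread thread0;
static struct thread *curthread = &thread0;

/* Hypothetical "syscall": stores its result in td->td_retval[0] and
 * returns 0 on success, in the style of the FreeBSD kernel convention. */
static int sys_fake_socket(struct thread *td, int next_fd)
{
    assert(td != NULL);
    td->td_retval[0] = next_fd;
    return 0;
}

/* Wrapper in the style quoted above: call, then read the shared slot. */
static int wrapper_socket(int next_fd)
{
    int error = sys_fake_socket(curthread, next_fd);
    if (error != 0)
        return -1;
    /* Race window: a second thread running a wrapper at this point
     * would overwrite the slot before it is read below. */
    return (int)curthread->td_retval[0];
}

/* Replays the problematic interleaving deterministically on one thread. */
static int simulate_interleaving(void)
{
    sys_fake_socket(curthread, 3);        /* "thread A" stores its result */
    sys_fake_socket(curthread, 4);        /* "thread B" runs before A reads */
    return (int)curthread->td_retval[0];  /* A now sees B's value */
}
```

With a single caller, wrapper_socket returns what the call stored. simulate_interleaving shows the first caller reading the second caller's value, which is the collision the comment points at.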

@RaduNichita
Author

Do you think it would be possible to create a new td structure when a new thread is created?

In my PR #835, I followed the comment from #430: when a new thread is created, I copy the pointer to the parent's structure.
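
To make the difference between the two options concrete, here is a minimal sketch that contrasts copying the parent's pointer with allocating a separate structure per thread. The struct is a simplified stand-in, and td_share, td_clone, td_attach_current_thread and td_detach_current_thread are hypothetical helpers, not code from PR #835 or from F-Stack.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for struct thread. */
struct thread {
    long td_retval[2];
    void *td_proc;  /* placeholder for state inherited from the parent */
};

/* Pointer copy: parent and child keep writing to the same td_retval slots. */
static struct thread *td_share(struct thread *parent)
{
    return parent;
}

/* Separate allocation: the child inherits the parent's fields by value
 * but gets its own return slots. Returns NULL if allocation fails. */
static struct thread *td_clone(const struct thread *parent)
{
    struct thread *td = malloc(sizeof(*td));
    if (td == NULL)
        return NULL;
    *td = *parent;
    td->td_retval[0] = 0;
    td->td_retval[1] = 0;
    return td;
}

/* Hypothetical per-thread replacement for a global curthread. */
static _Thread_local struct thread *my_td;

/* Call once at the start of each new thread. Returns 0 on success. */
static int td_attach_current_thread(const struct thread *parent)
{
    my_td = td_clone(parent);
    return my_td != NULL ? 0 : -1;
}

/* Call before the thread exits to release its structure. */
static void td_detach_current_thread(void)
{
    free(my_td);
    my_td = NULL;
}
```

A separate structure only removes the collision on the return slots. It does not by itself protect the rest of the stack's shared state, which is the concern raised later in this thread.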

@freak82

freak82 commented Aug 15, 2024

IMO, there are 2 (or 3 depending on the use case) main advantages of F-stack versus standard Linux:

  1. You don't have kernel/user-space context switches and copying for the IO operations. Currently this can be mitigated in Linux using io_uring, though.
  2. You don't have shared data structures for the network traffic processing. Linux uses a single network stack to handle the traffic coming from all queues, where each queue is usually handled on a separate CPU core. In my experience, this single network stack becomes a bottleneck under high traffic volume and/or packet rates because of its shared data structures.
  3. In scenarios with lots of traffic and/or packets, it's generally faster to poll the NIC for packets than to work with interrupts. However, lots of NIC drivers implement some version of polling in addition to the interrupt handling.

So, using the F-stack from multiple threads would remove the second advantage from the list above.
The other two advantages can be mitigated to some extent in a standard Linux application, IMO.

I'm writing this because we use the FreeBSD stack from multiple threads, but not in this way. We use a separate instance of the stack for each thread, i.e. each thread uses its own network stack and shares (almost) nothing with the other threads (there are some lock-free queues for communication between the threads). However, in our version the DPDK layer is decoupled into a separate module and not glued to the FreeBSD stack. This allows the application to have the DPDK layer with N worker threads, each worker thread having its own instance of the FreeBSD stack.
It's just an alternative design which allows (almost) linear scaling of the application with the number of CPU cores.
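
The lock-free queues mentioned here are not shown in the thread. As an illustration of the general idea, here is a minimal single-producer/single-consumer ring using C11 atomics. All names (spsc_ring, ring_init, ring_push, ring_pop) are hypothetical, and the actual queues in that application may look quite different; a DPDK application might use rte_ring instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAPACITY 8u  /* must be a power of two */

/* One ring per (producer thread, consumer thread) pair. */
struct spsc_ring {
    _Atomic size_t head;          /* advanced only by the consumer */
    _Atomic size_t tail;          /* advanced only by the producer */
    void *slots[RING_CAPACITY];
};

static void ring_init(struct spsc_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side. Returns false if the ring is full. */
static bool ring_push(struct spsc_ring *r, void *msg)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAPACITY)
        return false;
    r->slots[tail & (RING_CAPACITY - 1)] = msg;
    /* Release: the slot write becomes visible before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false if the ring is empty. */
static bool ring_pop(struct spsc_ring *r, void **msg)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *msg = r->slots[head & (RING_CAPACITY - 1)];
    /* Release: the slot read completes before the slot is handed back. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

Because each index has exactly one writer, neither side needs a lock or a compare-and-swap loop, which fits a design where every worker owns its stack and only exchanges messages.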

@RaduNichita
Author

Thanks for the answer @freak82. Do you use F-Stack in your project? If so, did you make the change you mentioned to have one FreeBSD stack per thread?

@freak82

freak82 commented Aug 19, 2024

Yes, we currently use the modified F-stack version in production, but not for a server. We use it for a transparent caching proxy.
As I said, the version is modified so that we instantiate a separate network stack (i.e. a separate FreeBSD/F-stack network stack) per thread, and there is no sharing between the threads aside from some lock-free queues used for message passing between threads. The key part is the separation between DPDK and the FreeBSD stack, because DPDK must be a single instance for the whole application, while the FreeBSD stack is used as a library instantiated separately in each thread.

@RaduNichita
Author

@freak82, do you think you could post the patch that you applied to F-Stack to have one FreeBSD stack per thread, please? I think it would be a great contribution to the community.

@freak82

freak82 commented Aug 19, 2024

I need to ask my employer first.
However, keep in mind that:

  • I may be able to give you only the patches for the F-stack/FreeBSD stack, as the DPDK layer is now integrated into the proxy application itself and is used for other application-specific things there.
  • Some of the changes we made to the F-stack/FreeBSD stack are specific to our usage scenario, where we work as a proxy and not as a server.

@RaduNichita
Author

Hey @freak82 .

Do you have any updates regarding posting the patches for the F-Stack / FreeBSD stack?

@freak82

freak82 commented Aug 23, 2024

I got the green light for this from my employer.
Here are the patches that we applied to the FreeBSD stack.
f-stack-patches.tar.gz

A few notes:

  1. In general, I highly doubt that the patches will be useful to anybody, because they tailor the F-stack to our use case and, in addition, turn the F-stack into a library which expects a relatively large amount of glue code to be provided by the application.
  2. There are functions added to the F-stack which are, again, specific to our use case.
  3. The library is built as a .so, and every thread in a given application is supposed to dlopen its own copy of the .so.
  4. The DPDK functionality is completely removed from the F-stack and moved as a layer into the application which uses the .so objects.
  5. You are supposed to first initialize each stack via ff_init_netstack.
  6. Then you are supposed to initialize the interface via ff_veth_if_init.
  7. During runtime you are supposed to call ff_on_timer_tick with the declared frequency.
  8. During runtime you can inject packets from your application into the stack via ff_veth_process_packets.
  9. Every packet is supposed to be "allocated" via ff_mbuf_gethdr.
  10. If there are segments, they should be allocated and linked via ff_mbuf_get.
@RaduNichita
Author

Hey @freak82,

Firstly, many thanks for publishing the patch about multithreading on F-Stack. I really appreciate this! I took some time to go through it to understand the changes.

Secondly, I tried applying the first patch (git apply x-stack.patch), but I got the following error (I tried the dev branch and the v1.21 and v1.23 release versions):

x-stack.patch:69: trailing whitespace.
    DEFAULT(pu->pru_setup_transparent_sockets, 
x-stack.patch:127: trailing whitespace.
static int x3me_socket(struct thread* td, 
x-stack.patch:128: trailing whitespace.
                       int domain, 
x-stack.patch:129: trailing whitespace.
                       int type, 
x-stack.patch:168: trailing whitespace.
    /* 
warning: build.sh has type 100644, expected 100755
error: patch failed: build.sh:5
error: build.sh: patch does not apply
error: patch failed: lib/ff_types.h:209
error: lib/ff_types.h: patch does not apply

I also tried running git apply --reject x-stack.patch and tried compiling the F-Stack library, but got the following error:

make: *** No rule to make target 'ff_types.c', needed by 'ff_types.o'.  Stop.

It seems that the ff_types.c file is missing from the series of three patches.

Do you think it would be possible to add it, please?

Furthermore, if I understood correctly, DPDK is initialized as a separate entity, not in the ff_init_netstack function. Could you add a small example that shows how the binding between F-Stack and DPDK is done, please?

Many thanks again! 🙏

@freak82

freak82 commented Aug 27, 2024

I will not have time for the DPDK examples these days. Too much work, sorry.
Here are the missing ff_types files. I don't have time now to recreate the patch correctly. Sorry again.
ff_types.tar.gz

You may ping me again at the end of this week - like Friday for the example code.

@RaduNichita
Author

Hey @freak82,

Do you think you can add the example code to show how the binding between F-Stack and DPDK can be done, please?

@freak82

freak82 commented Sep 3, 2024

Probably at the end of the week, and I'm not sure I'll have time even then, but I'll try.

@freak82

freak82 commented Sep 6, 2024

source-code.tar.gz
Here is some source code from our application. A few notes about it:

  1. It's not a working example. I just took excerpts of the code from our application which work with the F-stack library. The logic is intertwined with too many other things, and I can't give you working examples without sitting down for a day or two and doing just that. I don't have this time currently, sorry.
  2. The lib.h and lib.cpp contain the logic for loading the library on a given thread and the wrappers for the dlopened functions.
    Note that you need to call init_on_this_thread on every worker thread with a separate copy of the f-stack library. You need to do this first, before doing anything else with the wrapped functions.
  3. The pkt_processor.cpp contains excerpts of the functionality which calls the wrapped f-stack functions. There are a few initialization routines init_.... There is the logic from the main application loop with the receiving and sending of packets.
    The initialization routines use some custom allocations, but you'll need to figure out the logic by yourself. The custom allocations are used in our application just because every heap allocation is done via custom memory allocators which work with huge pages underneath. You may throw all that away and use the DPDK rte_malloc and similar functions. Or, for the initial tests, you can use the glibc malloc and friends.
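
The per-thread loading described in note 2 might look roughly like the sketch below. It is not the code from lib.cpp: worker_lib_path, worker_lib and worker_lib_load are hypothetical names, the ".N" suffix naming scheme is an assumption, and the symbols are kept as raw addresses because the real prototypes live in the patch headers. It assumes each worker has its own on-disk copy of the library, since, as far as I know, glibc's dlopen returns the already loaded object when asked for the same file again.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Build the path of the copy reserved for one worker, for example
 * "<base>.3" for worker 3 (hypothetical naming scheme).
 * Returns 0 on success, -1 if the buffer is too small. */
static int worker_lib_path(char *buf, size_t len, const char *base,
                           unsigned worker)
{
    int n = snprintf(buf, len, "%s.%u", base, worker);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Handle plus raw symbol addresses for one thread's copy of the library.
 * Cast the addresses to the real prototypes before calling them. */
struct worker_lib {
    void *handle;
    void *init_netstack;
    void *veth_if_init;
};

/* Load one copy and resolve the two initialization entry points named in
 * this thread. Returns 0 on success, -1 on any failure. */
static int worker_lib_load(struct worker_lib *lib, const char *path)
{
    assert(lib != NULL && path != NULL);
    lib->handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (lib->handle == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", path, dlerror());
        return -1;
    }
    lib->init_netstack = dlsym(lib->handle, "ff_init_netstack");
    lib->veth_if_init = dlsym(lib->handle, "ff_veth_if_init");
    if (lib->init_netstack == NULL || lib->veth_if_init == NULL) {
        dlclose(lib->handle);
        lib->handle = NULL;
        return -1;
    }
    return 0;
}
```

Each worker thread would call worker_lib_load once with its own path before touching any wrapped function. On older glibc versions the program has to be linked with -ldl.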

@wangchong2023

@freak82 @RaduNichita
Have you looked at libuinet? It supports uinet_instance_t and libev, but I wonder whether it supports a multi-threaded model.

5 participants