Replies: 12 comments 2 replies
-
See #2188 (comment)
-
See #2188 (comment)
-
See #2188 (comment)
-
So, does it support multi-threading for WebRTC now? After starting multiple processes and enabling port reuse, only one core is used on a multi-core machine.
-
I've been working with Node.js for almost a year now, and its model is very similar to SRS's coroutines plus multithreading, so I can basically see the future of SRS's multithreading. Its simplicity is quite good, which is what we want. We can't take Go as the reference for multithreading, because Go is true multithreading: threads share data, so Go actually needs locks, whereas each of Node.js's worker threads is effectively single-threaded, with no locks or similar synchronization. Multithreading without thread synchronization is much easier to maintain. I still insist on splitting threads by business, with streams staying in one thread. This doesn't solve performance issues, but it does solve some CPU-intensive and freezing issues, such as:
After solving these problems, stability will improve; sometimes the system really is affected by these factors. I don't want multithreading for streams, because from the perspective of ease of use it would, for example, double the maintenance cost of the API for clustering, require scheduling logic (however simple), add troubleshooting steps, and make load impossible to evaluate simply. All of these factors would greatly reduce the maintainability of the whole project. The only advantage of multithreading for streams is better use of multiple cores, which can instead be achieved through cascading (to be supported in the future) and business-level scheduling. If you feel that running one process per machine is too wasteful, you can use multiple ports or run multiple Pods. In any case, if you have reached the point of caring about raw performance, you must have a large business volume of tens or even hundreds of thousands; if a business of that scale has no R&D capability of its own, it is headed for trouble either way.
-
Update on 2023.07.18: SRS 5.0 and ST now support multi-threading, but it is not used in the streaming architecture, as it would add unnecessary complexity and hinder system monitoring through Prometheus. The optimal and unified architecture is a proxy cluster: creating a local proxy cluster to leverage multiple CPUs is a better solution than multi-threading. Nevertheless, we plan to use multi-threading in the future for disk writing, to prevent blocking IO such as logging, see #3647.
-
Remark
For now, let's put the multi-threading preparation on hold. Although ST already supports multi-threading and the RTC multi-threading work is almost complete, several factors make me reconsider whether multi-threading is necessary at this stage.
First of all, the multi-threading branch has been deleted from the SRS repository, but it is still preserved in my repository feature/threads, which mainly includes the following commits:
The main reasons for reconsidering multi-threading support are:
However, simplifying ST and improving its performance can still be considered for merging, including:
Summary
SRS's support for multi-threading is a significant architectural upgrade, essentially aimed at addressing performance issues.
Regarding performance issues, the following points can be expanded:
Why is this issue important?
Therefore, the multi-threading architecture can be considered a revolution after the multi-coroutine architecture, but this time it is a self-revolution.
Arch
The previous SRS single-threaded architecture (SRS/1/2/3/4):
The ultimate goal architecture is horizontally scalable Hybrid threads, also known as low-lock multi-threaded structure (SRS/v5.0.3):
The disadvantages of this architecture:
Threads can be disabled with `-no-threads`, but one may sometimes forget to change this option and cause problems. Solution: by default, only one SRTP thread is enabled, allowing a long enough transition and improvement period.
Communication Mechanism
There are two ways for threads to communicate: the first is a locked chan, and the second is passing an fd; the second can be built on top of the first.
Both methods should avoid passing audio and video data. They can carry it, of course, but not efficiently. For example, you can start a transcoding thread and communicate with it over a chan, since that does not require much concurrency.
SRS will have multiple ST threads that communicate through chans, but they do not pass audio and video data, only coordination messages.
Currently, SRS's thread communication is implemented with pipes to avoid locks. Be aware that this is a low-throughput mechanism and should not carry audio and video packets directly. It is mainly used between the Master (API) thread and the Hybrid (service) thread, for example with Hybrid returning the SDP to the API.
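The pipe-based channel described above can be sketched as follows. This is a hypothetical illustration (PipeChannel is our name, not an SRS class): a pipe carries small, length-prefixed coordination messages between two native threads, such as the Hybrid thread returning an SDP answer to the Master (API) thread. Media packets never cross it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unistd.h>

// A one-way cross-thread message channel built on a pipe.
class PipeChannel {
public:
    PipeChannel() { ::pipe(fds_); }
    ~PipeChannel() { ::close(fds_[0]); ::close(fds_[1]); }

    // Writer side: pipe writes below PIPE_BUF bytes are atomic,
    // so small coordination messages need no extra lock.
    void send(const std::string& msg) {
        uint32_t len = static_cast<uint32_t>(msg.size());
        ::write(fds_[1], &len, sizeof(len));
        ::write(fds_[1], msg.data(), len);
    }

    // Reader side: blocks until a whole message is available.
    std::string recv() {
        uint32_t len = 0;
        ::read(fds_[0], &len, sizeof(len));
        std::string msg(len, '\0');
        ::read(fds_[0], &msg[0], len);
        return msg;
    }

private:
    int fds_[2];  // fds_[0] = read end, fds_[1] = write end
};
```

Because each message is copied through the kernel, this is fine for an occasional SDP exchange but, as noted above, too slow for audio and video packets.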
Thread Types
Each thread will have its own ST, and ST is thread-local, i.e., each thread's ST is independent and isolated.
In the end, there will be several types of threads:
Milestones
4.0 will not enable multi-threading, maintaining single-threaded capabilities.
5.0 will implement most of the multi-threading capabilities, including improving ST's thread-local capabilities. However, Hybrid will only default to 1 thread, and although the process has multiple threads, the overall difference from the previous single-thread is not significant.
6.0 will enable as many threads as there are CPU cores by default, completing the entire multi-threaded architecture transformation.
Differences from Go
Go's multi-threading overhead is too high, and its performance is not sufficient, as it is designed for general services.
With multiple cores, say 16, Go spends roughly 5 of them on switching, because even when chan is used there are locks and data copies between threads.
In addition, Go is genuinely multi-threaded, requiring constant attention to contention and thread switching, while each SRS thread still runs genuinely single-threaded logic. Go is more complicated to use, while SRS keeps the simplicity of single-threading.
SRS is a multi-threaded and coroutine-based architecture optimized for business, essentially still single-threaded, with threads being essentially unrelated.
Relationship with Source
A single ST thread will have multiple sources.
A source, which is a push stream and its corresponding consumer (playback), is only in one ST thread.
In this way, both push and play are completed in a single ST thread, without the need for locks or switching.
Since the client's URL is unknown when connecting, it is also unknown which stream it belongs to, so it may be accepted by the wrong ST thread, requiring FD migration.
Migrating FD between multiple threads is relatively simple. The difficulty lies in ST, which needs to support multi-threading and consider rebuilding the FD in the new ST thread's epoll when migrating FD. However, this is not particularly difficult, and it is much easier than multi-process.
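The migration step described above can be sketched like this (migrate_fd is our name, a hypothetical helper): the fd itself is process-wide, so "migration" is only moving its registration from the accepting thread's epoll instance to the owning thread's.

```cpp
#include <cassert>
#include <sys/epoll.h>
#include <unistd.h>

// Moves an fd's readiness monitoring from one epoll instance to another.
bool migrate_fd(int from_epoll, int to_epoll, int fd) {
    // 1) The accepting thread detaches the fd from its own epoll.
    if (epoll_ctl(from_epoll, EPOLL_CTL_DEL, fd, nullptr) != 0) {
        return false;
    }
    // 2) The fd number is handed to the target thread (e.g. over the
    //    pipe/chan mechanism), which then rebuilds it in its own epoll.
    struct epoll_event ev = {};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(to_epoll, EPOLL_CTL_ADD, fd, &ev) == 0;
}
```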
Why not Multi-process
FD migration between processes is too difficult to implement, and inter-process communication is neither as easy nor as efficient as communication between threads.
The reason Nginx uses multi-process is that it has no need for FD migration between processes; as a result, for live streaming, NginxRTMP has processes push streams to each other, which is too difficult to maintain.
Without migration, audio and video packets must be forwarded between workers; migrating the FD by stream is clearly better and more suitable for streaming media.
Thread Local
Each thread has its own ST, similar to the Envoy threading model, using the C++ thread_local keyword to mark variables.
I wrote an example SRS: thread-local.cpp, with the following results:
It can be used to modify global variables:
Including global pointers:
The addresses and values of these pointers are different in each thread.
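A minimal sketch in the spirit of thread-local.cpp (this is not the original file): thread_local gives every thread its own copy of a "global", so a write in one thread is invisible to all others.

```cpp
#include <cassert>
#include <thread>

// Every thread gets a fresh copy initialized to 100.
thread_local int tl_counter = 100;

// Bumps the counter in a brand-new thread and reports what that thread saw.
int bump_in_new_thread() {
    int seen = 0;
    std::thread t([&] {
        tl_counter += 1;    // touches only the new thread's copy
        seen = tl_counter;  // 101 inside the new thread
    });
    t.join();
    return seen;
}
```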
GCC __thread
GCC has extended the keyword __thread, which has the same effect as C++11's thread_local.
A multi-threaded version of ST has been implemented before, using gcc's __thread keyword, referring to toffaletti and ST#19.
UDP Binding
RTC's UDP is connectionless, so multiple threads can each open an fd on the same port through `SO_REUSEPORT` to receive packets sent to that port. The kernel performs a five-tuple binding: once it delivers a client's packets to a certain listen fd, it keeps delivering to that fd. Refer to udp-client and udp-server:
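The binding can be sketched as follows (the helper name is ours, not from the linked examples): each thread opens its own UDP socket with SO_REUSEPORT and binds the same port; the kernel then spreads clients across the fds and keeps each client's packets arriving on the same fd.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Opens a UDP socket bound to the given port with SO_REUSEPORT enabled,
// so several threads can each call this for the same port.
int udp_bind_reuseport(uint16_t port) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &yes, sizeof(yes));
    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (sockaddr*)&addr, sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```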
UDP Migration
If we receive a client packet from a certain fd, such as 3, and find that this client should be received by another fd, such as 4, we can use connect to bind the delivery relationship.
Refer to the example udp-connect-client.cpp and udp-connect-server.cpp. The server receives the packet and continuously uses other fds to connect. The performance is different on different platforms.
A CentOS 7 server listening on `0.0.0.0:8000`, as shown below, can migrate twice:
A CentOS 7 server bound to a fixed address, such as eth0 or lo, will not migrate:
A Mac server, regardless of which address is bound, migrates once:
After discussing with @wasphin: rather than migrating, we hope that after connect the 5-tuple is bound to this fd, so that other fds will not receive the client's packets.
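The connect-based binding hoped for above can be sketched like this (the helper name is ours, not from the linked examples): once recvfrom() on the shared listen fd reveals a client's address, the owning thread opens its own fd on the same local port and connect()s it to the client, asking the kernel to route that 5-tuple to this fd from then on.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Creates a per-client fd on the same local port and connect()s it to the
// client's address. On UDP, connect() sends no packet; it only records the
// peer so the kernel can bind the 5-tuple to this fd.
int adopt_udp_client(uint16_t local_port, const sockaddr_in* client) {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;
    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &yes, sizeof(yes));
    sockaddr_in local = {};
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(local_port);
    if (bind(fd, (sockaddr*)&local, sizeof(local)) != 0 ||
        connect(fd, (const sockaddr*)client, sizeof(*client)) != 0) {
        close(fd);
        return -1;
    }
    return fd;  // later packets from this client should arrive here
}
```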
In this case, a more suitable thread model is:
This model is actually a hybrid model:
This hybrid model does not depend on UDP connect, but when connect does work, performance will be very high.
In addition, the encryption and decryption problem can also be solved by a similar hybrid model:
What's special is the disk IO thread, which will definitely use the queue to send messages:
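A sketch of such a queue (our names, not SRS's real writer): service threads push log/record messages and never block on the disk; the single disk IO thread drains the queue and performs the actual writes.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

// A blocking message queue between service threads and the disk IO thread.
class DiskWriteQueue {
public:
    // Called by any service thread; cheap, never touches the disk.
    void push(std::string msg) {
        std::lock_guard<std::mutex> lk(mu_);
        q_.push(std::move(msg));
        cv_.notify_one();
    }

    // Called only by the disk IO thread; blocks until a message arrives.
    std::string pop() {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        std::string msg = std::move(q_.front());
        q_.pop();
        return msg;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
};
```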
In the early days, we will still pass packets between multiple threads and divide different threads according to the business. As the evolution progresses, we will gradually eliminate the communication and dependencies between threads and turn them into independent threads that do not rely on each other, achieving higher performance.