Full STT inference is currently done on the server. Once a client connects, the server becomes busy and nobody else can use it, i.e. it does not scale at all. What are our options for increasing capacity?
We have mainly two variables: the client-selected language, and the connection, which is an uninterrupted stream...
1. NodeJS Threads / Clusters - flexible language
We might use the cluster/worker-thread mechanisms from NodeJS. One server with multiple cores can handle multiple audio streams: each connection spawns a worker process, negotiates a language with the client, and serves it until the connection is closed.
This would require a more powerful server and is limited by the number of cores.
2. More servers - flexible language
We might use a pool of low-cost/free servers. Clients poll them until they find a free one; a new client might also scan all servers in parallel at startup to get their status.
This scales better but is again limited by the number of mini-servers.
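The parallel scan idea could be sketched as follows. The `probe` callback stands in for whatever status protocol the mini-servers would expose (e.g. a status endpoint returning busy/free); that protocol is an assumption here.

```javascript
// Sketch of option 2: the client probes every mini-server in
// parallel and picks the first one that reports itself free.
// probe(url) is assumed to resolve to true when the server is busy.
async function pickFreeServer(servers, probe) {
  const statuses = await Promise.all(
    servers.map(async (url) => {
      try {
        return { url, busy: await probe(url) };
      } catch {
        return { url, busy: true }; // unreachable counts as busy
      }
    })
  );
  const free = statuses.find((s) => !s.busy);
  return free ? free.url : null; // null => keep polling / back off
}
```

When `pickFreeServer` returns `null`, the client would retry after a delay, which is the polling loop described above.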
3. Dedicated language servers - limited communication => shared server/process
We want only relevant commands transferred to the server, not the ambient sound streamed continuously. This can be achieved with a push-to-talk (walkie-talkie style) client configuration, or by preprocessing voice activity on the client so that only sufficient data is sent (removing silences, background noise, etc.), or by some other method; that deserves its own discussion topic. Either way, each connection is then used for only a relatively short time.
If we can do this, a language process (be it a mini-server or a worker process on a core) can share its STT capabilities among multiple users. We don't want to permanently dedicate a server to one language, though, as some would sit idle while others get congested. Instead:
- New connection requesting language A => if there is already a server/process for it, try to use it. If none, spawn one for that language.
- Limit connections per process to N slots; if full, spawn a new one.
- Connection closed => free the slot. If no other connections are left, release the worker from its language dedication.
We might need to implement a buffering/queuing mechanism and fine-tune it to find the optimal number of slots.
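The allocation policy above can be captured in a small bookkeeping class. This is a sketch of the policy only: `LanguagePool`, its `slots` parameter, and the worker records are hypothetical names, and actually spawning/killing processes is left out.

```javascript
// Sketch of option 3's slot-based allocation. Workers are treated as
// interchangeable processes that load a language model on demand;
// `slots` is the tunable per-worker connection limit (N in the text).
class LanguagePool {
  constructor(slots) {
    this.slots = slots;
    this.workers = []; // { id, lang, connections }
    this.nextId = 0;
  }

  // New connection requesting `lang`: reuse a worker for that language
  // with a free slot, otherwise re-dedicate an idle worker, otherwise
  // spawn (here: just record) a new one. Returns the worker id.
  acquire(lang) {
    let w = this.workers.find(
      (w) => w.lang === lang && w.connections < this.slots
    );
    if (!w) {
      w = this.workers.find((w) => w.lang === null); // idle worker?
      if (w) {
        w.lang = lang;
      } else {
        w = { id: this.nextId++, lang, connections: 0 };
        this.workers.push(w);
      }
    }
    w.connections++;
    return w.id;
  }

  // Connection closed: free the slot; if no connections are left,
  // release the worker from its language dedication.
  release(id) {
    const w = this.workers.find((w) => w.id === id);
    if (!w) return;
    w.connections--;
    if (w.connections === 0) w.lang = null;
  }
}
```

A queue in front of `acquire` (holding requests when every worker of a language is full) would be the buffering mechanism mentioned above, and tuning `slots` against queue latency is the fine-tuning step.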
More?
...