Protocol request: Direct group communication protocol for low-latency applications (<100ms) #446
Comments
If the nodes in the limited context are assumed to be trusted and free of churn, then, for a more advanced solution, you may want to consider the Kademlia routing overlay, which features lower storage overhead (logarithmic instead of linear) and logarithmic routing complexity. @oskarth Update: Had another look at the issue; I think Kademlia might not be very relevant. |
Draft notes for a potential bounty outcome sketch:
Could possibly split this up. User story: As a user of Waku, I should be able to find other nodes (e.g. in chat) and then establish a direct WebRTC connection. |
Nimbus also has a use case for this, where we would allow a group of Nimbus beacon nodes to work together in a way that ensures there is no single point of failure in the system. Low latency is key to ensuring that all validator actions are performed in time (so that validator rewards don't suffer as a result of latency), and Vac/Waku seem useful in the sense that they may allow the group to be formed with almost zero network configuration. The nodes can form groups automatically based on the validator identities, and the user wouldn't have to deal with things such as public/private IP addresses, port forwarding, VPNs, etc. |
Thanks for the input @zah. What is the current roadmap? Is that something you would like us to explore further? |
I wonder if the best way forward would be to create an nwaku PoC. According to the requirements above and from https://notes.status.im/waku-vac-devcon-2022#
it seems that we still need some NAT traversal/hole punching in nwaku/nim-libp2p first. @jm-clius what is the status of this, and which issues are tracking it? Some design assumptions:
Possible protocol (Alice, Bob are different nodes handled by the same validator as described above)
Other ideas:
|
A similar topic was discussed under the name "Application-Layer Multicast" some time ago. Focusing on low latency, I could point to deadline-based schedulers (Abeni, L., Kiraly, C., Lo Cigno, R. (2009). On the Optimal Scheduling of Streaming Applications in Unstructured Meshes), and some other work we did on low-latency video distribution. This means:
These together can nicely reduce the overall latency distribution. |
The usefulness of the proposed Nimbus setup increases dramatically when there are at least 3 nodes in the group (you would then use 2 out of 3 threshold signing to allow one of the nodes to be offline without disrupting the system). The ideal setup would involve 5 nodes configured with 3 out of 5 threshold signing. Using the public key hash in the topic name is not an ideal solution, as it would allow other nodes on the network to speculatively monitor all public keys to discover the ENRs of the participating nodes, but this is just a detail for which we'll surely find an appropriate solution. Setups with more nodes won't improve the reliability further, but they will increase the latency, so I think for our use case we care about group sizes of up to 5 nodes. Because of this, I think a full mesh would be the most appropriate topology (every node sends its own messages to all other nodes). |
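To make the numbers in the comment above concrete, here is a minimal Python sketch, not taken from Nimbus or Waku. `GroupConfig`, `group_topic`, the SHA-256 hash and the topic layout are all hypothetical names and choices made for illustration. It shows how many nodes may be offline under t out of n threshold signing, how many links a full mesh needs, and why a topic derived from a public key hash can be recomputed by any observer who knows the key.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupConfig:
    nodes: int      # group size n
    threshold: int  # signers required, the t in "t out of n"

    def tolerated_offline(self) -> int:
        # Nodes that may be offline while t signers remain available.
        return self.nodes - self.threshold

    def mesh_links(self) -> int:
        # Undirected links in a full mesh of n nodes: n * (n - 1) / 2.
        return self.nodes * (self.nodes - 1) // 2

    def sends_per_message(self) -> int:
        # In a full mesh each node sends its own message to every other node.
        return self.nodes - 1


def group_topic(validator_pubkey: bytes) -> str:
    # Hypothetical topic layout (an assumption, not a Waku-defined format).
    # Anyone who knows the public key can compute the same topic, which is
    # the monitoring concern raised in the comment.
    digest = hashlib.sha256(validator_pubkey).hexdigest()
    return f"/validator-group/1/{digest}/proto"


for cfg in (GroupConfig(3, 2), GroupConfig(5, 3)):
    print(cfg, cfg.tolerated_offline(), cfg.mesh_links(), cfg.sends_per_message())
```

With groups capped at 5 nodes, the full mesh stays at 10 links and 4 sends per message, which is why the quadratic growth of a full mesh is not a concern at this scale.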
This is tracked as medium-to-high priority (my interpretation) in the nim-libp2p roadmap: vacp2p/nim-libp2p#777 |
nim-libp2p can already be used as a hole punching server (AutoNAT & relay are available), but cannot hole punch itself (it is missing DCUtR for that) |
Such an enhancement would also be interesting for larger data transfers. |
Issue moved here |
Problem
Some applications require lower-latency direct communication as a group. This can be due to a (soft) real-time communication requirement, for example video chat.
This can be either 1-1 or a group of N participants.
Relay/Gossip latency
From https://research.protocol.ai/publications/gossipsub-v1.1-evaluation-report/
This is what we are working with. More benchmarking etc. can be done, but gossiping over multiple hops in an open network will always have some latency.
Example usage
Status voice/video chat in browser for N participants. See e.g. https://discuss.status.im/t/waku-v2-webrtc-subprotocol-for-voice-and-video-chat/1850 by @decanus
WalletConnect low latency connection between dapps and wallets, "faster relay, <500ms". @pedrouid can probably elaborate more on specific requirements here.
Sketch
Basically we want to trade off some metadata protection and flexibility for latency in a specific negotiated context.
We can use relay protocol to discover peers to talk to, then negotiate a separate group context where all nodes can dial each other. Then based on that context
The simplest version would be a 1-1 direct voice chat, say. Initially via WebSockets but WebRTC (or possibly QUIC?) would be useful to do things like video chat in a browser.
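As a rough illustration of the "negotiate a separate group context" step, the Python sketch below models the context as a small record that could be exchanged over relay before peers dial each other directly. It is only a sketch under stated assumptions: `GroupContext`, its fields, the JSON encoding and the example addresses are hypothetical and do not come from any Waku specification.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class GroupContext:
    context_id: str                 # hypothetical identifier for the session
    transport: str                  # e.g. "websocket" or "webrtc"
    members: Dict[str, str] = field(default_factory=dict)  # peer id -> dialable address

    def join(self, peer_id: str, address: str) -> None:
        # A peer announces (over relay) the address it can be dialed on.
        self.members[peer_id] = address

    def dial_targets(self, self_id: str) -> List[str]:
        # Every other member's address: the peers this node dials directly.
        return sorted(addr for pid, addr in self.members.items() if pid != self_id)


def encode(ctx: GroupContext) -> bytes:
    # Payload that would be carried in a relay message (encoding is an assumption).
    return json.dumps(asdict(ctx), sort_keys=True).encode()


def decode(payload: bytes) -> GroupContext:
    return GroupContext(**json.loads(payload))


ctx = GroupContext("call-1", "websocket")
ctx.join("alice", "/ip4/192.0.2.1/tcp/8000/ws")
ctx.join("bob", "/ip4/192.0.2.2/tcp/8000/ws")
print(decode(encode(ctx)).dial_targets("alice"))
```

For the 1-1 voice chat case the context has two members, so each side has exactly one dial target. Publishing dialable addresses in the context is the metadata-for-latency trade mentioned in the sketch above.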
There may be some more infrastructure work on libp2p needed here to make this suitable for voice/video, cc @dryajov re this
100ms is based on general response time limits (https://www.nngroup.com/articles/response-times-3-important-limits/) as well as intuition re things like FPS gaming for "real time feel".
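The 100ms target can be read as a simple latency budget. The Python sketch below compares a multi-hop gossip path with a single direct hop against that budget. The hop counts and per-hop delays are made-up illustrative values, not measurements from the GossipSub evaluation report, and the function names are hypothetical.

```python
def path_latency_ms(hops: int, per_hop_ms: float, processing_ms: float = 0.0) -> float:
    # Simplified model: each hop adds network delay plus per-node processing.
    return hops * (per_hop_ms + processing_ms)


def within_budget(latency_ms: float, budget_ms: float = 100.0) -> bool:
    # 100 ms is the response-time target used in this issue.
    return latency_ms <= budget_ms


# Illustrative values only (assumptions, not benchmark results).
gossip = path_latency_ms(hops=4, per_hop_ms=40.0, processing_ms=5.0)
direct = path_latency_ms(hops=1, per_hop_ms=40.0, processing_ms=5.0)

print(gossip, within_budget(gossip))
print(direct, within_budget(direct))
```

Under this model the per-hop delay matters less than the number of hops, which is the motivation for a direct, negotiated connection.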
Acceptance criteria
Issue with more limited scope for PoC
Better understanding of hard requirements and required work / reduced uncertainty on things like:
^ @D4nte @jm-clius @arnetheduck @staheri14 FYI