
Flood publish #15 (Draft)

ackintosh wants to merge 11 commits into main
Conversation

@ackintosh (Member) commented Apr 2, 2023

👷 This PR will be ready for review once the improvement to flood publishing has been merged. 👷

Flood Publish Simulation

This simulation creates a number of nodes with flood publishing enabled
and helps users measure message latency.

In this simulation, pictured below, each node logs the time when it emits a HandlerIn::Message event to the handler, and the time when handle_received_message() is called.

NOTE: Both the event and the function are defined inside rust-libp2p, so this simulation uses a forked rust-libp2p that includes the logging. See here for the diff in the fork.

sequenceDiagram
    participant Node1
    participant Node2
    participant Node3
    
    loop Simulation
        Note over Node1: HandlerIn::Message
        Node1->>Node2: message
        Note over Node2: handle_received_message()
        Note over Node1: HandlerIn::Message
        Node1->>Node3: message
        Note over Node3: handle_received_message()
    end

Using measure_latency.py, we can measure the time between the two events.

# Filter the relevant log lines and pipe them into measure_latency.py:
testground run single \
  --plan gossipsub-testground/flood_publishing \
  --testcase flood_publishing \
  ...
  ...
  | grep flood_publishing_test \
  | python3 flood_publishing/measure_latency.py
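For reference, below is a minimal sketch of the kind of pairing measure_latency.py performs. The log-line shape, the pairing key, and the regular expression here are assumptions for illustration, not the script's actual format; see the real script in flood_publishing/ for the implementation.

# Hypothetical reimplementation of the measurement logic; the log format
# ("flood_publishing_test <send|receive> <message_key> <unix_ms>") is assumed.
import re
import statistics
import sys

send_logs = {}     # message_key -> send timestamp (ms), logged at HandlerIn::Message
receive_logs = []  # (message_key, receive timestamp in ms), logged at handle_received_message()

pattern = re.compile(r"flood_publishing_test (send|receive) (\S+) (\d+)")
for line in sys.stdin:
    m = pattern.search(line)
    if m is None:
        continue
    kind, key, ts = m.group(1), m.group(2), int(m.group(3))
    if kind == "send":
        send_logs[key] = ts
    else:
        receive_logs.append((key, ts))

# Latency: time from emitting HandlerIn::Message to handle_received_message().
latencies = [ts - send_logs[key] for key, ts in receive_logs if key in send_logs]

print(f"[send_logs] {len(send_logs)}")
print(f"[receive_logs] {len(receive_logs)}")
print("* Results (in milliseconds) *")
print(f"[mean] {statistics.mean(latencies)}")
print(f"[median] {statistics.median(latencies)}")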

Running the Simulation

The type of flood publishing can be switched via --test-param flood_publish=heartbeat. See flood_publishing/manifest.toml for the available test parameters.

testground run single \
  --plan gossipsub-testground/flood_publishing \
  --testcase flood_publishing \
  --builder docker:generic \
  --runner local:docker \
  --instances 50 \
  --wait \
  --test-param flood_publish=heartbeat \
  | grep flood_publishing_test \
  | python3 flood_publishing/measure_latency.py

Measurement Results

Comparing the Rapid and Heartbeat runs below, Heartbeat appears to reduce latency by roughly 35% at the median (681 ms → 444 ms) and about 41% at the mean (664 ms → 392 ms).

  • bandwidth: 30MiB
  • instances: 50
  • message size: 50KB

Rapid

Command
testground run single \
  --plan gossipsub-testground/flood_publishing \
  --testcase flood_publishing \
  --builder docker:generic \
  --runner local:docker \
  --instances 50 \
  --wait \
  --test-param flood_publish=rapid \
  | grep flood_publishing_test \
  | python3 flood_publishing/measure_latency.py
*** measure_latency.py ***
[publisher] node_id: 339681 , peer_id: 12D3KooWRaHQje9JBkjNsCN2S4bPDoeJTNQvwa7q3XSY4Xk6kBRh
[nodes] 50
[send_logs] 280
[receive_logs] 280

* Results (in milliseconds) *
[mean] 664.05
[median] 681.0

Heartbeat

Command
testground run single \
  --plan gossipsub-testground/flood_publishing \
  --testcase flood_publishing \
  --builder docker:generic \
  --runner local:docker \
  --instances 50 \
  --wait \
  --test-param flood_publish=heartbeat \
  | grep flood_publishing_test \
  | python3 flood_publishing/measure_latency.py
*** measure_latency.py ***
[publisher] node_id: 6f17de , peer_id: 12D3KooWSi9kmfo5ozCjTVGHBe5u26hhBq63bcfBLL9CJBgec8Bb
[nodes] 50
[send_logs] 290
[receive_logs] 290

* Results (in milliseconds) *
[mean] 391.94827586206895
[median] 444.0
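As a quick sanity check, the relative improvement implied by the two runs above can be computed directly from the reported statistics:

# Latency reduction of Heartbeat relative to Rapid, using the numbers above.
rapid_mean, rapid_median = 664.05, 681.0
heartbeat_mean, heartbeat_median = 391.95, 444.0

print(f"mean reduction:   {(1 - heartbeat_mean / rapid_mean) * 100:.1f}%")      # ~41.0%
print(f"median reduction: {(1 - heartbeat_median / rapid_median) * 100:.1f}%")  # ~34.8%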

@mxinden commented May 31, 2023

Thank you @ackintosh for providing these numbers. That is very helpful.

I acknowledge that libp2p/rust-libp2p#3666 shows a significant change in sending latency. Though one also needs to keep in mind that nodes outside of the mesh will only receive the message on the next heartbeat, and thus see a significant delay.

I wonder whether the problem should be solved at the Gossipsub level, or whether it is worth investing in the lower transport layer, improving base bandwidth.

Out of curiosity, I wonder how this would play out when using a more performant transport protocol. Early results from our measurements show that libp2p/rust-libp2p#3454 has a significant bandwidth improvement over our existing libp2p-quic and libp2p-tcp transports (roughly 10x or more).

[plot: bandwidth comparison of libp2p/rust-libp2p#3454 against the existing libp2p-quic and libp2p-tcp transports]

See libp2p/test-plans#184 for details.

Would you mind running this test with libp2p/rust-libp2p#3454?

@ackintosh (Member, Author) commented

@mxinden I have created another test plan to run this test with the QUIC implementation. The result shows a ~5% improvement in latency:
ackintosh#3

@diegomrsantos commented

@mxinden and @ackintosh, do either of you have a hypothesis that could explain why a 10x increase in throughput resulted in only a 5% improvement in latency when using QUIC?
