Materials for the artifact evaluation of the SIGCOMM'18 Homa paper

PlatformLab/homa-paper-artifact

SIGCOMM'18 artifact for "Homa: a receiver-driven low-latency transport protocol using network priorities"

Introduction

With this artifact, we provide the reviewers with the ability to run workloads W3-W5 using the RAMCloud implementation of Homa transport and reproduce its performance numbers. For the simulation code, please check out the HomaSimulation repository. The CDF files of the workloads can be found here; they are copied and renamed to W1-W5 in the RAMCloud repository.

The files included in this repository are:

$ tree
.
├── getRamcloud.sh
├── localconfigGen.py
├── profile.py
├── README.adoc
├── setup-45XGc-QoS.py
└── startup.sh

Experiment Setup

We conduct our experiment using the m510 machines available at CloudLab.

Start Experiment

To start a new experiment, follow the instructions at CloudLab’s getting-started page and use the public profile named HomaArtifactEvaluation. This profile will help you reserve a full chassis of 45 nodes interconnected by a Moonshot 45XGc switch.

It can take 10-15 minutes to instantiate a new experiment and complete our custom startup service. Make sure the file /local/startup_service_done is present on all nodes of the experiment before proceeding to the next step.
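The readiness check above can be scripted rather than done by hand. The following is a minimal sketch, assuming password-less ssh from your workstation to every node; the helper names are hypothetical, and the host list is yours to fill in (only rcmaster is named in this README).

```python
import subprocess

# Marker file written by the startup service (path taken from the text above).
MARKER = "/local/startup_service_done"


def marker_check_command(host, marker=MARKER):
    """Build an ssh command that exits 0 only if the marker file exists on host."""
    return ["ssh", "-o", "BatchMode=yes", host, "test", "-e", marker]


def nodes_missing_marker(hosts, run=subprocess.call):
    """Return the hosts where the marker file is not present (or ssh failed).

    `run` takes an argument list and returns an exit status; it is a
    parameter so the logic can be exercised without real ssh access.
    """
    return [host for host in hosts if run(marker_check_command(host)) != 0]
```

Calling nodes_missing_marker(["rcmaster", ...]) with the real node names returns an empty list once every node has finished its startup service.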

Switch Configuration

Homa transport relies on network priorities to achieve low tail latency. Therefore, we need to enable the QoS setting of the 45XGc switch so that it recognizes packet priorities.

Note that CloudLab's current policy is to grant full switch access only to people who have reserved a full chassis, which can be difficult depending on resource availability. As of 12/2018, the best approach is to use CloudLab's reservation mechanism. Once you have successfully reserved a full chassis, contact the CloudLab support team ([email protected]) to request access to the switch. You should receive further instructions shortly. Extend the experiment to avoid expiration if necessary.

The commands used to configure the switch can be generated by running setup-45XGc-QoS.py. Once you log in to the switch console, the best way to configure the switch is to use its built-in Python interpreter:

No directory, logging in with HOME=/
Trying 127.0.0.1...
Connected to localhost.
Escape character is 'off'.

<ms-chassis13-sb>python
Python 2.7.3 (default, Apr 10 2014, 16:32:11)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import comware
>>> cmds = '<COPY-THE-COMMANDS-GENERATED-ABOVE-HERE>'
>>> comware.CLI(cmds)

If the commands are working, you should see something like the following:

<ms-chassis13-sb>system-view
System View: return to User View with Ctrl+Z.
[ms-chassis13-sb]qos map-table dot1p-lp
[ms-chassis13-sb-maptbl-dot1p-lp]import 0 export 1
[ms-chassis13-sb-maptbl-dot1p-lp]import 1 export 0
[ms-chassis13-sb-maptbl-dot1p-lp]import 2 export 2
[ms-chassis13-sb-maptbl-dot1p-lp]interface Ten-GigabitEthernet1/0/1
[ms-chassis13-sb-Ten-GigabitEthernet1/0/1]qos trust dot1p
[ms-chassis13-sb-Ten-GigabitEthernet1/0/1]qos sp
[ms-chassis13-sb-Ten-GigabitEthernet1/0/1]quit
[ms-chassis13-sb]interface Ten-GigabitEthernet1/0/2
[ms-chassis13-sb-Ten-GigabitEthernet1/0/2]qos trust dot1p
[ms-chassis13-sb-Ten-GigabitEthernet1/0/2]qos sp
[ms-chassis13-sb-Ten-GigabitEthernet1/0/2]quit
... more output omitted...
[ms-chassis13-sb]interface Ten-GigabitEthernet1/0/45
[ms-chassis13-sb-Ten-GigabitEthernet1/0/45]qos trust dot1p
[ms-chassis13-sb-Ten-GigabitEthernet1/0/45]qos sp
[ms-chassis13-sb-Ten-GigabitEthernet1/0/45]quit
[ms-chassis13-sb]quit
<comware.CLI object at 0x181f1090>
>>>
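For orientation, the command sequence echoed in the transcript above can be rebuilt with a few lines of Python. This is a sketch reconstructed from that transcript, not the contents of setup-45XGc-QoS.py. It assumes that the omitted ports 3-44 follow the same pattern as ports 1, 2, and 45, and that commands are joined with " ;" (a space followed by a semicolon), which is our understanding of how Comware accepts several commands in one string. Check both assumptions against the real script's output before using it on a switch.

```python
def qos_commands(num_ports=45):
    """Return the list of switch commands seen in the transcript above."""
    cmds = [
        "system-view",
        "qos map-table dot1p-lp",
        # dot1p -> local-precedence remapping, exactly as echoed by the switch.
        "import 0 export 1",
        "import 1 export 0",
        "import 2 export 2",
    ]
    for port in range(1, num_ports + 1):
        cmds += [
            "interface Ten-GigabitEthernet1/0/%d" % port,
            "qos trust dot1p",  # trust the 802.1p priority carried by packets
            "qos sp",           # strict-priority queuing on this port
            "quit",
        ]
    cmds.append("quit")  # leave system view
    return cmds


def as_cli_string(cmds, sep=" ;"):
    """Join commands into one string; the separator is an assumption."""
    return sep.join(cmds)
```

The result of as_cli_string(qos_commands()) plays the role of the cmds string in the interpreter session above.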

Build RAMCloud

To fetch the source code of RAMCloud and build the executables, run the following on node rcmaster:

$ cd /shome
$ /local/repository/getRamcloud.sh

RAMCloud will be available at /shome/RAMCloud when the script completes.

Run Experiments

All commands in this section are assumed to run from the RAMCloud top directory at /shome/RAMCloud on node rcmaster.

Sanity Check

To make sure RAMCloud and DPDK are built correctly, run a basic performance test:

$ scripts/clusterperf.py --superuser --replicas 0 --transport homa+dpdk --dpdkPort 1 --verbose echo_basic

If everything works as expected, you should see performance numbers similar to the following output (note: make sure the CPU governor is set to performance and idle=poll is provided as a kernel boot parameter):

echo0                  4.4 us     send 0B message, receive 0B message median
echo0.min              4.2 us     send 0B message, receive 0B message minimum
echo0.9                4.8 us     send 0B message, receive 0B message 90%
echo0.99               5.4 us     send 0B message, receive 0B message 99%
echo0.999             18.2 us     send 0B message, receive 0B message 99.9%
echoBw0                0.0 B/s    bandwidth sending 0B messages
echo100                4.9 us     send 100B message, receive 100B message median
echo100.min            4.8 us     send 100B message, receive 100B message minimum
echo100.9              5.2 us     send 100B message, receive 100B message 90%
echo100.99             5.5 us     send 100B message, receive 100B message 99%
echo100.999            7.3 us     send 100B message, receive 100B message 99.9%
echoBw100             18.7 MB/s   bandwidth sending 100B messages
echo1K                 8.7 us     send 1000B message, receive 1KB message median
echo1K.min             8.5 us     send 1000B message, receive 1KB message minimum
echo1K.9               9.0 us     send 1000B message, receive 1KB message 90%
echo1K.99              9.3 us     send 1000B message, receive 1KB message 99%
echo1K.999            11.5 us     send 1000B message, receive 1KB message 99.9%
echoBw1K             107.7 MB/s   bandwidth sending 1KB messages
echo10K               25.0 us     send 10000B message, receive 10KB message median
echo10K.min           24.9 us     send 10000B message, receive 10KB message minimum
echo10K.9             25.1 us     send 10000B message, receive 10KB message 90%
echo10K.99            25.5 us     send 10000B message, receive 10KB message 99%
echo10K.999           73.9 us     send 10000B message, receive 10KB message 99.9%
echoBw10K            376.1 MB/s   bandwidth sending 10KB messages
echo100K             178.0 us     send 100000B message, receive 100KB message median
echo100K.min         177.7 us     send 100000B message, receive 100KB message minimum
echo100K.9           178.5 us     send 100000B message, receive 100KB message 90%
echo100K.99          181.8 us     send 100000B message, receive 100KB message 99%
echo100K.999         357.7 us     send 100000B message, receive 100KB message 99.9%
echoBw100K           532.6 MB/s   bandwidth sending 100KB messages
echo1M                1.72 ms     send 1000000B message, receive 1MB message median
echo1M.min            1.71 ms     send 1000000B message, receive 1MB message minimum
echo1M.9              1.72 ms     send 1000000B message, receive 1MB message 90%
echo1M.99             1.89 ms     send 1000000B message, receive 1MB message 99%
echo1M.999            2.04 ms     send 1000000B message, receive 1MB message 99.9%
echoBw1M             553.8 MB/s   bandwidth sending 1MB messages
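To compare your own run against the reference numbers above without eyeballing every row, the output can be parsed into a dictionary. The sketch below assumes each metric line has the layout shown above (name, value, unit, description). The helper names and the 1.5x tolerance are arbitrary choices for illustration, not part of clusterperf.py.

```python
import re

# name, numeric value, unit, free-text description
LINE_RE = re.compile(r"^(\S+)\s+([0-9.]+)\s+(\S+)\s+(.*)$")
TO_MICROSECONDS = {"ns": 1e-3, "us": 1.0, "ms": 1e3, "s": 1e6}


def parse_metrics(text):
    """Return {name: (value, unit)} for every line that looks like a metric."""
    metrics = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            metrics[match.group(1)] = (float(match.group(2)), match.group(3))
    return metrics


def latency_us(metrics, name):
    """Latency of one metric in microseconds (bandwidth rows are not handled)."""
    value, unit = metrics[name]
    return value * TO_MICROSECONDS[unit]


def slower_than(metrics, reference_us, tolerance=1.5):
    """Names whose measured latency exceeds tolerance x the reference value."""
    return sorted(name for name, ref in reference_us.items()
                  if latency_us(metrics, name) > tolerance * ref)
```

For example, slower_than(parse_metrics(output), {"echo0": 4.4, "echo100": 4.9}) lists the medians that are more than 50% above the values printed above.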

Generate Baseline Numbers

Before we can run the workloads and generate the slowdown numbers reported in the paper, we first need to obtain the baseline latency numbers (i.e., the latency when the network is otherwise empty) for all message sizes in workloads W3-W5. This can be done by running:

$ benchmarks/homa/scripts/compute_baseline.sh basic+dpdk W3
$ benchmarks/homa/scripts/compute_baseline.sh basic+dpdk W4
$ benchmarks/homa/scripts/compute_baseline.sh basic+dpdk W5
$ benchmarks/homa/scripts/compute_baseline.sh homa+dpdk W3
$ benchmarks/homa/scripts/compute_baseline.sh homa+dpdk W4
$ benchmarks/homa/scripts/compute_baseline.sh homa+dpdk W5

This step can take a while for workloads with many different message sizes. You can monitor the progress with:

$ watch "tail logs/latest/client*.log"

The results will be written to benchmarks/homa/{basic,homa}_{W3,W4,W5}_baseline.txt.
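The six invocations and their result files follow a regular transport x workload pattern, so a short script can confirm that every baseline file exists before moving on. This is a sketch, with hypothetical helper names, that derives both the commands and the file names from the pattern stated above. Run it from the RAMCloud top directory.

```python
import itertools
import os

SCRIPT = "benchmarks/homa/scripts/compute_baseline.sh"
TRANSPORTS = ["basic", "homa"]
WORKLOADS = ["W3", "W4", "W5"]


def baseline_jobs():
    """Yield (command, expected_result_file) for each transport/workload pair."""
    for transport, workload in itertools.product(TRANSPORTS, WORKLOADS):
        command = [SCRIPT, transport + "+dpdk", workload]
        result = "benchmarks/homa/%s_%s_baseline.txt" % (transport, workload)
        yield command, result


def missing_baselines(exists=os.path.exists):
    """Return the result files that have not been produced yet."""
    return [result for _, result in baseline_jobs() if not exists(result)]
```

An empty list from missing_baselines() means all six baseline files are in place.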

Run Workloads

To run a particular workload with various configurations (e.g., homa vs. basic, load factor, number of priorities available), use the run_workload.sh script. This script runs the same workload under different configurations and computes the corresponding message slowdown numbers at the end. For example, the following command runs workload W3 on 16 nodes, with each configuration run taking 100 seconds:

$ benchmarks/homa/scripts/run_workload.sh W3 16 100

Each configuration run must be long enough to collect enough samples to compute the 99th-percentile tail latency for each message size. For W3 and W5, we recommend allocating at least one hour to each configuration run; for W4, 10 minutes should be enough.
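A rough back-of-the-envelope calculation shows why run length depends on the workload: the rarest message size determines how long it takes to gather enough tail samples. The sketch below is illustrative arithmetic only. The message rate, the size fraction, and the number of tail samples you consider sufficient are all hypothetical inputs, not measurements from the paper.

```python
def samples_needed(tail_samples, percentile=99):
    """Messages of one size needed so that about `tail_samples` of them
    fall beyond the given (integer) percentile; rounded up."""
    numerator = tail_samples * 100
    denominator = 100 - percentile
    return -(-numerator // denominator)  # integer ceiling division


def run_seconds(size_fraction, messages_per_second, tail_samples, percentile=99):
    """Estimated run time so the rarest message size gets enough samples.

    size_fraction: share of all messages that have this size (hypothetical).
    messages_per_second: aggregate message rate of the run (hypothetical).
    """
    total_messages = samples_needed(tail_samples, percentile) / float(size_fraction)
    return total_messages / messages_per_second
```

For instance, wanting 10 samples beyond the 99th percentile requires 1000 messages of that size. If that size makes up 0.1% of the workload and the cluster sustains 100,000 messages per second, the run needs roughly 10 seconds; a size that is 100 times rarer needs 100 times longer.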

Each invocation of the run_workload.sh script will create a unique directory that looks something like homa_experiment_YYYYMMDDHHMMSS. You can find the computed slowdown numbers (in slowdownImpl.txt), the raw message round-trip latency numbers (in *_experiment.txt), and some RAMCloud log files inside that directory.
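As a reading aid for the slowdown numbers, the sketch below shows one way tail slowdown can be derived from raw latencies and baselines. It assumes slowdown means measured round-trip latency divided by the baseline latency for the same message size, and it uses a nearest-rank percentile. The formats of slowdownImpl.txt and *_experiment.txt are not described here, so the sketch works on in-memory dictionaries and does not claim to reproduce what the scripts do.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with 0 < pct <= 100."""
    ordered = sorted(values)
    rank = max(1, int(math.ceil(pct / 100.0 * len(ordered))))
    return ordered[rank - 1]


def tail_slowdown(latencies_by_size, baseline_by_size, pct=99):
    """Return {message_size: pct-th percentile slowdown}.

    latencies_by_size: {size: [measured latency, ...]}
    baseline_by_size:  {size: unloaded-network latency}, same time unit.
    """
    result = {}
    for size, latencies in latencies_by_size.items():
        base = baseline_by_size[size]
        result[size] = percentile([lat / base for lat in latencies], pct)
    return result
```

A slowdown of 1.0 means a message completed as fast as it would on an empty network; larger values indicate queuing or scheduling delay under load.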
