GDS inconsistent performance (b/w drops significantly) #30

Open
singhsaluja opened this issue Oct 28, 2023 · 0 comments

While running the gdsio benchmark against a VAST storage layer over NFSoRDMA, reads and writes start out well at ~18 G/s, but partway through the benchmark throughput drops significantly, to as low as 3 G/s. Any insights into this?

Here are the test parameters I am using:

Write Test: gdsio -T 60 -D /gpu-direct-storage/ -w 192 -d 1 -I 1 -x 0 -s 1G -i 1M
Throughput: 2.308903 GiB/sec, Avg_Latency: 81215.056568 usecs

Read Test: gdsio -T 60 -D /gpu-direct-storage/ -w 192 -d 1 -I 0 -x 0 -s 1G -i 1M
Throughput: 6.885191 GiB/sec, Avg_Latency: 27231.054742 usecs

GDS Disabled Read Test: with the transfer type switched to -x 2 (xfer_type CPU_GPU), the throughput is far better:
Throughput: 20.600775 GiB/sec, Avg_Latency: 9095.061970 usecs
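
As a sanity check on the numbers above, here is a minimal Python sketch that parses the gdsio summary lines and compares each reported throughput with workers × IO size / average latency. It assumes each of the 192 workers keeps one 1 MiB IO in flight at a time; that is my assumption about how gdsio behaves in this mode, not something confirmed by the tool's documentation. The function names are hypothetical helpers, not part of gdsio.

```python
import re

# Matches the summary line printed at the end of each gdsio run above.
SUMMARY_RE = re.compile(
    r"Throughput:\s*([\d.]+)\s*GiB/sec,\s*Avg_Latency:\s*([\d.]+)\s*usecs"
)

# Summary lines copied verbatim from the three runs in this report.
RESULTS = {
    "gds_write": "Throughput: 2.308903 GiB/sec, Avg_Latency: 81215.056568 usecs",
    "gds_read": "Throughput: 6.885191 GiB/sec, Avg_Latency: 27231.054742 usecs",
    "cpu_gpu_read": "Throughput: 20.600775 GiB/sec, Avg_Latency: 9095.061970 usecs",
}


def parse_summary(line):
    """Return (throughput in GiB/s, average latency in microseconds)."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a gdsio summary line: {line!r}")
    return float(match.group(1)), float(match.group(2))


def expected_gib_per_sec(workers, io_size_mib, avg_latency_usecs):
    """Throughput implied by the latency, assuming one IO in flight per worker."""
    mib_per_sec = workers * io_size_mib / (avg_latency_usecs / 1e6)
    return mib_per_sec / 1024


def summarize(results, workers=192, io_size_mib=1):
    """Map each run name to (measured GiB/s, latency-implied GiB/s)."""
    rows = {}
    for name, line in results.items():
        measured, latency = parse_summary(line)
        rows[name] = (measured, expected_gib_per_sec(workers, io_size_mib, latency))
    return rows


if __name__ == "__main__":
    for name, (measured, expected) in summarize(RESULTS).items():
        print(f"{name}: measured {measured:.3f} GiB/s, implied {expected:.3f} GiB/s")
```

If the measured and implied values agree, the low throughput in the GDS runs corresponds to higher per-IO latency rather than to idle workers, under the stated assumption.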

We have verified the following:

  1. IOMMU is disabled
  2. nvidia-peermem.ko and nvidia-fs.ko are installed correctly, and the gdscheck.py -p utility reports that NFS is supported

nvidia-smi topo -m reports the following:

        GPU0    GPU1    NIC0    NIC1    CPU Affinity    NUMA Affinity
GPU0     X      NV4     SYS     SYS     0-31    0
GPU1    NV4      X      SYS     SYS     0-31    0
NIC0    SYS     SYS      X      PIX
NIC1    SYS     SYS     PIX      X
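
To make the GPU-to-NIC paths in this matrix easier to read off, here is a small Python sketch that extracts the link type for every GPU/NIC pair. The parser is a hypothetical helper written for this exact layout (device columns first, affinity columns last); it is not an NVIDIA tool. In the nvidia-smi legend, SYS means the path crosses PCIe and the interconnect between NUMA nodes, and PIX means it crosses at most a single PCIe bridge. Whether the SYS paths explain the throughput drop is not established here.

```python
# Matrix copied from the nvidia-smi topo -m output in this report.
TOPO = """\
        GPU0    GPU1    NIC0    NIC1    CPU Affinity    NUMA Affinity
GPU0     X      NV4     SYS     SYS     0-31    0
GPU1    NV4      X      SYS     SYS     0-31    0
NIC0    SYS     SYS      X      PIX
NIC1    SYS     SYS     PIX      X
"""


def gpu_nic_links(topo_text):
    """Return {(gpu, nic): link_type} for every GPU row and NIC column."""
    lines = [ln for ln in topo_text.splitlines() if ln.strip()]
    # Device columns are the header tokens named GPU<n> or NIC<n>;
    # the trailing "CPU Affinity" / "NUMA Affinity" tokens are skipped.
    devices = [tok for tok in lines[0].split() if tok.startswith(("GPU", "NIC"))]
    links = {}
    for ln in lines[1:]:
        fields = ln.split()
        row = fields[0]
        if not row.startswith("GPU"):
            continue
        # fields[1:] lines up with the device columns, one cell per device.
        for col, link in zip(devices, fields[1:1 + len(devices)]):
            if col.startswith("NIC"):
                links[(row, col)] = link
    return links


if __name__ == "__main__":
    for (gpu, nic), link in sorted(gpu_nic_links(TOPO).items()):
        print(f"{gpu} <-> {nic}: {link}")
```

For the matrix above, every GPU/NIC pair comes back as SYS, while the two NICs are PIX to each other.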

Any suggestions on what I should try next to debug this issue? Thank you for your time.
