HDD performance is poor #257

Open
Davester47 opened this issue Mar 14, 2024 · 1 comment · May be fixed by #269
Labels
good first issue Good for newcomers

Comments

@Davester47

pdu takes about 1.65x as long as single-threaded du on my HDD (1:18 vs. 0:47 in the cold-cache runs below). I'm testing on an old home directory of mine on a mechanical hard drive, with about 712 gigabytes of data in around 150,000 files. The size difference reported by the two programs is due to hard links.

$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ time pdu
...
765.0G ┌─┴.
pdu  0.69s user 2.93s system 4% cpu 1:18.21 total

Compared to du:

$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ time du -sh .
712G	.
du -sh .  0.28s user 1.46s system 3% cpu 47.405 total

I'm not sure what causes this difference, but I believe it's the directory traversal order used by the two programs. du uses a depth-first search, whereas pdu seems to use a breadth-first search through rayon, although I can't tell for sure. Interestingly, pdu is comparable to du when manually limited to a single thread:

$ echo 1 | sudo tee /proc/sys/vm/drop_caches
$ time RAYON_NUM_THREADS=1 pdu
...
765.0G ┌─┴.
RAYON_NUM_THREADS=1 pdu  0.46s user 1.87s system 5% cpu 46.078 total
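
To make the traversal-order hypothesis concrete, here is a minimal sketch of the two visiting orders on a toy in-memory tree. It is not the code of du or pdu, and it does not model rayon's scheduling; the `Dir` type, `sample_tree`, and the directory names are made up for illustration. On a mechanical drive, the concern is that jumping between unrelated subtrees may cause more seeking than finishing one subtree at a time.

```rust
use std::collections::VecDeque;

/// A toy directory tree: a name plus child directories (hypothetical type).
struct Dir {
    name: &'static str,
    children: Vec<Dir>,
}

/// Depth-first, pre-order: finish one subtree before moving to its sibling.
fn depth_first(root: &Dir) -> Vec<&'static str> {
    let mut order = Vec::new();
    let mut stack = vec![root];
    while let Some(dir) = stack.pop() {
        order.push(dir.name);
        // Push children in reverse so the first child is popped first.
        stack.extend(dir.children.iter().rev());
    }
    order
}

/// Breadth-first: visit every directory at one level before descending.
fn breadth_first(root: &Dir) -> Vec<&'static str> {
    let mut order = Vec::new();
    let mut queue = VecDeque::from([root]);
    while let Some(dir) = queue.pop_front() {
        order.push(dir.name);
        queue.extend(dir.children.iter());
    }
    order
}

/// A small made-up tree: home/{a/a1, b/b1}.
fn sample_tree() -> Dir {
    Dir {
        name: "home",
        children: vec![
            Dir { name: "a", children: vec![Dir { name: "a1", children: vec![] }] },
            Dir { name: "b", children: vec![Dir { name: "b1", children: vec![] }] },
        ],
    }
}

fn main() {
    let tree = sample_tree();
    println!("depth-first:   {:?}", depth_first(&tree));
    println!("breadth-first: {:?}", breadth_first(&tree));
}
```

Depth-first visits `home, a, a1, b, b1`, while breadth-first visits `home, a, b, a1, b1`.
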
@KSXGitHub
Owner

pdu was never really designed to run on HDDs (I forgot to mention that in README.md). But if there's an easy way to detect an HDD and limit rayon to 1 thread, I'll be happy to accept a pull request. Unless multi-threaded du is still faster on HDD for some reason.
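
One possible way to approach this on Linux is sketched below, using only the standard library. It relies on the sysfs attribute `/sys/block/<device>/queue/rotational`, which the kernel sets to `1` for devices it considers rotational. Everything else is an assumption for illustration: the function names, the "any rotational device means one thread" policy, and the placeholder device name `sda`. Mapping a scanned path to its block device is not covered, and this is not necessarily how any pull request for this issue does it.

```rust
use std::fs;
use std::path::Path;
use std::thread;

/// Interpret the contents of a Linux `queue/rotational` sysfs file.
/// "1" means rotational (HDD), "0" means non-rotational; anything else is unknown.
fn parse_rotational(contents: &str) -> Option<bool> {
    match contents.trim() {
        "1" => Some(true),
        "0" => Some(false),
        _ => None,
    }
}

/// Read `/sys/block/<device>/queue/rotational` for a device name such as "sda".
/// Returns `None` if the file is missing or unreadable (e.g. not on Linux).
fn is_rotational(device: &str) -> Option<bool> {
    let path = Path::new("/sys/block").join(device).join("queue/rotational");
    parse_rotational(&fs::read_to_string(path).ok()?)
}

/// Hypothetical policy: use 1 thread if any device is known to be rotational,
/// otherwise fall back to `default_threads`.
fn choose_threads(flags: &[Option<bool>], default_threads: usize) -> usize {
    if flags.iter().any(|flag| *flag == Some(true)) {
        1
    } else {
        default_threads
    }
}

fn main() {
    // "sda" is a placeholder; a real tool would look up the device behind each path.
    let flags = [is_rotational("sda")];
    let default_threads = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    let threads = choose_threads(&flags, default_threads);
    println!("would use {threads} thread(s)");
}
```

With rayon, the chosen count could then be applied before any parallel work starts, for example via `rayon::ThreadPoolBuilder::new().num_threads(threads).build_global()`.
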

@KSXGitHub KSXGitHub added the good first issue Good for newcomers label Mar 15, 2024
Integral-Tech added a commit to Integral-Tech/parallel-disk-usage that referenced this issue Nov 24, 2024
On HDD, multi-threaded du won't provide any performance gain over
the single-threaded one. Therefore, when HDD is detected on one of
the files, limit the thread number to 1.

This PR fixes issue KSXGitHub#257.
Integral-Tech added a commit to Integral-Tech/parallel-disk-usage that referenced this issue Nov 24, 2024
On HDD, multi-threaded du won't provide any performance benefit over
the single-threaded one. Therefore, when HDD is detected on one of
the files, limit the thread number to 1.

This PR fixes issue KSXGitHub#257.