[PyTorch] Training is very slow on Linux. #1504
Comments
Try to reduce the number of threads used by PyTorch to 6 or 12, see https://stackoverflow.com/questions/76084214/what-is-recommended-number-of-threads-for-pytorch-based-on-available-cpu-cores
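For example, a minimal sketch of checking and capping the thread count from the Java side, assuming the JavaCPP presets map ATen's `get_num_threads`/`set_num_threads` (and `set_num_interop_threads`) into `org.bytedeco.pytorch.global.torch`:

```java
import static org.bytedeco.pytorch.global.torch.*;

public class ThreadCap {
    public static void main(String[] args) {
        // Print the intra-op thread count the libtorch build picked by default.
        System.out.println("default intra-op threads: " + get_num_threads());

        // Cap intra-op parallelism; try 6 or 12 and benchmark, as suggested above.
        set_num_threads(12);

        // Inter-op parallelism can be capped too; this must be called before
        // the first parallel work is dispatched.
        set_num_interop_threads(2);

        System.out.println("intra-op threads now: " + get_num_threads());
    }
}
```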
It's most probably related to PyTorch not finding OpenBLAS and/or MKL in your path.
It helps a lot to set …
So the default is 24 on that machine, but it doesn't mean it's going to give good results.
The default is 48 with the JavaCPP build, which is too high. It should be 24 for this case.
Have you tried with the official libtorch?
The official libtorch sets it to 24 by default on my box, and it works well. Why does JavaCPP build libtorch from source? Why not package the precompiled libtorch library from pytorch.org?
See the discussion here.
Here is the result of running the sample MNIST code on a machine with 32 vcores and 16 physical cores:
When forcing the number of threads to 16 using …
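As a hedged sketch of that kind of override, the thread count could be pinned to the physical core count before training; this assumes the JVM reports logical cores (32 here) and that halving them is a reasonable stand-in for the physical core count (16 here) on a hyper-threaded box:

```java
import static org.bytedeco.pytorch.global.torch.set_num_threads;

public class PinToPhysicalCores {
    public static void main(String[] args) {
        // availableProcessors() returns logical cores; halving it is a rough
        // heuristic for the physical core count when hyper-threading is on.
        int logical = Runtime.getRuntime().availableProcessors();
        int physical = Math.max(1, logical / 2);

        set_num_threads(physical);
        System.out.println("intra-op threads pinned to " + physical);

        // ... run the MNIST sample from the README after this point ...
    }
}
```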
Training 10 epochs of MNIST (the sample code from your project README) takes > 500 seconds on Linux (24 cores, Ubuntu 22.04). It takes only about 50 seconds on an old Mac (4 cores). Both use CPU only (no GPU or MPS).