Error: OSC UCX component priority set inside component query failed #248
Hi @cbehan, could you describe in more detail how you call … For me, the following commands gave the same …
There are two known reasons why the binary format can fail:
Hi @vasdommes. It seems I spoke too soon. The difference between bin and json that I claimed is not reproducible. I think when I tried bin, I was also requesting more cores than when I tried json. The attached stream is now a controlled test: with 2 cores, both bin and json work, but with 3 cores they both fail. There must be an issue when the number of blocks is too small to be split among all of the MPI processes. Since SDPB is designed for big problems (and it's easy to pass -n 1 when you have a small one), perhaps there is no need to fix this.
@cbehan thanks for the updates! Your example works fine on my machine with 3 cores. Generally, SDPB should work fine when the number of cores exceeds the number of blocks (it should fail only in the extreme case when the number of nodes exceeds the number of blocks). So it's unclear what went wrong for you.
Let's take a look at the error messages:
This error repeats many times. It is thrown from somewhere inside the OpenMPI code.
Out-of-range error for
This comes from this line in SDPB:
It is interesting that SDPB did not crash immediately but did some nontrivial computations before
So it looks like an OpenMPI configuration issue or an obscure Elemental bug.
If the problem is specific to the OSC UCX component in OpenMPI, using another OSC component might help. @cbehan, could you try the following?
and then try running SDPB with each OSC component (e.g.
On my laptop,
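A minimal Python sketch of this procedure, assuming that `ompi_info` reports each one-sided communication component on a line of the form `MCA osc: <name> (...)` and that a single component can be forced with OpenMPI's `mpirun --mca osc <name>`. The helper names `list_osc_components` and `build_commands`, the `sdpb` executable name, and its arguments are hypothetical placeholders, not commands taken from this thread. The script only prints the command lines (a dry run), so each one can be launched and its output compared by hand.

```python
import re
import shlex
import shutil
import subprocess

# Matches component lines such as "MCA osc: ucx (MCA v2.1.0, ...)" in ompi_info output.
OSC_LINE = re.compile(r"MCA osc:\s*([A-Za-z0-9_]+)")


def list_osc_components(ompi_info_output: str) -> list[str]:
    """Return the OSC component names found in `ompi_info` output, in order."""
    return OSC_LINE.findall(ompi_info_output)


def build_commands(components, nprocs, sdpb_args):
    """Build one mpirun command line per OSC component.

    `sdpb_args` stands in for your usual SDPB arguments (hypothetical here).
    """
    return [
        ["mpirun", "-n", str(nprocs), "--mca", "osc", name, "sdpb", *sdpb_args]
        for name in components
    ]


if __name__ == "__main__" and shutil.which("ompi_info"):
    info = subprocess.run(["ompi_info"], capture_output=True, text=True, check=True).stdout
    # Dry run: print each command instead of executing it.
    for cmd in build_commands(list_osc_components(info), 3, ["..."]):
        print(shlex.join(cmd))
```

Printing rather than running keeps the sketch safe to try on a login node; swapping `print` for `subprocess.run(cmd)` would launch each variant in turn.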
Thanks for the tips. I have four of those five OSC components.
They all fail, but in different ways; the error output is attached.
@cbehan, what OS are you using? In an attempt to reproduce the error, I installed OpenMPI 5.0.5 from source following the official instructions (with default parameters) on my WSL + Ubuntu 22.04 and rebuilt Elemental and SDPB with this version. But it works fine for me. For reference, the installation process looked (more or less) like this:
OSC components available:
With pycftboot, it seems easy to generate XML files that can only be processed with pmp2sdp -f json, not with pmp2sdp -f bin. The simplest test case I could make is attached. In json mode it works perfectly, but in bin mode it fails with:
/usr/include/c++/14.1.1/bits/stl_vector.h:1130: std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = El::BigFloat; _Alloc = std::allocator<El::BigFloat>; reference = El::BigFloat&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
mySDP.xml.gz