quda 0.8.0 and milc 7.7.13 "ERROR: Solve precision ..." #5

Open
lcosmai opened this issue Feb 21, 2016 · 11 comments

Comments

@lcosmai

lcosmai commented Feb 21, 2016

I compiled the target su3_rhmc_hisq for ks_imp_rhmc
in the latest stable release of MILC
(https://github.com/milc-qcd/milc_qcd.git Branch:master)
with quda v0.8.0
(https://github.com/lattice/quda.git Branch:master)
using the following Makefile:
https://drive.google.com/file/d/0BxE4mI8SH7wsSnZEaDQyeEQzVVE/view?usp=sharing

Then I performed a short test using 1 GPU.
The job aborted with the following error message:
ERROR: Solve precision 4 doesn't match gauge precision 8 (rank 0, host node496, interface_quda.cpp:1904 in checkGauge())
last kernel called was (name=N4quda22HeavyQuarkResidualNormI7double37double2S2_EE,volume=4x8x8x8,aux=vol=2048,stride=2304,precision=8)

Note that if I instead use quda v0.7.2, the same test job is completed without errors.
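
For readers decoding the message above: the two numbers are most likely byte widths, following QUDA's `QudaPrecision` enum (2 = half, 4 = single, 8 = double), so the solve was requested in single precision against a double-precision gauge field. Below is a minimal Python sketch that extracts and names the two values. The regular expression and the helper name `decode_precision_error` are illustrative assumptions and are not part of QUDA or MILC.

```python
import re

# Byte widths as used by QUDA's QudaPrecision enum
# (assumed here: 2 = half, 4 = single, 8 = double).
PRECISION_NAMES = {2: "half", 4: "single", 8: "double"}

# Matches the message text quoted in this issue.
_PATTERN = re.compile(r"Solve precision (\d+) doesn't match gauge precision (\d+)")


def decode_precision_error(line):
    """Return (solve, gauge) precision names, or None if the line does not match."""
    match = _PATTERN.search(line)
    if match is None:
        return None
    solve, gauge = (int(g) for g in match.groups())
    return (
        PRECISION_NAMES.get(solve, "unknown"),
        PRECISION_NAMES.get(gauge, "unknown"),
    )


if __name__ == "__main__":
    msg = "ERROR: Solve precision 4 doesn't match gauge precision 8 (rank 0)"
    print(decode_precision_error(msg))
```

Read this way, the error says that a single-precision solve was started while the gauge field loaded into QUDA was double precision.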

@mathiaswagner
Contributor

Just to clarify: which MILC version did you use?

I think (https://github.com/milc-qcd/milc_qcd.git Branch:master) corresponds to MILC 7.7.13, not 7.8.0 as mentioned in the bug title.

7.8.0 might well not be compatible with quda 0.8 as there have been quite a few changes affecting MILC.

@lcosmai lcosmai changed the title from quda 0.8.0 and milc 7.8.0 "ERROR: Solve precision ..." to quda 0.8.0 and milc 7.7.13 "ERROR: Solve precision ..." on Feb 22, 2016
@lcosmai
Author

lcosmai commented Feb 22, 2016

You are right. I have just changed 7.8.0 to 7.7.13 in the title.

@detar
Contributor

detar commented Feb 22, 2016

Hi Mathias,

To provide a more stable definition of MILC code versions on github,
last week we created two new branches, milc_qcd-7.7.13 and
milc_qcd-7.8.0. They are supposed to be release versions of the code.
The branch 7.7.13 is closest to the one Leonardo was using, and the
branch 7.8.0 is close to the current master branch. The master branch
is the development branch, so it will continue to evolve. It is
unlikely we will make any changes to the milc_qcd-7.7.13 and
milc_qcd-7.8.0 branches unless they are to fix critical bugs.
Eventually, we will copy the master branch to a new release branch. (I
think this is the model you also prefer.)

Best,
Carleton


@mathiaswagner
Contributor

Hi Carleton,

thanks for the correction. It looks like I was confused here.
I will try to check QUDA 0.8 with

  • milc_qcd-7.7.13
  • milc_qcd-7.8.0

and try to reproduce the issue.

Sidenote: For QUDA we use a branch called develop for development and copy that over to a new release. We use master for the most recent release version (currently 0.8). This is to make sure that a git clone gives you a (hopefully) stable quda version.

Mathias

NVIDIA GmbH, Wuerselen, Germany, Amtsgericht Aachen, HRB 8361
Managing Director: Karen Theresa Burns



@mathiaswagner
Contributor

@lcosmai Can you share some more of the surrounding MILC output as well as your input file to MILC?
This makes it easier to track down where the error was triggered.

@mathiaswagner
Contributor

@maddyscientist Not sure whether you are already following this thread, so I just wanted to make sure you are aware.

@lcosmai
Author

lcosmai commented Feb 22, 2016

I shared
(https://drive.google.com/folderview?id=0BxE4mI8SH7wsY2lianUyRDlCWG8&usp=sharing)
the directory where the job has been launched.
In the same directory there is also a README file with more details.


Leonardo Cosmai
INFN Bari
Via Amendola 173
70126 Bari - Italy
office: +39 080 5443207
mobile: +39 340 3580207

@mathiaswagner
Contributor

OK, I managed to reproduce the issue with the sample input provided with MILC:

    ~/milc_qcd/ks_imp_rhmc/test$ ../su3_rhmc_hisq su3_rhmc_hisq.2.sample-in

using QUDA 0.8 and MILC 7.7.13.

@mathiaswagner
Contributor

As this might be an issue in either MILC or QUDA, I also created
lattice/quda#439
to have a pointer in the QUDA issue tracker.

@mathiaswagner
Contributor

Setting

    prec_pbp 2

seems to be a workaround. Still need to check why this worked with quda 0.7.2.
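
The workaround can be applied to an existing input file with a small script. This is only a sketch: it assumes the MILC input file holds one `keyword value` pair per line and that `prec_pbp` follows the convention of 1 for single and 2 for double precision. The function name `set_prec_pbp` and the sample fragment are hypothetical.

```python
import re

# Matches a whole line of the form "prec_pbp <integer>", keeping its spacing.
_PREC_PBP = re.compile(r"^([ \t]*prec_pbp[ \t]+)\d+([ \t]*)$", re.MULTILINE)


def set_prec_pbp(text, value=2):
    """Return input-file text with every `prec_pbp <n>` line set to `value`.

    All other lines, and the spacing on the matched lines, are left unchanged.
    """
    return _PREC_PBP.sub(lambda m: f"{m.group(1)}{value}{m.group(2)}", text)


if __name__ == "__main__":
    # Illustrative fragment only, not a complete MILC input file.
    sample = "npbp_reps 1\nprec_pbp 1\n"
    print(set_prec_pbp(sample), end="")
```

Working on the text rather than the file directly keeps the original input untouched until the result has been checked and written out under a new name.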

@mathiaswagner
Contributor

This will be fixed in QUDA 0.8.1. For now, please stick to the workaround and follow lattice/quda#439.

detar pushed a commit that referenced this issue Oct 4, 2017
This was referenced Oct 4, 2017
detar pushed a commit that referenced this issue Feb 7, 2024