
Apparently rapid accumulation of roundoff errors #44

Open
daschaich opened this issue Jan 27, 2021 · 6 comments

@daschaich

I'm working with a student (@FelixSpr) on a pure-gauge project, and we've become puzzled by an apparently rapid accumulation of roundoff errors in our code, which is based on MILC's pg_ora (over-relaxed quasi-heatbath).

I have now set up a minimal example of this behavior in a new fork of milc_qcd, which you can see as commit 9985744 of daschaich/milc_qcd. Here, after successfully reproducing an existing 4^3x8 test in double precision on a single core (commit 19b2c53), I simply change division by z (a double) to multiplication by 1.0/z.
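For concreteness, here is a minimal standalone C sketch of the kind of change I mean (this is not the MILC code itself; `a` and `z` are just placeholder values):

```c
/* Standalone illustration of the change: dividing by z versus multiplying
 * by a precomputed 1.0/z. In double precision the two results may differ
 * by about 1 ulp, i.e. a relative difference of order 1e-16. */
#include <math.h>
#include <stdio.h>

int main(void) {
    double a = 0.7853981633974483;  /* placeholder for a link-matrix element */
    double z = 3.0;                 /* placeholder for the normalization */
    double by_div = a / z;
    double by_mul = a * (1.0 / z);
    printf("a/z       = %.17g\n", by_div);
    printf("a*(1.0/z) = %.17g\n", by_mul);
    printf("rel diff  = %.3g\n", fabs(by_div - by_mul) / fabs(by_div));
    return 0;
}
```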

Despite the tiny lattice volume, after only five 'trajecs' (each with four over-relaxation steps and four quasi-heatbath steps), roundoff is visible in the printed GMES output, and within 20 trajecs it has accumulated to roughly percent-level effects in the plaquette. In earlier tests with our own code that show this behavior, I have checked that the division-to-multiplication step itself produces changes only at the 1e-16 level I would expect in double precision. I have also checked that turning off the over-relaxation steps reduces but does not completely remove this roundoff accumulation.
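Just to quantify what I mean by "rapid": if we model the growth as a constant amplification per update (a back-of-the-envelope model, not anything taken from the MILC code), going from ~1e-16 to ~1e-2 over 20 trajecs of 8 updates each only requires a factor of roughly 1.2 per update:

```c
/* Back-of-the-envelope estimate (not MILC code): assume a relative
 * perturbation of ~1e-16 is amplified by a constant factor g at each of
 * the 8 updates per trajec (4 over-relaxation + 4 quasi-heatbath).
 * What g reaches ~1e-2 after 20 trajecs? */
#include <math.h>
#include <stdio.h>

int main(void) {
    double eps0 = 1e-16;    /* initial double-precision-level difference */
    double eps_end = 1e-2;  /* percent-level effect in the plaquette */
    int steps = 20 * 8;     /* 20 trajecs x 8 updates each */
    printf("per-update amplification needed: %.3f\n",
           pow(eps_end / eps0, 1.0 / steps));  /* ~1.22 */
    return 0;
}
```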

So it looks as though some aspect of the pg_ora program causes machine-precision roundoff errors to be blown up rapidly by many orders of magnitude. I find this surprising, but I haven't done much prior work with this algorithm. Is this the expected behavior? If not, can you think of any likely culprit(s) that deserve more scrutiny? I'll ping @weberjo8, since you seem to have been working on the pure-gauge code relatively recently.

@detar
Contributor

detar commented Jan 27, 2021 via email

@weberjo8
Collaborator

weberjo8 commented Jan 27, 2021 via email

@daschaich
Author

Thanks Carleton and Johannes; I'm glad I'm not overlooking something immediately obvious.

So far this has occurred wherever we've run and however we've compiled. But that's a short list at the moment---just on laptops and on a local cluster.

By coincidence both my laptop and the cluster were using the same gcc 5.5.0 (with openmpi 1.10.7 for parallel running on the cluster). The cluster has some other compilers available, and on it I just reran the minimal example linked above using gcc 8.3.0 and intel (& intel-mpi) 2019u5.

I see the same qualitative sensitivity to roundoff in all cases, though the specific numbers differ between the GNU and Intel compilers. That is, each of {gcc 5.5.0 scalar; gcc 5.5.0 + openmpi 1.10.7 on 2 cores; gcc 8.3.0 scalar} produces the same GMES output as linked above, while {intel 2019u5 scalar; intel 2019u5 + intel-mpi 2019u5 on 2 cores} produce different output that still changes between division by z and multiplication by 1.0/z. In commits 3900ff6 and d71df02 I added the corresponding output for comparison.

@detar
Contributor

detar commented Jan 28, 2021 via email

@daschaich
Author

Sorry for the slow response.

All the tests above used -O3, partly out of habit and partly to introduce as few changes as possible. I have now checked that the gcc 5.5.0 + openmpi 1.10.7 build prints exactly the same output with -O0 (a073b57).
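As a generic illustration of why the optimization level could matter at all here (nothing specific to the MILC source): floating-point addition is not associative, so any optimization that reorders or contracts operations can change the last bits of a result.

```c
/* Generic non-associativity demo (unrelated to the MILC source):
 * (a + b) + c and a + (b + c) can differ, which is why optimizations
 * that reorder floating-point operations (e.g. under -ffast-math,
 * or via FMA contraction) can change results in the last bits. */
#include <stdio.h>

int main(void) {
    double a = 1e16, b = -1e16, c = 1.0;
    printf("(a + b) + c = %.17g\n", (a + b) + c);  /* prints 1 */
    printf("a + (b + c) = %.17g\n", a + (b + c));  /* prints 0 */
    return 0;
}
```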

I also tried turning off -DFAST (ba7388c, still with -O0 and gcc 5.5.0 + openmpi 1.10.7). Although this did change the output, the sensitivity to division by z vs. multiplication by 1.0/z remains (just as it did when testing the Intel compilers).

@detar
Contributor

detar commented Feb 1, 2021 via email
