Apparently rapid accumulation of roundoff errors #44
On 1/26/21 6:46 PM, David Schaich wrote:

I'm working with a student (@FelixSpr) on a pure-gauge project, and we've become puzzled by apparently rapid accumulation of roundoff errors in our code <https://github.com/daschaich/SU4/tree/dev> based on MILC's |pg_ora| (over-relaxed quasi-heatbath).

I have now set up a minimal example of this behavior in a new fork of |milc_qcd|, which you can see as commit 9985744 of daschaich/milc_qcd. Here, after successfully reproducing an existing 4^3x8 test in double precision on a single core (commit 19b2c53), I simply change division by |z| (a |double|) to multiplication by |1.0/z|.

Despite the tiny lattice volume, after only five 'trajecs' (each with four over-relaxation steps and four quasi-heatbath steps) roundoff is visible in the printed |GMES| output, and within 20 trajecs this has accumulated to roughly percent-level effects in the plaquette. In earlier tests using our own code that show this behavior, I have checked that this division-to-multiplication step itself produces changes only at the |1e-16| level I would expect in double precision. I have also checked that turning off the over-relaxation steps reduces but doesn't completely remove this roundoff accumulation.

So it looks as though there is some aspect of the |pg_ora| program that is causing machine-precision roundoff errors to be rapidly blown up by many orders of magnitude. I find this surprising, but I haven't done much prior work with this algorithm. *Is this the expected behavior?* If not, can you think of any likely culprit(s) deserving more scrutiny? I'll ping @weberjo8 since you seem to have been working on the pure-gauge code relatively recently.
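As a concrete illustration of the two numerical points above, here is a minimal standalone sketch in plain C (generic code, not taken from MILC or pg_ora). It shows that |x/z| and |x*(1.0/z)| typically disagree in the last bit, because the reciprocal is rounded before the multiply, and that an iterated nonlinear update can amplify such a one-ulp difference exponentially. The logistic map is only a chaotic stand-in for a Monte Carlo update, not the ORA algorithm:

#include <math.h>
#include <stdio.h>

int main(void) {
  /* (1) division vs. reciprocal multiplication: 1.0/z is rounded before
     the multiply, so the two results generally differ in the last bit,
     i.e. at the ~1e-16 relative level in double precision (for some
     inputs they happen to agree exactly) */
  double x = 0.7853981633974483;   /* arbitrary test values */
  double z = 3.141592653589793;
  double a = x / z;
  double b = x * (1.0 / z);
  printf("x/z       = %.17g\n", a);
  printf("x*(1.0/z) = %.17g\n", b);
  printf("rel diff  = %.3g\n", fabs(a - b) / fabs(a));

  /* (2) a chaotic update amplifies a one-ulp difference exponentially;
     the logistic map at r = 4 roughly doubles a small separation each
     step, so ~1e-16 reaches O(1) within about 55 iterations */
  double u = 0.4;
  double v = nextafter(u, 1.0);    /* u plus one ulp */
  for (int i = 1; i <= 60; i++) {
    u = 4.0 * u * (1.0 - u);
    v = 4.0 * v * (1.0 - v);
    if (i % 10 == 0)
      printf("iter %2d: |u - v| = %.3g\n", i, fabs(u - v));
  }
  return 0;
}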
Comments

On 1/27/21 2:24 PM, Carleton DeTar wrote:

Hi David,
This is very strange, indeed. We have not used this part of the code for some 25 years. Is the misbehavior compiler or machine dependent? I'll take a look when I get a chance.
Best,
Carleton
Johannes Weber (@weberjo8) replied:

Hi David,
Sorry, I never looked into any details of the ORA algorithm. I just generalized features of the anisotropic version of the pure-gauge codes, tested against existing sample outputs, and repeated some nontrivial unit tests that Alexei had suggested.
Best regards,
Johannes
On 1/27/21 8:28 AM, David Schaich wrote:

Thanks Carleton and Johannes, I'm glad I'm not overlooking something immediately obvious.

So far this has occurred wherever we've run and however we've compiled. But that's a short list at the moment---just on laptops and on a local cluster.

By coincidence both my laptop and the cluster were using the same gcc 5.5.0 (with openmpi 1.10.7 for parallel running on the cluster). The cluster has some other compilers available, and on it I just reran the minimal example linked above using gcc 8.3.0 and intel (& intel-mpi) 2019u5.

I see the same qualitative sensitivity to roundoff in all cases, though the specific numbers are different for gnu vs. intel compilers. That is, each of {gcc 5.5.0 scalar; gcc 5.5.0 + openmpi 1.10.7 on 2 cores; gcc 8.3.0 scalar} produces the same GMES output as linked above, while {intel 2019u5 scalar; intel 2019u5 + intel-mpi 2019u5 on 2 cores} produce different output that still changes for division by |z| vs. multiplication by |1.0/z|. In commits 3900ff6 and d71df02 I added the corresponding output for comparison.
Carleton DeTar replied:

Hi David,
Divisions typically cost quite a few more processor cycles than multiplications. Is the effect sensitive to the optimization level? Did you try -O3 -ffast-math, or the equivalent, for example? This isn't a bug in the code, of course, but it sounds like a change worth making.
Best,
Carleton
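For context, a generic sketch of the trade-off Carleton describes (hypothetical helper names, not the actual MILC routine):

/* The first form does one IEEE divide per element; the second does one
   divide total followed by cheap multiplies, each of whose results may
   sit one ulp away from the corresponding IEEE quotient. */
void scale_div(double *v, int n, double z) {
  for (int i = 0; i < n; i++)
    v[i] /= z;              /* correctly rounded quotient, but slow */
}

void scale_recip(double *v, int n, double z) {
  double zi = 1.0 / z;      /* reciprocal rounded once, up front */
  for (int i = 0; i < n; i++)
    v[i] *= zi;             /* fast, but not the IEEE quotient */
}

Note that with GCC, -ffast-math implies -freciprocal-math, which already licenses the compiler to rewrite the scale_div form into the scale_recip form on its own; in that case the division-vs-multiplication distinction in the source can disappear from the generated code.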
On 1/31/21 9:28 AM, David Schaich wrote:

Sorry for the slow response.

All the tests above used |-O3|, partly out of habit, partly to introduce as few changes as possible. I have now checked that the gcc 5.5.0 + openmpi 1.10.7 builds print exactly the same output with |-O0| (a073b57).

I also tried turning off |-DFAST| (ba7388c, still with |-O0| and gcc 5.5.0 + openmpi 1.10.7). Although this did change the output, sensitivity to division by |z| vs. multiplication by |1.0/z| remains (just as happened when testing Intel compilers).
Carleton DeTar replied:

Hi David,
If you want to dig deeper, I guess you'd need to look at the assembly code.
Best,
Carleton
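One minimal way to do that (a hypothetical probe file, not code from pg_ora) is to isolate the two variants and inspect what the compiler emits:

/* roundoff_kernel.c -- compile with, e.g.,
       gcc -O3 -S roundoff_kernel.c
   and look for divsd vs. mulsd in roundoff_kernel.s on x86-64 to see
   whether the two forms still compile to different instructions. */
double div_form(double x, double z)   { return x / z; }
double recip_form(double x, double z) { return x * (1.0 / z); }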