Apparently rapid accumulation of roundoff errors #44
On 1/26/21 6:46 PM, David Schaich wrote:

I'm working with a student (@FelixSpr) on a pure-gauge project, and we've become puzzled by apparently rapid accumulation of roundoff errors in our code <https://github.com/daschaich/SU4/tree/dev> based on MILC's |pg_ora| (over-relaxed quasi-heatbath).

I have now set up a minimal example of this behavior in a new fork of |milc_qcd|, which you can see as commit 9985744 of daschaich/milc_qcd. Here, after successfully reproducing an existing 4^3x8 test in double precision on a single core (commit 19b2c53), I simply change division by |z| (a |double|) to multiplication by |1.0/z|.

Despite the tiny lattice volume, after only five 'trajecs' (each with four over-relaxation steps and four quasi-heatbath steps) roundoff is visible in the printed |GMES| output, and within 20 trajecs this has accumulated to roughly percent-level effects in the plaquette. In earlier tests using our own code that show this behavior, I have checked that this division-to-multiplication step itself produces changes only at the |1e-16| level I would expect in double precision. I have also checked that turning off the over-relaxation steps reduces but doesn't completely remove this roundoff accumulation.

So it looks as though there is some aspect of the |pg_ora| program that is causing machine-precision roundoff errors to be rapidly blown up by many orders of magnitude. I find this surprising, but I haven't done much prior work with this algorithm. *Is this the expected behavior?* If not, can you think of any likely culprit(s) deserving more scrutiny? I'll ping @weberjo8 since you seem to have been working on the pure-gauge code relatively recently.
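As a concrete illustration of the two numerical points above, here is a minimal standalone sketch in plain C (generic code, not taken from MILC or pg_ora). It shows that |x/z| and |x*(1.0/z)| typically disagree in the last bit, because the reciprocal is rounded before the multiply, and that an iterated nonlinear update can amplify such a one-ulp difference exponentially. The logistic map is only a chaotic stand-in for a Monte Carlo update, not the ORA algorithm:

#include <math.h>
#include <stdio.h>

int main(void) {
  /* (1) division vs. reciprocal multiplication: 1.0/z is rounded before
     the multiply, so the two results generally differ in the last bit,
     i.e. at the ~1e-16 relative level in double precision (for some
     inputs they happen to agree exactly) */
  double x = 0.7853981633974483;   /* arbitrary test values */
  double z = 3.141592653589793;
  double a = x / z;
  double b = x * (1.0 / z);
  printf("x/z       = %.17g\n", a);
  printf("x*(1.0/z) = %.17g\n", b);
  printf("rel diff  = %.3g\n", fabs(a - b) / fabs(a));

  /* (2) a chaotic update amplifies a one-ulp difference exponentially;
     the logistic map at r = 4 roughly doubles a small separation each
     step, so ~1e-16 reaches O(1) within about 55 iterations */
  double u = 0.4;
  double v = nextafter(u, 1.0);    /* u plus one ulp */
  for (int i = 1; i <= 60; i++) {
    u = 4.0 * u * (1.0 - u);
    v = 4.0 * v * (1.0 - v);
    if (i % 10 == 0)
      printf("iter %2d: |u - v| = %.3g\n", i, fabs(u - v));
  }
  return 0;
}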
Comments

On 1/27/21 2:24 PM, Carleton DeTar wrote:

Hi David,
This is very strange, indeed. We have not used this part of the code for some 25 years. Is the misbehavior compiler or machine dependent? I'll take a look when I get a chance.
Best,
Carleton
Johannes Weber (@weberjo8) replied:

Hi David,
Sorry, I never looked into any details of the ORA algorithm. I just generalized features of the anisotropic version of the pure-gauge codes, tested against existing sample outputs, and repeated some nontrivial unit tests that Alexei had suggested.
Best regards,
Johannes
On 1/27/21 8:28 AM, David Schaich wrote:

Thanks Carleton and Johannes, I'm glad I'm not overlooking something immediately obvious.

So far this has occurred wherever we've run and however we've compiled. But that's a short list at the moment---just on laptops and on a local cluster.

By coincidence both my laptop and the cluster were using the same gcc 5.5.0 (with openmpi 1.10.7 for parallel running on the cluster). The cluster has some other compilers available, and on it I just reran the minimal example linked above using gcc 8.3.0 and intel (& intel-mpi) 2019u5.

I see the same qualitative sensitivity to roundoff in all cases, though the specific numbers are different for gnu vs. intel compilers. That is, each of {gcc 5.5.0 scalar; gcc 5.5.0 + openmpi 1.10.7 on 2 cores; gcc 8.3.0 scalar} produces the same GMES output as linked above, while {intel 2019u5 scalar; intel 2019u5 + intel-mpi 2019u5 on 2 cores} produce different output that still changes for division by |z| vs. multiplication by |1.0/z|. In commits 3900ff6 and d71df02 I added the corresponding output for comparison.
Carleton DeTar replied:

Hi David,
Divisions typically cost quite a few more processor cycles than multiplications. Is the effect sensitive to the optimization level? Did you try -O3 -ffast-math, or the equivalent, for example? This isn't a bug in the code, of course, but it sounds like a change worth making.
Best,
Carleton
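For context, a generic sketch of the trade-off Carleton describes (hypothetical helper names, not the actual MILC routine):

/* The first form does one IEEE divide per element; the second does one
   divide total followed by cheap multiplies, each of whose results may
   sit one ulp away from the corresponding IEEE quotient. */
void scale_div(double *v, int n, double z) {
  for (int i = 0; i < n; i++)
    v[i] /= z;              /* correctly rounded quotient, but slow */
}

void scale_recip(double *v, int n, double z) {
  double zi = 1.0 / z;      /* reciprocal rounded once, up front */
  for (int i = 0; i < n; i++)
    v[i] *= zi;             /* fast, but not the IEEE quotient */
}

Note that with GCC, -ffast-math implies -freciprocal-math, which already licenses the compiler to rewrite the scale_div form into the scale_recip form on its own; in that case the division-vs-multiplication distinction in the source can disappear from the generated code.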
On 1/31/21 9:28 AM, David Schaich wrote:

Sorry for the slow response.

All the tests above used |-O3|, partly out of habit, partly to introduce as few changes as possible. I have now checked that the gcc 5.5.0 + openmpi 1.10.7 builds print exactly the same output with |-O0| (a073b57).

I also tried turning off |-DFAST| (ba7388c, still with |-O0| and gcc 5.5.0 + openmpi 1.10.7). Although this did change the output, sensitivity to division by |z| vs. multiplication by |1.0/z| remains (just as happened when testing Intel compilers).
Carleton DeTar replied:

Hi David,
If you want to dig deeper, I guess you'd need to look at the assembly code.
Best,
Carleton
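One minimal way to do that (a hypothetical probe file, not code from pg_ora) is to isolate the two variants and inspect what the compiler emits:

/* roundoff_kernel.c -- compile with, e.g.,
       gcc -O3 -S roundoff_kernel.c
   and look for divsd vs. mulsd in roundoff_kernel.s on x86-64 to see
   whether the two forms still compile to different instructions. */
double div_form(double x, double z)   { return x / z; }
double recip_form(double x, double z) { return x * (1.0 / z); }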