Note: neighboring average slow cells data race?
For the PRNG: could we just have it print out the seed it used? Then at least we can rerun that exact simulation if we want.
I intend this post to be mostly educational and wiki-like.
I am not a very precise person, so for now, when I say reproducibility and determinism, I really mean three questions:
"Can we run Cholla twice on the same machine, with the same binary and the same inputs, and get the same outputs?"
"Can we run Cholla from a restart and get the same outputs as if we had run Cholla without a restart interruption?"
"Can we run Cholla with a different number of ranks and get the same outputs?"
Any failure of the three, in my opinion, indicates one of these:
The latter two are a bit of a spectrum, since some design choices that would avoid sacrificing any of the three may be so costly that they would be deemed impossible, hence unavoidable.
For now, I document the three major sources of indeterminism I am aware of:
atomicAdd.
Not all atomicAdds cause this, but when a kernel issues multiple atomicAdds to the same address, they may execute in an indeterminate order. That makes the result indeterminate, because floating-point addition is not associative (order matters).
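To make the "order matters" point concrete, here is a minimal host-side sketch in plain C++ (not CUDA, and not Cholla code). The helper name `sum_in_order` is made up for illustration; it adds values strictly in the order given, which stands in for one particular interleaving of atomicAdds to a single address.

```cpp
#include <vector>

// Hypothetical helper: accumulate the values strictly in the order given.
// Each ordering of the input mimics one possible arrival order of
// atomicAdds that all target the same address.
double sum_in_order(const std::vector<double>& values)
{
  double total = 0.0;
  for (double v : values) {
    total += v;  // each addition rounds to the nearest double
  }
  return total;
}
```

With IEEE 754 doubles, `sum_in_order({1e16, -1e16, 1.0})` gives `1.0`, while `sum_in_order({1e16, 1.0, -1e16})` gives `0.0`, because `1e16 + 1.0` rounds back to `1e16`. The inputs are the same; only the order differs. Real cases are usually far less dramatic (differences in the last few bits), but the mechanism is the same.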
PRNG
Although it is possible to massage PRNGs into being fully deterministic, in many cases this has not been done, because the massaging can be costly in terms of developer time.
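As a rough idea of what that massaging could look like, here is a sketch in standard C++17. It assumes a host-side `std::mt19937_64`; the helper names `resolve_seed` and `make_generator` are hypothetical and not part of Cholla. The first helper always logs the seed so a run can be repeated. The second derives a generator from a stable global ID (say, a particle or cell ID) rather than from the MPI rank, so the stream does not depend on the domain decomposition.

```cpp
#include <cstdint>
#include <iostream>
#include <optional>
#include <random>

// Hypothetical helper: use the user's seed if one was given, otherwise
// draw one from the system, and always log it so the run can be repeated.
std::uint64_t resolve_seed(std::optional<std::uint64_t> user_seed)
{
  std::uint64_t seed = 0;
  if (user_seed) {
    seed = *user_seed;
  } else {
    std::random_device rd;
    seed = (static_cast<std::uint64_t>(rd()) << 32) | rd();
  }
  std::cout << "PRNG seed: " << seed << '\n';
  return seed;
}

// Hypothetical helper: one generator per stable global ID instead of one
// per rank, so changing the number of ranks does not change the stream.
std::mt19937_64 make_generator(std::uint64_t base_seed, std::uint64_t global_id)
{
  std::seed_seq seq{static_cast<std::uint32_t>(base_seed),
                    static_cast<std::uint32_t>(base_seed >> 32),
                    static_cast<std::uint32_t>(global_id),
                    static_cast<std::uint32_t>(global_id >> 32)};
  return std::mt19937_64(seq);
}
```

Two caveats on this sketch. The raw output of `std::mt19937_64` is fixed by the C++ standard, but the `std::*_distribution` classes are implementation-defined, so results drawn through them may differ between standard libraries. Also, a full Mersenne Twister per particle or cell is heavy; this only illustrates the keying idea, not a recommended production design.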
Read_Grid
Some physics modules may need to save additional state for a restart to be fully consistent with a continuous run.
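One example of such additional state is the PRNG engine itself. The sketch below assumes a module that uses a host-side `std::mt19937_64`; the function names are hypothetical and this is not Cholla's actual restart format. It relies on the stream operators that the standard library defines for its random engines.

```cpp
#include <random>
#include <sstream>
#include <string>

// Hypothetical helper: serialize the full engine state to text so it can
// be written alongside the other restart data.
std::string save_generator_state(const std::mt19937_64& gen)
{
  std::ostringstream out;
  out << gen;  // standard engines support operator<<
  return out.str();
}

// Hypothetical helper: rebuild an engine from previously saved state so
// the restarted run continues the same random sequence.
std::mt19937_64 load_generator_state(const std::string& state)
{
  std::istringstream in(state);
  std::mt19937_64 gen;
  in >> gen;  // standard engines support operator>>
  return gen;
}
```

If a restart reseeds the generator instead of restoring it, the restarted run draws a different sequence from the continuous run, even though the grid data read back in is identical.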