reduce memory usage of nucleation model implementation in raccoon #150
@permcody, please take a look at this when you have time. Thanks.
I am checking my derivative size setting. The old 70 GB per CPU case was done with
That's a bit surprising to me. For both cases (150 and 300), did you configure MOOSE and then compile your application?
Yes.
You have
Thanks for the advice, but I am not sure I understand it. So the goal is to guard against a possible division by zero coming from
The issue is the derivative of sqrt(ADReal(0)), which behaves like 1/sqrt(0). A similar issue may exist in other parts of the code, not necessarily yours. You could run the model through the debugger to find out what is triggering that MetaPhysicL error.
This is odd. On the Duke cluster, the opt build hits the above error but the dbg build runs fine. What can I learn from the different behavior?
That behavior seems a bit odd to me... @lindsayad
I have already applied a treatment to the object that will be used in
I would run your input with valgrind to make sure there are no uninitialized values.
I will post the input and mesh very soon. It is not the example listed at the beginning of this issue.
You should do that, not me 😄
I'm optimistic about valgrind telling us something useful.
Oops, sorry, I misunderstood.
I think this is the valgrind message about uninitialized value(s). It was printed before the MOOSE executable printed the AD derivative size error.
How should I look for the cause of this uninitialized value? Is it an AD variable on the boundary?
Judging by the backtrace, the issue seems to come from a parsed material. Can you double-check your input? Maybe uninitialized values, as Alex pointed out? Or a division by zero, ...
@dschwen do you think this is a false positive in the JIT code?
I am trying constant or linear material properties to see if that clears the issue. Can you explain what an uninitialized value in the input deck would be? I thought every quantity created in the input deck must be given an initial value to complete its definition.
I thought so too. I just suggested that you double-check in case you see an issue.
Can you do that valgrind check with a dbg executable? In that case JIT compilation keeps the function sources, so we could check exactly what's going on.
But running the dbg executable does not trigger the error.
That suggests there is some kind of non-deterministic error. Valgrind will catch it if that's the case, regardless of the method you run with. Also, how do you know that is the assertion you're triggering? I thought you were just getting a general MetaPhysicL exception whose cause was unknown?
Sorry, I missed this part of the error message. I only posted the part after
Is it normal that libMesh was compiled on Jun 18? I updated mamba this Monday.
I thought the issue was an uninitialized access...
@BoZeng1997 what method were you running with when you got the valgrind error?
I got an invalid read error message when running the opt executable with valgrind. I am not sure whether that means uninitialized values.
Opt only.
Well, the next thing I would try is gdb with 'catch throw' to see what you can learn when the MetaPhysicL exception is thrown. It would be good to get a stack trace.
This is what I get with gdb + opt.
The opt build on the cluster with AD derivative size 150 runs after I cleaned the folder.
Oh, I forgot about this... if you change your derivative size configuration, there are problems with the
The two nucleation models for phase-field fracture consume a lot of memory. The cause is either how the material object is coded or how the model is set up at the input-deck level (or both).
Source code:
https://github.com/BoZeng1997/raccoon/blob/c24df81ba4ef97f1b3490821daa631d961e3e68d/src/materials/KLRNucleationMicroForce.C
https://github.com/BoZeng1997/raccoon/blob/c24df81ba4ef97f1b3490821daa631d961e3e68d/include/materials/KLRNucleationMicroForce.h
How the model is implemented:
https://github.com/BoZeng1997/raccoon/tree/c24df81ba4ef97f1b3490821daa631d961e3e68d/tutorials/surfing_boundary_problem
The current implementation is certainly not the best approach. It requires dispx, dispy, and dispz to be transferred to the subapp, which then computes the stress tensor invariants I1 and J2. One way to improve it somewhat is to compute I1 and J2 in the main app and transfer them to the subapp. I am waiting to see whether there is an even better improvement.