-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SimpleMultiG
sediment models will not run on GPU
#143
Comments
where is this kernel? Link to code? |
This is as it currently is in main: OceanBioME.jl/src/Boundaries/Sediments/simple_multi_G.jl Lines 159 to 234 in ac8419a
Which has another bug in it which is fixed in #138 |
Interesting. I'm surprised this won't compile. Likely you can simplify the arguments to the kernel to get it to compile. Parameter space issues don't have to do with kernel complexity directly, but rather than complexity of the input arguments (ie using One issue might be that the time-stepper is input directly. You only need the named tuple
In general, its good practice to define |
This is the fixed version that does some of those things: OceanBioME.jl/src/Boundaries/Sediments/simple_multi_G.jl Lines 175 to 246 in 8f133df
I think the adapt structure route may be the solution, e.g. at the moment we're unnecessarily passing the G$^{-1}$ tendency fields for the sediment fields, but otherwise this kernel does need the majority of the info in sediment .
|
Also, this line does not belong inside the kernel: if isnan(pₐₙₒₓ)
error("Sediment anoxia has caused model failure")
end If you want to check for NaNs, you should do that outside the kernel. But note that the |
Oh yeah that's a good idea. Perhaps that is the better solution to #144 as well |
This worked! |
Hmm actually this reduced the parameter size enough on for the test with a rectilinear grid for non hydrostatic and hydrostatic, but still fails on long/lat grid |
We're also passing a load of repeated information about the advection schemes, maybe that could be reduced |
You want to reduce parameter size to the minimum needed. If you are near the limit, your code could fail when CUDA updates for example. It will also be difficult to develop the code. You should inspect every input argument and ensure that only the minimum is loaded onto the GPU. |
We do need all of the entries I think but I think the question is if we need to pass all of |
You may need all the entries, but you may be missing adapt_structure methods for them. You probably want to pass just the tracer advection right? For hydrostatic models, the tracer and momentum schemes are stored separately anyways. |
For the advection schemes, I think we can reduce it by just passing the advection schemes for the tracers which sink. I'm not sure where else to reduce the size though. Currently, we have:
I've written adapt structures for sediment so it just has the parameters and fields, underlying_biogeochemistry which always just has parameters and the sinking velocities, advection schemes which is either a scheme or NamedTuple, tracers and both tendency sets which are NamedTuples. All of these things are required because they're used or modified, or used to work out the flux of the sinking tracers. |
Which properties have fields embedded, and can you link to the code for those |
The sediment models have OceanBioME.jl/src/Boundaries/Sediments/simple_multi_G.jl Lines 149 to 163 in 5abb421
and OceanBioME.jl/src/Boundaries/Sediments/instant_remineralization.jl Lines 76 to 82 in 5abb421
and the models have OceanBioME.jl/src/Models/AdvectedPopulations/NPZD.jl Lines 306 to 321 in 5abb421
and OceanBioME.jl/src/Models/AdvectedPopulations/LOBSTER/LOBSTER.jl Lines 402 to 433 in 5abb421
We also have this:
which I think might be redundant |
I guess some of the complexity for |
perhaps! It's conservative to propagate the Note that you also want to write your code to be robust to change in the future. When you assume that you don't need to adapt some property, you implicitly prevent someone from improving / extending your model to properties that do need adaptation. By conservatively adapting everything, you grease the wheels for scientific advancement in the future. |
If you call https://github.com/JuliaGPU/Adapt.jl/blob/df06bcb6936baa7352b8cc7bf5f08f98f2653f25/src/base.jl#L3 The basic structure of any |
That took 8 bytes off the parameter size so wasn't very successful |
Vectors worked a bit better taking 56 bytes off. For some reason forcing it to only pass one of the advection schemes (rather than an NamedTuple of them) doesn't save any |
Our problems are solved: JuliaGPU/CUDA.jl#2080 !! |
Wow, that's huge. Hopefully there isn't a catastrophic loss of performance... Are you sure the function that failed is the one we are concerned about? It's not clear from the error. |
This nested structure is often the cause for issues. You can try to flatten it, perhaps. |
Yeah hopefully, I can't find docs about the ISA change that allows this, but presumably someone down the line has tested the performance change. And yeah its this
method (there was more detail somewhere else in the full error that confirmed it to me). |
Thinking about it we never have u or v components of slip velocity so we can reduce the complexity to just named tuple. Will test. |
So near, with all of the optimisation above and removing the u and v components from sinking velocities its still 8bytes too large. |
Out of curiosity (wondering if something I said was wrong) --- does it matter what the body of the kernel is? For example you could comment everything out except perhaps something trivial. |
As a fallback solution, you might try separating these calculations into multiple kernels. It might be a good idea anyways to reorganize the code so that it's easily to toggle back and forth. For example, right now the tendency calculation for the different species are intertwined. |
It does still fail |
I could see this being a better idea. It is now working (by only giving it the tracers and tendencies it needs) but I assume the parameter size is close to the limits,is there a way for me to check? I think I would rather make restructuring how the sediment tendencies are calculated a different PR since this is getting quite long now? |
Of course, no need to solve the world in one PR. |
While the kernel for
SimpleMultiG
sediments will compile on GPU they will not run due to their complexity leading to a PTX error about parameter size being thrown. This may be fixed in the future if LLVM updates its CUDA version support (JuliaGPU/CUDA.jl#2080). We may also be able to rework the model to use less parameter space so please do get in contact if you would like to use the model on GPU.An error similar to the following will be raised:
Also, note that
InstantRemineralisation
sediment does work on GPU.Discussed in #138.
The text was updated successfully, but these errors were encountered: