Running nondimensional models and round-off errors #1410
-
I guess our (my?) experience with round-off error has mostly been through wanting to run low-precision simulations. This was something we looked into a tiny bit early on but haven't considered lately... We were also hoping to investigate low-precision floats and alternative representations like posits with @milankl but COVID got in the way haha. He's done a lot of work in this area (in Julia!), e.g. https://github.com/milankl/ShallowWaters.jl, so he might have some wisdom to share. I think he's made some suggestions over email on how to avoid round-off error, so with his permission I can copy-paste his advice here (unless he wants to himself).

I guess running with low-precision floats requires good/careful management of round-off error, so if you can run well at low precision then your round-off errors should be small? One thing I remember finding was that round-off error in the FFT-based pressure Poisson solver led to obvious artifacts in the solution when running at `Float32`. However, later I think we uncovered a bigger issue, that the time-stepping algorithm was not enforcing incompressibility, so I think since then we've felt burnt by `Float32`. I guess if we can run well at `Float32` then round-off error shouldn't be a big deal.

Not sure I'm answering your questions though. Normalizing units isn't something I thought was necessary! Does MITgcm do this? @christophernhill I remember using ...
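A minimal sketch of how one might quantify that FFT round-trip round-off (assuming FFTW.jl; illustrative only, not code from the thread):

```julia
using FFTW

# Worst-case round-off error of a forward+inverse FFT at precision T.
function fft_roundtrip_error(T)
    x = rand(T, 256, 256)      # random field in precision T
    x′ = real.(ifft(fft(x)))   # round trip; mathematically this equals x
    maximum(abs.(x′ .- x))     # largest pointwise deviation
end

fft_roundtrip_error(Float64)   # on the order of 1e-16
fft_roundtrip_error(Float32)   # on the order of 1e-7
```

The absolute error scales with the field's magnitude, so a dimensional field with values of order 1e5 carries round-trip round-off of order 1, which is one way artifacts become visible at `Float32`.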
-
I don't think the role that round-off errors play in results has been investigated in Oceananigans specifically. It'd be a nice project to illustrate when and how things go wrong.
-
Some thoughts (@ali-ramadhan you are more than welcome to share what I've written previously, with the risk that it's outdated).

Normalisation/rescaling the equations: As floats have an approximately constant decimal precision, rescaling, i.e. multiplying your equations (or parts of your equations) by a constant scale, is only important to avoid over/underflow errors (you may want to scale to something other than 1, so I use the term rescaling). As long as all the numbers in your simulation are well within the floatmin/floatmax range, rescaling shouldn't solve precision issues that you encounter (although you may end up with different solutions, as rescaling isn't bit-reproducible with floats). Since you may not notice underflows, it's still a good idea to rescale and see whether you get a significant reduction in errors. For ShallowWaters.jl I'm rescaling the equations: with dx in units of meters and a resolution of kilometers, something like 1/dx^4 = O(1e-16) is not representable with Float16. I ended up writing dimensionless gradient operators and moving constants like dx into other constants like viscosity, or even into the time stepping. That's unfortunately not something you can do when using a model as a black box, only changing some parameters; it has to be baked into the dynamical core. (But I haven't encountered a disadvantage other than that the code doesn't look as mathematically familiar anymore.)
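A minimal sketch of what moving dx into other constants can look like (hypothetical names and a 1-D Laplacian for illustration, not the actual ShallowWaters.jl code):

```julia
# Hypothetical: fold the grid spacing into the viscosity once, in Float64,
# so the low-precision inner loop only evaluates an O(1) dimensionless stencil.
#   ν*(u[i+1] - 2u[i] + u[i-1])/dx^2  ==  ν̃*(u[i+1] - 2u[i] + u[i-1])  with  ν̃ = ν/dx^2

dimensionless_laplacian(u, i) = u[i+1] - 2u[i] + u[i-1]   # no 1/dx^2 in here

function diffuse!(du, u, ν̃)   # ν̃ = ν/dx^2, precomputed at setup
    for i in 2:length(u)-1
        du[i] += ν̃ * dimensionless_laplacian(u, i)
    end
    return du
end
```

The point is that 1/dx^2 (or 1/dx^4 for a biharmonic operator) may underflow on its own, while the combined constant ν/dx^2 can be O(1) and perfectly representable in the low-precision type.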
Order of summation: Note that this is a technique independent of rescaling. E.g. the harmonic sum 1 + 1/2 + 1/3 + 1/4 + ... in Float16 will always converge after a few hundred elements (as it sums in order big to small), whether you scale every element or not:

```julia
function harmonic_sum(::Type{T}; scale::Real=1, Nmax::Int=2000) where T
    scaleT = T(scale)
    ∑ = scaleT                   # first term: scale*1/1
    for i in 2:Nmax
        ∑old = ∑
        ∑ += scaleT/T(i)
        if ∑ == ∑old             # the next term no longer changes the sum
            println((Float64(∑), i))
            break
        end
    end
end
```
```julia
julia> harmonic_sum(Float16, scale=1/1000)
(0.0070953369140625, 518)

julia> harmonic_sum(Float16, scale=1)
(7.0859375, 513)

julia> harmonic_sum(Float16, scale=1000)
(7064.0, 501)
```

(If we used powers of 2 for rescaling it would always be exactly 513 elements.) The series converges sometimes a bit earlier or a bit later because the decimal precision just below a power of 2 is slightly higher than just above it.

In general, yes, a summation should be done small to big, and it might well be that this is the underlying issue in the pressure solve, as an iFFT(FFT(x)) is from a numerical perspective a double sum (see the small sketch at the end of this comment). When encountering precision problems with Float32 and FFT, one could first check the size of the Fourier coefficients and check that they stay well within the floatmin/floatmax range.

I've seen in #55 that there is a difference between Float32 on CPU and GPU, which is interesting. If exactly the same code is executed (meaning there's no order-of-execution difference), the two solutions should be identical. If they are not, it might be worth checking that the GPU supports subnormals. Not that I care about subnormals (I believe they are overrated for most applications and a pain in hardware or software emulators), but note that minpos(Float32) = 1f-45 (the smallest subnormal) and floatmin(Float32) = 1f-38 (the smallest non-subnormal), so this may indicate that you are indeed encountering substantial underflows and the situation could indeed be improved by rescaling.

Maybe just to add: I generally believe that climate models in 16-bit arithmetic are possible. I'm not saying it's easy to write them, nor that every single computation should be in 16-bit, but my research around ShallowWaters.jl gave me the feeling that it is indeed possible (if also not the most obvious thing) to have a large part of model computations in low precision, with only very few in high precision to avoid precision problems in some bottleneck computations. ECMWF's operational forecast model will be switched to Float32 at some point this year after a few years of work, and they are already investigating Legendre transforms in Float16 on Fugaku.
-
Has anyone investigated round-off errors in Oceananigans?
In traditional Fortran models (the ones I know of, anyway), simulations are always run after lengths, velocities, etc., are normalized. This is of course due to round-off errors, so that even small quantities in large simulations can be accurately calculated (for example dissipation in domains that are kilometers in size).
While I suspect that that's still the recommended way of running things in Julia, I'm not 100% sure it's as necessary as in Fortran. I know some more modern languages/packages can minimize this issue (for example by adding a set of numbers smallest first and largest last).
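(A quick sketch of my own, not from the discussion, of that last point: Julia's built-in `sum` already uses pairwise summation for arrays, which keeps this accumulation effect much smaller than a naive left-to-right loop.)

```julia
# Ten million small Float32 values whose exact sum is 1000.
x = fill(1.0f-4, 10_000_000)

foldl(+, x)   # naive left-to-right accumulation: drifts visibly from 1000
sum(x)        # pairwise summation: very close to 1000.0f0
```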
What is your opinion about this? Do you guys generally run your simulations after normalization or in a dimensional way?
A follow-up question: if I were to normalize my simulations, is there a preferred way to do so in Oceananigans? I noticed that there's a `NonDimensionalModel` (that apparently creates an `IncompressibleModel` properly normalized using `Re`, `Ro` and `Pr`), but I've never seen it used in any examples.
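(For concreteness, a hypothetical sketch of what I mean; the grid constructor and keyword names are guesses based on the description above, so they may not match the actual API.)

```julia
using Oceananigans

# Hypothetical usage of NonDimensionalModel; exact signature may differ.
grid = RegularCartesianGrid(size=(64, 64, 64), extent=(1, 1, 1))
model = NonDimensionalModel(grid=grid, Re=1000, Ro=0.1, Pr=7)
```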