
Configurable computational precision #304

Open
j-emberton opened this issue Sep 29, 2024 · 6 comments
@j-emberton
Collaborator

Until recently, pyrealm has used 64-bit precision by default, as is typical in scientific computing.
The new demography feature uses 32-bit precision, but the remainder of the code has no precision specified. Type promotion in NumPy may mean calculations default to 64-bit where operations combine 32-bit and 64-bit numbers.
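The promotion behaviour is easy to demonstrate (a minimal NumPy sketch, not pyrealm code):

```python
import numpy as np

a32 = np.ones(3, dtype=np.float32)
b64 = np.ones(3, dtype=np.float64)

# Mixing dtypes promotes the result to the wider type, silently
# reintroducing 64-bit arrays into an intended 32-bit pipeline.
print((a32 + b64).dtype)  # float64
```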

32-bit precision has the advantage of lower memory costs and possible computational speedups through SIMD processing on 64-bit CPUs.

Having the option to set global 32-bit processing would also aid porting of the code to Apple GPUs, which do not support 64-bit floating point.

Describe the solution you'd like
User configurable computational precision.

@davidorme
Collaborator

Or we could ditch the np.float32 in pyrealm.demography.

However, if there are genuine advantages in terms of memory use and speed to using np.float32 then I would be interested in solving this. Over in the virtual_ecosystem project, we also have some inconsistent float casting so a solution here would aid both projects (and virtual_ecosystem is going to make heavy use of pyrealm).

@j-emberton
Collaborator Author

Implementation might look something like this (with the imports and thread-local state the sketch needs):

# _config.py
import threading
from contextlib import contextmanager

_global_config = {"precision": "float64"}
_threadlocal = threading.local()


def _get_threadlocal_config():
    """Get a thread-local **mutable** configuration. If the configuration
    does not exist, copy the default global configuration."""
    if not hasattr(_threadlocal, "global_config"):
        _threadlocal.global_config = _global_config.copy()
    return _threadlocal.global_config


def get_config():
    """Return a copy of the current thread-local configuration."""
    return _get_threadlocal_config().copy()


def set_config(precision=None):
    """Update the thread-local configuration in place."""
    local_config = _get_threadlocal_config()
    if precision is not None:
        local_config["precision"] = precision


@contextmanager
def config_context(*, precision=None):
    """Temporarily override configuration values within a with block."""
    old_config = get_config()
    set_config(precision=precision)
    try:
        yield
    finally:
        set_config(**old_config)

You might then have a pyrealm utils.py with:

from pyrealm._config import get_config


def get_precision():
    """Return the configured precision name, e.g. "float64"."""
    return get_config()["precision"]

In a module you might want to have a function that checks the precision:

import numpy as np

from pyrealm.utils import get_precision


def Function():
    precision = get_precision()
    var = np.asarray([1], dtype=precision)
    return var

In a script you might want to locally redefine the default precision:

from pyrealm import config_context
from mymodule import Function  # wherever Function is defined

with config_context(precision="float32"):
    array = Function()

This code allows a default precision to be set, which any function or class can read to see what precision it should be using.
You can also use a with block to locally change the precision to whatever you want.

This basic infrastructure could also be used to set the number of cores, use of the array API, or any other global feature where you might want to deviate from a default definition.

@j-emberton
Collaborator Author

@MarionBWeinzierl @omarjamil, I would be interested to hear if you have any perspectives on lower-precision scientific computing and what kind of performance increases might be achievable. Also, any thoughts on how best to implement this type of default config with manual override would be welcome.

@omarjamil
Collaborator

The main gains are in memory, but there could also be some faster computation if it is possible to cache the arrays. In terms of implementation, making the precision configurable will require a reasonable amount of change. Context managers and some sort of precision factory will help, but you will still need to wrap function calls or modify array-creation code, unless this can be incorporated into the array API work.
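The memory gain is easy to quantify (an illustrative NumPy sketch):

```python
import numpy as np

n = 1_000_000
a64 = np.ones(n, dtype=np.float64)
a32 = np.ones(n, dtype=np.float32)

# A float32 array takes exactly half the memory of its float64 equivalent.
print(a64.nbytes, a32.nbytes)  # 8000000 4000000
```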

@MarionBWeinzierl
Collaborator

It might be worthwhile doing a profiling sweep to try to determine whether the bottlenecks lie in the array operations or elsewhere. We could do a manual precision switch locally and look at the impact when profiling with a larger dataset.
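As a rough first probe before a full profiling sweep, a micro-benchmark of one elementwise operation at both precisions might look like this (illustrative only; real bottlenecks need a proper profiler such as cProfile):

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
x32 = rng.random(1_000_000).astype(np.float32)
x64 = x32.astype(np.float64)

# Time the same elementwise operation at both precisions.
t32 = timeit.timeit(lambda: np.exp(x32), number=20)
t64 = timeit.timeit(lambda: np.exp(x64), number=20)
print(f"float32: {t32:.3f}s  float64: {t64:.3f}s")
```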

@davidorme
Collaborator

I suspect the profiling will find more serious bottlenecks, but I also wonder if the current arbitrary local use of NDArray[np.float32] in demography actually just introduces small casting costs. It's not really a well-thought-through decision - more an expression of a vague thought about efficiency, and one which shouldn't be implemented in such a piecemeal way.
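The casting cost is concrete: every astype call or mixed-precision operation allocates a new array (a minimal sketch, not pyrealm code):

```python
import numpy as np

x64 = np.linspace(0.0, 1.0, 1_000)   # float64 input, NumPy's default
x32 = x64.astype(np.float32)          # astype always allocates a copy

# The cast is a real allocation, not a view of the original data ...
assert x32.base is None and x32 is not x64
# ... and it discards precision relative to the float64 values.
print(np.abs(x64 - x32.astype(np.float64)).max() > 0)  # True
```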

Labels: None yet
Project status: Icebox
Development: No branches or pull requests
Participants: 4