
Configurable computational precision #304

Open
j-emberton opened this issue Sep 29, 2024 · 6 comments
@j-emberton
Collaborator

Until recently, pyrealm has used 64-bit precision by default, as is typical in scientific computing.
The new demography feature uses 32-bit precision, but the remainder of the code has no precision specified. Type promotion in NumPy may mean calculations default to 64-bit where operations combine 32-bit and 64-bit numbers.
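The promotion behaviour is easy to demonstrate (a minimal NumPy sketch, not pyrealm code):

```python
import numpy as np

a32 = np.ones(3, dtype=np.float32)
b64 = np.ones(3, dtype=np.float64)

# Mixing dtypes promotes the result to the wider type, silently
# reintroducing 64-bit arrays into an intended 32-bit pipeline.
print((a32 + b64).dtype)  # float64
```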

32-bit precision has the advantage of lower memory costs and possible computational speedups through SIMD processing on 64-bit CPUs.

Having the option to set global 32-bit processing would also aid porting of the code to Apple GPUs, which do not support 64-bit floating point.

Describe the solution you'd like
User configurable computational precision.

@davidorme
Collaborator

Or we could ditch the np.float32 in pyrealm.demography.

However, if there are genuine advantages in terms of memory use and speed to using np.float32 then I would be interested in solving this. Over in the virtual_ecosystem project, we also have some inconsistent float casting so a solution here would aid both projects (and virtual_ecosystem is going to make heavy use of pyrealm).

@j-emberton
Collaborator Author

Implementation might look something like this (with the imports and thread-local state the sketch needs):

# _config.py
import threading
from contextlib import contextmanager

_global_config = {"precision": "float64"}
_threadlocal = threading.local()


def _get_threadlocal_config():
    """Get a thread-local **mutable** configuration. If the configuration
    does not exist, copy the default global configuration."""
    if not hasattr(_threadlocal, "global_config"):
        _threadlocal.global_config = _global_config.copy()
    return _threadlocal.global_config


def get_config():
    """Return a copy of the current thread-local configuration."""
    return _get_threadlocal_config().copy()


def set_config(precision=None):
    """Update the thread-local configuration in place."""
    local_config = _get_threadlocal_config()
    if precision is not None:
        local_config["precision"] = precision


@contextmanager
def config_context(*, precision=None):
    """Temporarily override configuration values within a with block."""
    old_config = get_config()
    set_config(precision=precision)
    try:
        yield
    finally:
        set_config(**old_config)

You might then have a pyrealm utils.py with:

from pyrealm._config import get_config


def get_precision():
    """Return the configured precision name, e.g. "float64"."""
    return get_config()["precision"]

In a module you might want to have a function that checks the precision:

import numpy as np

from pyrealm.utils import get_precision


def Function():
    precision = get_precision()
    var = np.asarray([1], dtype=precision)
    return var

In a script you might want to locally redefine the default precision:

from pyrealm import config_context
from mymodule import Function  # wherever Function is defined

with config_context(precision="float32"):
    array = Function()

This code allows a default precision to be set, which any function or class can read to see what precision it should be using.
You can also use a with block to locally change the precision to whatever you want.

This basic infrastructure could also be used to set the number of cores, use of the array API, or any other global feature where you might want to deviate from a default definition.

@j-emberton
Collaborator Author

@MarionBWeinzierl @omarjamil, I would be interested to hear if you have any perspectives on lower-precision scientific computing and what kind of performance increases might be achievable. Also, any thoughts on how best to implement this type of default config with manual override would be welcome.

@omarjamil
Collaborator

The main gains are in memory, but there could also be some faster computation if it is possible to cache the arrays. In terms of implementation, making the precision configurable will require a reasonable amount of change. Context managers and some sort of precision factory will help, but you will still need to wrap function calls or modify array-creation code, unless this can be incorporated into the array API work.
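The memory gain is easy to quantify (an illustrative NumPy sketch):

```python
import numpy as np

n = 1_000_000
a64 = np.ones(n, dtype=np.float64)
a32 = np.ones(n, dtype=np.float32)

# A float32 array takes exactly half the memory of its float64 equivalent.
print(a64.nbytes, a32.nbytes)  # 8000000 4000000
```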

@MarionBWeinzierl
Collaborator

It might be worthwhile doing a profiling sweep to try to determine whether the bottlenecks lie in the array operations or elsewhere. We could do a manual precision switch locally and look at the impact when profiling with a larger dataset.
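As a rough first probe before a full profiling sweep, a micro-benchmark of one elementwise operation at both precisions might look like this (illustrative only; real bottlenecks need a proper profiler such as cProfile):

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
x32 = rng.random(1_000_000).astype(np.float32)
x64 = x32.astype(np.float64)

# Time the same elementwise operation at both precisions.
t32 = timeit.timeit(lambda: np.exp(x32), number=20)
t64 = timeit.timeit(lambda: np.exp(x64), number=20)
print(f"float32: {t32:.3f}s  float64: {t64:.3f}s")
```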

@davidorme
Collaborator

I suspect the profiling will find more serious bottlenecks, but I also wonder if the current arbitrary local use of NDArray[np.float32] in demography actually just introduces small casting costs. It's not really a well-thought-through decision - more an expression of a vague thought about efficiency, and one which shouldn't be implemented in such a piecemeal way.
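The casting cost is concrete: every astype call or mixed-precision operation allocates a new array (a minimal sketch, not pyrealm code):

```python
import numpy as np

x64 = np.linspace(0.0, 1.0, 1_000)   # float64 input, NumPy's default
x32 = x64.astype(np.float32)          # astype always allocates a copy

# The cast is a real allocation, not a view of the original data ...
assert x32.base is None and x32 is not x64
# ... and it discards precision relative to the float64 values.
print(np.abs(x64 - x32.astype(np.float64)).max() > 0)  # True
```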

Labels: None yet
Project status: Icebox
Development: No branches or pull requests
Participants: 4