Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add helper functions for PFT geometry and canopy shape. #290

Closed
Tracked by #321
davidorme opened this issue Sep 20, 2024 · 5 comments · Fixed by #296
Closed
Tracked by #321

Add helper functions for PFT geometry and canopy shape. #290

davidorme opened this issue Sep 20, 2024 · 5 comments · Fixed by #296

Comments

@davidorme
Copy link
Collaborator

We need some neat wrappers to make it easier to visualise PFT scaling without having to go through a Community object. This is partly to make the docs easier, but also to allow users to visualise the effects of the PFT settings. We need:

  • A geometry method or class that takes a PFT and an array of diameter at breast heights (DBH) or stem heights and generates all the other predictions of the T model, for use in plotting allometries and scaling.
  • A similar canopy method or class that takes a PFT, a stem height / DBH value and generates the vertical canopy data for plotting.
@davidorme
Copy link
Collaborator Author

OK - been playing around with idea on this, and I think a nice API for users on this issue is to extend Flora:

class Flora():
    ...
    def get_allometries(dbh: NDArray) -> dict[str, NDArray]:
        """Generate T Model predictions for PFTs in Flora for a set of DBH values."""
        ...
    def get_canopy_shape(z: NDArray) -> dict[str, NDArray]:
        """Generate canopy shape predictions for PFTs in Flora for a set of vertical height values."""
        ...

Those aren't methods of PlantFunctionalType as described above but immediately makes predictions across a set of PFTs that a user is working with, which seems like 90% of the use cases. If someone genuinely does only want one PFT, then creating a single PFT flora is pretty low cost.

If the return values in the dictionary are a set of (n_dbh, n_pfts) arrays for different predictions then it is also a very clean plotting structure for matplotlib. A usage sketch would be:

import numpy as np
from matplotlib import pyplot as plt
from pyrealm.demography.flora import PlantFunctionalType, Flora

# Create flora
pft1 = PlantFunctionalType(name='pft1', h_max=20)
pft2 = PlantFunctionalType(name='pft2', h_max=30)
flora = Flora([pft1, pft2])

# Get allometries for DBH values from 0 to 1 m in 1cm increments
dbh = np.linspace(0, 1, num=101) 
allometries = flora.get_allometries(dbh=dbh)

# Plot stem_height vs dbh for both PFTs in one go.
plt(dbh, allometries['stem_height'], label = flora.pft_names)

I like that a lot as an API but it immediately provides another angle on why pandas.DataFrame is the wrong choice for Flora.data and pd.Series is unhelpful for the arguments in the t_model_functions.py functions . If we have the PFT data as numpy arrays and the T model functions as pure numpy, then the get_allometry method should be able to turn dbh into a column vector and calculate across all PFTs by broadcasting traits as row vectors and size values as column vectors. For example, this works and results in a (100, 2) array of stem heights.

def calculate_heights(h_max: NDArray, a_hd: NDArray, dbh: NDArray) -> NDArray:
   
    return h_max * (1 - np.exp(-a_hd * dbh / h_max))

dbh = np.linspace(0, 1, num=101)
h_max = np.array([20,30])
a_hd = np.array([116.0,116.0])

h = calculate_heights(h_max, a_hd, dbh[:, None]) # turn DBH into a column vector.

I think that restructure would be make the functions more flexible and general. I was going to concentrate on expanding functionality through this issue and #289 before thinking about replacing pandas, but I think that API sketch and lot of the canopy code would be far cleaner if I fixed the internal data structures first.

@omarjamil, @MarionBWeinzierl and @j-emberton Thoughts?

@j-emberton
Copy link
Collaborator

OK - been playing around with idea on this, and I think a nice API for users on this issue is to extend Flora:

class Flora():
    ...
    def get_allometries(dbh: NDArray) -> dict[str, NDArray]:
        """Generate T Model predictions for PFTs in Flora for a set of DBH values."""
        ...
    def get_canopy_shape(z: NDArray) -> dict[str, NDArray]:
        """Generate canopy shape predictions for PFTs in Flora for a set of vertical height values."""
        ...

Those aren't methods of PlantFunctionalType as described above but immediately makes predictions across a set of PFTs that a user is working with, which seems like 90% of the use cases. If someone genuinely does only want one PFT, then creating a single PFT flora is pretty low cost.

If the return values in the dictionary are a set of (n_dbh, n_pfts) arrays for different predictions then it is also a very clean plotting structure for matplotlib. A usage sketch would be:

import numpy as np
from matplotlib import pyplot as plt
from pyrealm.demography.flora import PlantFunctionalType, Flora

# Create flora
pft1 = PlantFunctionalType(name='pft1', h_max=20)
pft2 = PlantFunctionalType(name='pft2', h_max=30)
flora = Flora([pft1, pft2])

# Get allometries for DBH values from 0 to 1 m in 1cm increments
dbh = np.linspace(0, 1, num=101) 
allometries = flora.get_allometries(dbh=dbh)

# Plot stem_height vs dbh for both PFTs in one go.
plt(dbh, allometries['stem_height'], label = flora.pft_names)

I like that a lot as an API but it immediately provides another angle on why pandas.DataFrame is the wrong choice for Flora.data and pd.Series is unhelpful for the arguments in the t_model_functions.py functions . If we have the PFT data as numpy arrays and the T model functions as pure numpy, then the get_allometry method should be able to turn dbh into a column vector and calculate across all PFTs by broadcasting traits as row vectors and size values as column vectors. For example, this works and results in a (100, 2) array of stem heights.

def calculate_heights(h_max: NDArray, a_hd: NDArray, dbh: NDArray) -> NDArray:
   
    return h_max * (1 - np.exp(-a_hd * dbh / h_max))

dbh = np.linspace(0, 1, num=101)
h_max = np.array([20,30])
a_hd = np.array([116.0,116.0])

h = calculate_heights(h_max, a_hd, dbh[:, None]) # turn DBH into a column vector.

I think that restructure would be make the functions more flexible and general. I was going to concentrate on expanding functionality through this issue and #289 before thinking about replacing pandas, but I think that API sketch and lot of the canopy code would be far cleaner if I fixed the internal data structures first.

@omarjamil, @MarionBWeinzierl and @j-emberton Thoughts?

I like this approach, but I would change the implementation slightly so that the end user doesn't need to worry about transposing the array.

I would prefer that we handle the transposition inside (perhaps using a generic shape test and conditional transposition helper function) to keep the interface cleaner. Alongside, for this type of broadcasting we might want to enforce checks at runtime to make sure the arrays are broadcastable.

Its slightly messy to try and type hint the appropriate relative shapes for all the inputs, so easier just to do it at runtime.

But this is secondary to the overall approach which I think is nice.

@davidorme
Copy link
Collaborator Author

davidorme commented Sep 21, 2024

I like this approach, but I would change the implementation slightly so that the end user doesn't need to worry about transposing the array.

I would prefer that we handle the transposition inside (perhaps using a generic shape test and conditional transposition helper function) to keep the interface cleaner. Alongside, for this type of broadcasting we might want to enforce checks at runtime to make sure the arrays are broadcastable.

I'd agree completely except for the use case where someone wants e.g. a 6x6 output with 6 heights for each of 6 PFTs in a Flora. If the broadcasting is internal, then we can't tell the user intention from the input shapes alone, so we would have to add an intended output shape argument (out_2d: bool = False) to every function. That's not a huge deal, but it feels cleaner to use the shapes themselves to show intention and clearly document the usage?

And I don't think I'd bother type hinting the shapes - or at least, we've not done it anywhere else, so handling shape checking in a validation function like the ones in #288 seems ok to me.

@omarjamil
Copy link
Collaborator

This seems like a reasonable approach to me as well. On a broader point, I agree that we should leave the transposition to the user. As long the functions are consistent in how the input arrays are arranged and what is returned, the rest should be left to the user. This would be the easiest to understand and debug.

@davidorme
Copy link
Collaborator Author

davidorme commented Sep 23, 2024

Thanks @omarjamil and @j-emberton . So my plan now is to:

  1. Remove the use of pandas for what is now in develop (Replace use of pandas in pyrealm.demography #292)
  2. Update the in progress PR for the Canopy model (Implementation of the canopy model #288 ) to match
  3. Then come back here and implement the sketch above.

@davidorme davidorme mentioned this issue Sep 25, 2024
10 tasks
@davidorme davidorme linked a pull request Sep 25, 2024 that will close this issue
10 tasks
@github-project-automation github-project-automation bot moved this from Priorities and low hanging fruit to Done in ICCS development board Sep 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants