19 create pytensor rvs from missing datasets #20

Merged
merged 6 commits into from
May 16, 2024

@augeorge augeorge commented May 16, 2024

Adds a function (plus tests) that creates a pytensor tensor from a dataset with missing data.

The inputs are:

  1. name for the random variables (RVs)
  2. dataset which includes values for all model variables across all conditions. The dataframe should contain floats/ints for observed data, np.inf for unobserved data, and np.nan for variables which should be excluded from the model (i.e. exchange reactions). Shape: N rows (one per experimental condition) x M columns (one per model variable)
  3. dataframe for standard deviations - should have the same shape as the above dataset
  4. dataframe for laplace parameters - each value is a tuple (location, scale) for the laplace distribution; should have the same shape as the above dataset
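
As a rough illustration of the three dataframes described above, here is a minimal sketch for N = 2 conditions and M = 3 variables. The condition labels, variable names, and numeric values are made up for the example and do not come from this PR.

```python
import numpy as np
import pandas as pd

conditions = ["cond_1", "cond_2"]    # hypothetical condition labels (N = 2)
variables = ["v_a", "v_b", "v_ex"]   # hypothetical variable names (M = 3)

# Observed values are plain floats, np.inf marks "not observed",
# and np.nan marks "exclude this variable from the model".
data = pd.DataFrame(
    {
        "v_a": [1.5, np.inf],
        "v_b": [np.inf, 0.2],
        "v_ex": [np.nan, np.nan],
    },
    index=conditions,
)

# Standard deviations, same shape as `data` (a constant 0.1 is an assumption).
stdev = pd.DataFrame(0.1, index=conditions, columns=variables)

# Laplace parameters, one (location, scale) tuple per cell, same shape as `data`.
laplace = pd.DataFrame(
    {name: [(0.0, 1.0)] * len(conditions) for name in variables},
    index=conditions,
)
```

Storing tuples in cells gives the laplace dataframe an object dtype, which is why it is built column by column here.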

If a model variable at a particular condition was observed, then a pymc Normal distribution is created with a unique name, the observed value as the mean, and the corresponding value from the input standard deviations dataframe as the standard deviation.

If a model variable at a particular condition was not observed, then a pymc Laplace distribution is created with a unique name and the corresponding (location, scale) values from the input laplace parameter dataframe.

If a model variable at a particular condition should be excluded from calculations, then a zero pytensor is created.

The current implementation loops through each row and column and assigns the corresponding RV or zero tensor and then stacks them together.

The stacked tensor is returned at the end.

The tests cover different input data type errors, and 4 conditions:

  1. data is observed for all variables and conditions
  2. data is not observed for all variables and conditions
  3. all variables should be excluded
  4. the data contains a mixture of observations, no observations, and exclusions (realistic case)
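
The four scenarios above can be written down as small input arrays. The sketch below uses NumPy only; `count_cell_kinds` and the example values are hypothetical and are not taken from the PR's test suite.

```python
import numpy as np


def count_cell_kinds(values):
    """Count observed, unobserved (inf), and excluded (nan) cells."""
    values = np.asarray(values, dtype=float)
    excluded = int(np.isnan(values).sum())
    unobserved = int(np.isinf(values).sum())
    observed = values.size - excluded - unobserved
    return {"observed": observed, "unobserved": unobserved, "excluded": excluded}


shape = (2, 3)  # 2 conditions x 3 variables, chosen arbitrarily
scenarios = {
    "all_observed": np.ones(shape),
    "none_observed": np.full(shape, np.inf),
    "all_excluded": np.full(shape, np.nan),
    "mixed": np.array([[1.0, np.inf, np.nan], [np.inf, 2.0, np.nan]]),
}

summary = {label: count_cell_kinds(values) for label, values in scenarios.items()}
```

Each scenario would be expected to yield Normal RVs for observed cells, Laplace RVs for unobserved cells, and zeros for excluded cells.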

@augeorge augeorge linked an issue May 16, 2024 that may be closed by this pull request

augeorge commented May 16, 2024

probably can be refactored in another PR to be cleaner and more performant

augeorge commented May 16, 2024

Added a test that checks the shape and dimension of the returned tensor match the input data. Also renamed the function to 'create_pytensor_from_data_naive', since we will probably implement a faster method later.

@mcnaughtonadm mcnaughtonadm left a comment

Looks good. A few things to consider; I will open some refactoring issues for them:

  1. One test is 300 lines of code long; maybe we can reduce this using pytest fixtures or smaller tests
  2. Nested function definitions are probably not the best approach; instead, use "private" module-level functions following the PEP 8 convention of a leading underscore, e.g. _method()

@augeorge

added test to check that 'll.steady_state_pytensor' runs without any errors

@mcnaughtonadm mcnaughtonadm merged commit f46cd46 into master May 16, 2024
Development

Successfully merging this pull request may close these issues.

create pytensor RVs from missing datasets
2 participants