Document/Unit Test `gempyor.statistics` #304

TimothyWillard · 2024-08-23T15:54:50Z

This PR:

* Documents the `gempyor.statistics` module and its only class, `Statistic`,
* Adds unit tests for all of the methods of the `Statistic` class,
* Regulates the exports of `gempyor.statistics` using the `__all__` dunder, and
* Corrects incorrect type annotations.
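For readers unfamiliar with the mechanism, here is a minimal sketch of how `__all__` limits what `from module import *` exposes. The module is built in memory, and its contents (`Statistic`, `helper`) are illustrative stand-ins, not the real `gempyor.statistics` source.

```python
import sys
import types

# Source of a small stand-in module; only `Statistic` is listed in __all__.
MODULE_SOURCE = """
import math

__all__ = ["Statistic"]


class Statistic:
    pass


def helper():
    return math.pi
"""

# Build the module in memory and register it so it can be imported by name.
module = types.ModuleType("statistics_sketch")
exec(MODULE_SOURCE, module.__dict__)
sys.modules["statistics_sketch"] = module

# A star import binds only the names listed in __all__, so neither the public-looking
# `helper` nor the imported `math` leaks into the importing namespace.
namespace = {}
exec("from statistics_sketch import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # ['Statistic']
```

`helper` remains reachable as `statistics_sketch.helper`; `__all__` only controls the star-import surface and the module's advertised public API.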

This PR does not:

* Change the functionality contained in this module or refactor its code (beyond style).

In the process of this PR I came across some issues, which I've noted on the main issue, GH-300.

This PR derives from the branch in GH-277 because I needed the confuse-related testing utilities, so it will be much easier to review this PR after reviewing/merging GH-277.

* Wrote draft documentation for the `statistics` module following the Google style guide.

* Added missing type annotations and corrected existing ones.
* Applied the black formatter to the file, including manually correcting some line-length issues.
* Rearranged dunder methods.

…docs Merging the 'unit-test-gempyor-parameters' branch into 'GH-300/statistics-unit-tests-docs' to get the confuse related testing utilities.

* Created initial unit testing infrastructure for the `Statistic` class from `gempyor.statistics`, starting with the invalid regularization name `ValueError`.
* Added a default to the `getattr` call to make the unsupported regularization `ValueError` reachable. This should become obsolete with better documentation.
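Why the `getattr` default matters: without it, `getattr` raises `AttributeError` on an unknown name before any `is None` check runs, so the intended `ValueError` is dead code. A minimal sketch, assuming regularization configs are plain dicts with a `name` key (the real class reads confuse views); `StatisticSketch` is a hypothetical, simplified stand-in for `Statistic`.

```python
class StatisticSketch:
    """Hypothetical stand-in: only the regularization lookup is modeled."""

    def __init__(self, regularize_configs):
        self.regularizations = []
        for reg_config in regularize_configs:
            reg_name = reg_config["name"]
            # With the `None` default, an unknown name yields None instead of
            # raising AttributeError, so the ValueError below is reachable.
            reg_func = getattr(self, f"_{reg_name}_regularize", None)
            if reg_func is None:
                raise ValueError(f"Unsupported regularization: {reg_name}")
            self.regularizations.append((reg_func, reg_config))

    def _forecast_regularize(self, model_data, gt_data, **kwargs):
        return 0.0

    def _allsubpop_regularize(self, model_data, gt_data, **kwargs):
        return 0.0


try:
    StatisticSketch([{"name": "not_a_regularization"}])
except ValueError as err:
    print(err)  # Unsupported regularization: not_a_regularization
```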

* Added a test fixture to test the attributes of the `Statistic` class.
* Removed an unnecessary `tmp_path` pytest fixture dependency.
* Improved documentation on the `Statistic` class' attributes and added a raises section for the constructor.

Added a test fixture for the result of calling `str` and `repr` on an instance of the `Statistic` class.

The return value of `Statistic.llik` is actually an xarray `DataArray` summed along the date dimension, not a float.
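A sketch of that shape contract, using plain numpy arrays in place of xarray: axis 0 plays the role of the `date` dimension and axis 1 of `subpop`. Summing a pointwise log-likelihood over dates leaves one value per subpopulation, not a single float. The Gaussian log-density and the `llik_sketch` helper are illustrative assumptions, not gempyor's implementation.

```python
import numpy as np


def gaussian_logpdf(x, loc, scale):
    """Elementwise log-density of a normal distribution."""
    return -0.5 * np.log(2 * np.pi * scale**2) - (x - loc) ** 2 / (2 * scale**2)


def llik_sketch(model_data, gt_data, scale=1.0):
    """Hypothetical per-subpop log-likelihood, summed over the date axis (axis 0)."""
    pointwise = gaussian_logpdf(gt_data, loc=model_data, scale=scale)
    return pointwise.sum(axis=0)


model = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 dates x 2 subpops
truth = model.copy()
result = llik_sketch(model, truth)
print(result.shape)  # (2,)
```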

In particular see #300 (comment).

Added the initial unit tests for the regularization methods, `_forecast_regularize` and `_allsubpop_regularize`, of the `Statistic` class. The tests are general and do not make claims about correctness for now.

Added unit tests for the `apply_resample` method, including a new factory, `simple_valid_resample_factory`, which exercises the "resample config present" branch of the code path.
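As an illustration of what a resample step does, assume a resample config that asks for daily values to be aggregated into weekly sums. The core operation then looks like the numpy sketch below. gempyor itself works on xarray objects. `weekly_sum` is a hypothetical helper, and it assumes the series starts on a week boundary and covers whole weeks.

```python
import numpy as np


def weekly_sum(daily, days_per_week=7):
    """Aggregate a (n_days, ...) array into (n_weeks, ...) sums over each week."""
    n_days = daily.shape[0]
    if n_days % days_per_week != 0:
        raise ValueError("this sketch only handles whole weeks")
    n_weeks = n_days // days_per_week
    # Split the date axis into (week, day-of-week), then sum out the day-of-week axis.
    return daily.reshape(n_weeks, days_per_week, *daily.shape[1:]).sum(axis=1)


daily = np.ones((14, 2))  # 14 days, 2 subpops
print(weekly_sum(daily).shape)  # (2, 2)
```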

Added unit tests for the `apply_scale` method, including a new factory that produces an input set with a 'scale' config. Fixed a bug where the scale function was not applied even when provided. This is a *breaking* change; it doesn't affect the existing test suite, but I still need to check whether it affects any existing config files.
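A minimal sketch of the intended behavior after the fix: a configured scale function is applied whenever it is present. It assumes the scale config names a numpy function such as `"sqrt"`. `ScaleSketch` and `scale_name` are hypothetical, and this does not reproduce the exact gempyor code or the exact shape of the original bug.

```python
import numpy as np


class ScaleSketch:
    def __init__(self, scale_name=None):
        # Resolve the numpy function once; None means "no scaling configured".
        self.scale_func = getattr(np, scale_name) if scale_name is not None else None

    def apply_scale(self, data):
        # Apply the configured function whenever one was provided.
        if self.scale_func is not None:
            return self.scale_func(data)
        return data


data = np.array([1.0, 4.0, 9.0])
print(ScaleSketch("sqrt").apply_scale(data))  # [1. 2. 3.]
print(ScaleSketch().apply_scale(data) is data)  # True
```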

Added unit tests for the `apply_transforms` method of `Statistic`, including making a new factory that includes both resampling and scaling configuration.
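A factory that combines resampling and scaling matters because the order of the two steps changes the result. The sketch below assumes resampling runs before scaling, which is an assumption here rather than something stated in this PR. It uses hypothetical list-based stand-ins for each step: for seven daily values of 1.0, resample-then-sqrt gives sqrt(7), while sqrt-then-resample gives 7.0.

```python
import math


def resample_sum(values, width):
    """Hypothetical resample step: sum non-overlapping blocks of `width`; a trailing partial block is dropped."""
    return [sum(values[i : i + width]) for i in range(0, len(values) - width + 1, width)]


def scale_sqrt(values):
    """Hypothetical scale step."""
    return [math.sqrt(v) for v in values]


def apply_transforms(values, width=7):
    # Resample first, then scale the aggregated values.
    return scale_sqrt(resample_sum(values, width))


daily = [1.0] * 7
print(apply_transforms(daily))             # one value, sqrt(7)
print(resample_sum(scale_sqrt(daily), 7))  # [7.0]
```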

Created global `all_valid_factories` that can be passed directly to the `pytest.mark.parametrize` decorator to test methods of the `Statistic` class against many configurations.
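A sketch of the pattern, assuming each factory is a zero-argument callable that returns the inputs for one test case. `simple_valid_resample_factory` is the name used in this PR. The other factory names and all of the returned fields are hypothetical placeholders, not the contents of the gempyor test suite.

```python
import pytest


def simple_valid_factory():
    return {"name": "sum_hosp", "resample": None, "scale": None}


def simple_valid_resample_factory():
    return {"name": "sum_hosp", "resample": {"aggregator": "sum"}, "scale": None}


def simple_valid_scale_factory():
    return {"name": "sum_hosp", "resample": None, "scale": "sqrt"}


# One global list, passed straight to parametrize, so each test that uses it
# runs once per configuration.
all_valid_factories = [
    simple_valid_factory,
    simple_valid_resample_factory,
    simple_valid_scale_factory,
]


@pytest.mark.parametrize("factory", all_valid_factories)
def test_factory_produces_named_config(factory):
    config = factory()
    assert config["name"] == "sum_hosp"
```

Adding a new configuration then only means appending one factory to the list; every parametrized test picks it up automatically.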

Added unit tests for the `llik` method of the `Statistic` class. Had to change the factories to use RMSE as the default likelihood distribution, since the Poisson distribution only has integer support.
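The reason for the switch, sketched under the assumption that the test factories generate real-valued data: a Poisson log-pmf is `-inf` for any non-integer observation, while RMSE is defined for arbitrary reals. Both functions are hand-written stand-ins, not the functions gempyor dispatches to.

```python
import math


def poisson_logpmf(k, lam):
    """Log-pmf of a Poisson distribution; -inf off its support (non-negative integers)."""
    if k < 0 or k != int(k):
        return -math.inf
    k = int(k)
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def rmse(model, truth):
    """Root-mean-square error; defined for arbitrary real-valued inputs."""
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(model, truth)) / len(model))


print(poisson_logpmf(2.5, 3.0))  # -inf: fractional counts have zero probability
print(rmse([1.0, 2.0], [1.0, 4.0]))
```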

Was previously using `xarray.DataArray` for `model_data` and `gt_data` when unit testing the `Statistic` class, since that is what many methods expect. However, the main entry point to the class, `compute_logloss`, takes an `xarray.Dataset`, which the class slices into `xarray.DataArray`s. The unit tests now more accurately reflect this.

* Created initial unit tests for the `compute_logloss` method of `Statistic`, checking structure but not correctness.
* Updated the documentation for `compute_logloss` to reflect the possible `ValueError` and the correct expected input types.
* Changed an internal variable of that method to a float to get a consistent float return for the second tuple entry of `compute_logloss`.
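Why the internal variable's type matters: if regularization terms are accumulated starting from the integer `0`, a configuration with no regularizations returns an `int` rather than a `float`. A sketch with a hypothetical helper, `total_regularization`:

```python
def total_regularization(terms, start=0.0):
    """Sum regularization terms; starting from a float keeps the result a float."""
    total = start
    for term in terms:
        total += term
    return total


print(type(total_regularization([])))           # <class 'float'>
print(type(total_regularization([], start=0)))  # <class 'int'>
```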

Added a test fixture that confirms the `ValueError` raised when model data and ground truth data do not have the same shapes in `Statistic.compute_logloss`.
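A sketch of the check and a matching pytest test. `check_shapes` is a hypothetical helper, and its error message is illustrative, not the exact text raised by `compute_logloss`.

```python
import numpy as np
import pytest


def check_shapes(model_data, gt_data):
    """Raise ValueError when model and ground-truth arrays do not line up."""
    if model_data.shape != gt_data.shape:
        raise ValueError(
            f"model_data shape {model_data.shape} does not match gt_data shape {gt_data.shape}"
        )


def test_misshapen_data_raises_value_error():
    # One fewer date in the ground truth than in the model output.
    with pytest.raises(ValueError, match="does not match"):
        check_shapes(np.zeros((10, 3)), np.zeros((9, 3)))
```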

There were entries in the mock configs, modeled on existing configs, that are not considered by the `Statistic` class at all. Removed for clarity.

…docs

The merge-base changed after approval.

…docs

TimothyWillard · 2024-10-10T19:58:03Z

I think this got lost at some point after GH-277 was merged in, but it should be ready for review. I will address these issues from GH-300 in a follow-up PR after this one (I want to establish a baseline of testing for the class first):

> I'm not certain what this means; maybe the `Statistic.llik` method is supposed to make guarantees about the subpopulation order in the outputted data?
>
> ```python
> likelihood = xr.DataArray(likelihood, coords=gt_data.coords, dims=gt_data.dims)
>
> # TODO: check the order of the arguments
> return likelihood
> ```
>
> Yes, that's on the date. Not necessarily adding a check, but at least making sure the code is correct.

and

> Is RMSE used in practice for statistics? Would there be consequences for fixing this bug?
>
> Very good find for the bug. The Python inference has been used a few times in production already, I think only with the Poisson llik for now. But this is important and a priority.
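On the `# TODO: check the order of the arguments` in the `llik` excerpt: wrapping raw values with `gt_data`'s `coords` and `dims` silently mislabels them if the values are laid out in a different axis order. In xarray, reordering an existing `DataArray` by name is `DataArray.transpose(*gt_data.dims)`. The numpy sketch below shows the same bookkeeping explicitly. `align_to_dims` is a hypothetical helper, and the `(subpop, date)` layout is an assumed example.

```python
import numpy as np


def align_to_dims(values, value_dims, target_dims):
    """Transpose `values` so its axes follow `target_dims` instead of `value_dims`."""
    if sorted(value_dims) != sorted(target_dims):
        raise ValueError(f"dims {value_dims} cannot be aligned to {target_dims}")
    # For each target dim, find which axis of `values` currently holds it.
    order = [value_dims.index(dim) for dim in target_dims]
    return np.transpose(values, order)


# Model output laid out as (subpop, date); ground truth as (date, subpop).
model = np.arange(6).reshape(2, 3)
aligned = align_to_dims(model, ("subpop", "date"), ("date", "subpop"))
print(aligned.shape)  # (3, 2)
```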

jcblemai

All look good to me as a new baseline. The bug you've found for the llik is fixed in emcee_batch (which contains also the most recent "statistics" code) in case.

TimothyWillard · 2024-10-11T12:37:25Z

> The bug you've found for the llik is fixed in emcee_batch

Would you mind referencing that commit on the main issue so that fact doesn't get lost? Thanks!

jcblemai · 2024-10-11T13:50:57Z

> > The bug you've found for the llik is fixed in emcee_batch
>
> Would you mind referencing that commit on the main issue so that fact doesn't get lost? Thanks!

isn't it already cause it contains the issue number ? I'll add a comment in case

TimothyWillard · 2024-10-11T13:53:52Z

> isn't it already cause it contains the issue number ? I'll add a comment in case

Ah, you're right, I missed that.

TimothyWillard added 19 commits August 19, 2024 16:52

Draft documentation for gempyor.statistics

c2f423e

Applied black formatter

cc3a691

Type annotations, black formatter

867924a

Merge unit-test-gempyor-parameters into GH-300/statistics-unit-tests-…

cc8cd79

Add Statistic attributes test fixture

b27ec52

Add fixture for str and repr of Statistic

f1b559c

Corrected llik return type hint

b8891be

Move TODO comments to GH-300

239ced6

Initial tests for Statistic regularization

f5a9cb9

Unit tests for Statistic.apply_resample

878c539

Added unit test for Statistic.apply_transforms

84f3632

Consolidate valid factories into global var

a518f36

Add unit tests for Statistic.llik

f94ba0f

Test fixture for data misshape ValueError

4ff6682

Remove unnecessary entries from mock configs

15864c7

TimothyWillard requested review from pearsonca, jcblemai, saraloo and emprzy August 23, 2024 15:54

TimothyWillard linked an issue Aug 23, 2024 that may be closed by this pull request

[Bug]: gempyor.statistics Missing Documentation/Unit Tests #300

Closed

emprzy previously approved these changes Aug 23, 2024

Merge unit-test-gempyor-parameters into GH-300/statistics-unit-tests-…

f0a2a0b

…docs

TimothyWillard added 2 commits September 13, 2024 08:10

Merge unit-test-gempyor-parameters into GH-300/statistics-unit-tests-…

4dd8683

…docs

Merge main into GH-300/statistics-unit-tests-docs

68b99d2

TimothyWillard marked this pull request as ready for review October 10, 2024 19:58

TimothyWillard requested a review from emprzy October 10, 2024 19:58

jcblemai approved these changes Oct 11, 2024

Merge branch 'main' into GH-300/statistics-unit-tests-docs

42d2f9d

TimothyWillard mentioned this pull request Oct 15, 2024

Support providing multi config files #336

Merged

TimothyWillard added the labels bug (defects or errors in the code), enhancement (request for improvement or addition of new feature(s)), gempyor (concerns the Python core), and docstring (relating to in-code documentation) on Oct 15, 2024

Merge main into GH-300/statistics-unit-tests-docs

54de3bc

saraloo approved these changes Oct 18, 2024

TimothyWillard merged commit 2af7da5 into main Oct 18, 2024
1 check passed

TimothyWillard deleted the GH-300/statistics-unit-tests-docs branch October 18, 2024 15:01
