Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert as_random_distribution To Use scipy.stats Distributions #290

Open
TimothyWillard opened this issue Aug 8, 2024 · 2 comments
Open
Labels
enhancement Request for improvement or addition of new feature(s). gempyor Concerns the Python core. medium priority Medium priority.

Comments

@TimothyWillard
Copy link
Contributor

Is your feature request related to a problem? Please describe.

The gempyor.utils.as_random_distribution/random_distribution_sampler functions use a mix of np.random rngs and scipy.stats rngs. This makes testing these and functions that rely on these functions more convoluted then it should be as demonstrated by gempyor.testing. sample_fits_distribution.

See #277 (comment), #277 (comment).

Is your feature request related to a new application, scenario round, pathogen? Please describe.

The application is mostly for internal use in gempyor, it is unlikely that these functions will be used much externally but would assist in efforts to unit test and stabilize gempyor.

Describe the solution you'd like

Consolidate the logic in gempyor.utils.as_random_distribution/random_distribution_sampler and gempyor.testing. sample_fits_distribution together and to use distributions from scipy.stats instead of from np.random.

@TimothyWillard TimothyWillard added enhancement Request for improvement or addition of new feature(s). gempyor Concerns the Python core. medium priority Medium priority. labels Aug 8, 2024
@TimothyWillard
Copy link
Contributor Author

Related to GH-300, but would be nice to also consolidate the custom distribution handling in the llik method of gempyor.statistics.Statistics into this solution as well. This would allow for (more) standardized naming + parameterizations of distributions used throughout flepiMoP's config files. The relevant lines of code in question are:

        dist_map = {
            "pois": scipy.stats.poisson.logpmf,
            "norm": lambda x, loc, scale: scipy.stats.norm.logpdf(
                x, loc=loc, scale=self.params.get("scale", scale)
            ),  # wrong:
            "norm_cov": lambda x, loc, scale: scipy.stats.norm.logpdf(
                x, loc=loc, scale=scale * loc.where(loc > 5, 5)
            ),  # TODO: check, that it's really the loc
            "nbinom": lambda x, n, p: scipy.stats.nbinom.logpmf(
                x, n=self.params.get("n"), p=model_data
            ),
            "rmse": lambda x, y: -np.log(np.nansum(np.sqrt((x - y) ** 2))),
            "absolute_error": lambda x, y: -np.log(np.nansum(np.abs(x - y))),
        }
        if self.dist not in dist_map:
            raise ValueError(f"Invalid distribution specified: {self.dist}")
        if self.dist in ["pois", "nbinom"]:
            model_data = model_data.astype(int)
            gt_data = gt_data.astype(int)

Not really sure what to do about the MSE/MAE options, those aren't likelihoods they're losses so they don't correspond to an underlying probability distribution.

@TimothyWillard
Copy link
Contributor Author

Moving this here from #300 (comment).

I think this is referring to some sort of parameterization issue, but I'm not sure. Probably can be add as a task to #290.
"norm_cov": lambda x, loc, scale: scipy.stats.norm.logpdf(
x, loc=loc, scale=scale * loc.where(loc > 5, 5)
), # TODO: check, that it's really the loc

This is to mimic a likelihood function which I really don't like but is our most used at the moment and in the R inference. Basically, the sd scales with the number. I would remove that but I need it to benchmark the new python inference vs the old r inference. We could also keep it making the 5 a parameter. ((ooh sorry re-reading I saw the comment. Yes, potential parametrization issue).

@TimothyWillard TimothyWillard added this to the Software Quality milestone Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Request for improvement or addition of new feature(s). gempyor Concerns the Python core. medium priority Medium priority.
Projects
None yet
Development

No branches or pull requests

1 participant