Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correct Formula For Poisson Log-Likelihood #420

Merged
merged 3 commits into from
Dec 11, 2024

Conversation

TimothyWillard
Copy link
Contributor

Describe your changes.

Per this comment #375 (comment) corrected the formula for the poisson log-likelihood, including swapping the ymodel and ydata so it matches prior poisson likelihood. Added a unit test case to demonstrate this change is to restore current behavior.

Does this pull request make any user interface changes? If so please describe.

This change does not present any user interface changes, but instead fixes behavior in the dev branch.

Per this comment #375 (comment)
corrected the formula for the poisson log-likelihood, including swapping
the `ymodel` and `ydata` so it matches prior poisson likelihood. Added a
unit test case to demonstrate this change is to restore current
behavior.
@TimothyWillard TimothyWillard force-pushed the hotfix/correct-poisson-statistics-pmf branch from dd91111 to 8e5baa3 Compare December 9, 2024 20:32
Copy link
Collaborator

@jcblemai jcblemai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand what the test does, but the formula is correct, thanks

@jcblemai
Copy link
Collaborator

jcblemai commented Dec 9, 2024

not sure if it possible or interesting to test against scipy poison pmf.

@TimothyWillard
Copy link
Contributor Author

We could remove this restriction for the poisson log-likelihood with the formulaic approach:

if self.dist in ["pois", "nbinom"]:
model_data = model_data.astype(int)
gt_data = gt_data.astype(int)

Might be better for the next release since it is a behavior change.

@TimothyWillard TimothyWillard added gempyor Concerns the Python core. high priority High priority. quick issue Short or easy fix. next release Marks a PR as a target to include in the next release. labels Dec 9, 2024
@TimothyWillard
Copy link
Contributor Author

not sure if it possible or interesting to test against scipy poison pmf.

The test fails without the formula change so if a case had been written earlier we would have caught this behavior change.

Copy link
Member

@twallema twallema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Timothy! How does the current implementation behave when either ymodel or ydata are zero? I'm used to adding one to both ymodel and ydata because this should not change the location of the maximum probability, and only alter the confidence range minimally.

@jcblemai
Copy link
Collaborator

jcblemai commented Dec 9, 2024

see #375 (comment) @twallema , this just added one to the model output but not the data.
We (probably, I'm only loosely involved in this) want to have a real poison PMF (because for some data, zero might mean zero so infinitely bad llik) and we have this zero_to_one config flag if one wants to use this trick.

@TimothyWillard
Copy link
Contributor Author

How does the current implementation behave when either ymodel or ydata are zero?

The changes can be summarized as:

Formula scipy.stats.poisson.logpmf
ydata = 0, ymodel = 2 -2 -2
ydata = 2, ymodel = 0 -inf -inf
ydata = 0, ymodel = 0 nan 0

Added a test case with poisson log-likelihood and zero valued data with
the `zero_to_one` flag set to `True`.
@TimothyWillard TimothyWillard force-pushed the hotfix/correct-poisson-statistics-pmf branch from 8bf65b4 to 10e5d98 Compare December 10, 2024 15:42
@twallema
Copy link
Member

How does the current implementation behave when either ymodel or ydata are zero?

The changes can be summarized as:

Formula scipy.stats.poisson.logpmf
ydata = 0, ymodel = 2 -2 -2
ydata = 2, ymodel = 0 -inf -inf
ydata = 0, ymodel = 0 nan 0

Interesting Timothy! What behaviour is preferred?

@HopkinsIDD HopkinsIDD deleted a comment from jcblemai Dec 10, 2024
@twallema
Copy link
Member

How does the current implementation behave when either ymodel or ydata are zero?

The changes can be summarized as:
Formula scipy.stats.poisson.logpmf
ydata = 0, ymodel = 2 -2 -2
ydata = 2, ymodel = 0 -inf -inf
ydata = 0, ymodel = 0 nan 0

Interesting Timothy! What behaviour is preferred?

@jcblemai Sorry, I accidentally removed the whole comment below, instead of my reply. @TimothyWillard so in the ymodel=0 and ydata=0 case we would now get an error?

@jcblemai
Copy link
Collaborator

I think a NaN llik is not an error but a rejection of the proposal

@twallema
Copy link
Member

I think a NaN llik is not an error but a rejection of the proposal

Emcee throws an “likelihood function returned Nan error” which stops the operation in this case, I had this happen before.

@jcblemai
Copy link
Collaborator

Emcee throws an “likelihood function returned Nan error” which stops the operation in this case, I had this happen before.
yeah this is correct

@TimothyWillard TimothyWillard merged commit b1d8404 into dev Dec 11, 2024
3 checks passed
@TimothyWillard TimothyWillard deleted the hotfix/correct-poisson-statistics-pmf branch December 11, 2024 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gempyor Concerns the Python core. high priority High priority. next release Marks a PR as a target to include in the next release. quick issue Short or easy fix.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants