No warning thrown when Data given same name as coords key #7589

Open
szvsw opened this issue Nov 26, 2024 · 5 comments

szvsw commented Nov 26, 2024

Describe the issue:

When using coords together with Data, unexpected behavior can occur if a data container is given the same name as one of the existing coords dimensions. The symptom was hard to diagnose: when sampling posterior predictive checks, some variables in the middle of the model that used that coord dimension in their dims were being resampled for no apparent reason, unlike in identically constructed models in a separate notebook. I eventually realized the cause was the conflicting names, which I think led the model to assume a shared dependency on unknown data and therefore to use the priors in the ppc. All of this would have been avoided by a simple warning when a Data container name conflicts with a coord dim.

In the example below, changing the data container name to anything other than “group” is sufficient to return to normal behavior.

Reproducible code example:

import numpy as np
import pymc as pm
import arviz as az

# Generate synthetic data
n_per_group = 100
groups = np.array([0, 1, 2])
group_means = np.array([0, 10, 20])

# Generate data for each group
data = np.concatenate([np.random.normal(loc=mean, scale=1, size=n_per_group) for mean in group_means])
group_codes = np.concatenate([[group] * n_per_group for group in groups])

# Define coordinates
coords = {"group": groups}

with pm.Model(coords=coords) as model:
    # Data containers ("group" reuses the name of the "group" coord dim, which triggers the issue)
    group = pm.Data("group", group_codes, dims="obs")
    y = pm.Data("y", data, dims="obs")

    # Priors
    mu = pm.Normal("mu", mu=0, sigma=10, dims="group")  # One mu for each group
    sigma = pm.HalfNormal("sigma", sigma=1)

    # Likelihood
    y_obs = pm.Normal("y_obs", mu=mu[group], sigma=sigma, observed=y, dims="obs")

    # Sampling
    trace = pm.sample(1000, tune=1000, chains=1, return_inferencedata=True)
    ppc = pm.sample_posterior_predictive(trace)
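
The rename workaround described before the example can be sketched without PyMC, which may help when data container names are generated dynamically. The helper `non_colliding_name` and the `_idx` suffix are hypothetical and not part of PyMC; this is only one way to pick a name that differs from every coords key.

```python
def non_colliding_name(name, coords, suffix="_idx"):
    """Return `name` unchanged unless it matches a coords key.

    Hypothetical helper, not part of PyMC: it appends `suffix` until the
    result no longer matches any dimension name in `coords`.
    """
    candidate = name
    while candidate in coords:
        candidate += suffix
    return candidate


coords = {"group": [0, 1, 2]}

data_name = non_colliding_name("group", coords)
print(data_name)  # group_idx

# In the model from the report, the container would then be declared as:
#     group = pm.Data(data_name, group_codes, dims="obs")
```

Only the registered name changes; the Python variable `group` used for indexing `mu` can stay the same.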

Error message:

No response

PyMC version information:

5.18.2

Context for the issue:

It’s a footgun. It’s natural to want to use the same name for the dimension in coords and for the data that stores the assigned values, especially when you are constructing your model dynamically (e.g. from data frames which may have an unknown number of columns).

A simple warning (or hard error) thrown when this conflict occurs would be much appreciated.
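
Until such a check exists, a user-side guard could be run before building the model. The sketch below is an assumption about what a check might look like, not PyMC behavior: `warn_on_dim_conflicts` is a hypothetical function that compares the coords keys with the names intended for Data containers and warns on any overlap.

```python
import warnings


def warn_on_dim_conflicts(coords, data_names):
    """Warn for every data name that is also a coords key; return the conflicts.

    Hypothetical user-side check, not part of PyMC.
    """
    conflicts = sorted(set(coords) & set(data_names))
    for name in conflicts:
        warnings.warn(
            f"Data container {name!r} has the same name as coord dimension {name!r}",
            UserWarning,
            stacklevel=2,
        )
    return conflicts


coords = {"group": [0, 1, 2]}
# Names that would be passed to pm.Data, e.g. derived from data frame columns.
print(warn_on_dim_conflicts(coords, ["group", "y"]))  # ['group']
```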

@szvsw szvsw added the bug label Nov 26, 2024

welcome bot commented Nov 26, 2024

🎉 Welcome to PyMC! 🎉 We're really excited to have your input into the project! 💖

If you haven't done so already, please make sure you check out our Contributing Guidelines and Code of Conduct.

@ricardoV94
Member

CC @lucianopaz does this make sense?

@aseyboldt
Member

I would also have expected that this already throws an error.

@ricardoV94
Member

We have a check when the variable has the same name as one of the dims, but not another variable?

@szvsw
Author

szvsw commented Nov 26, 2024

I haven't double-checked all cases:

  1. Variable - Variable
  2. Variable - Data
  3. Variable - Dim
  4. Data - Dim (this is what I encountered)
  5. Data - Data
  6. Dim - Dim

But as a user I think they should probably all fail when a new declaration would obscure another.
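
The "all six cases fail" behavior can be illustrated with a toy shared namespace. This is a sketch of the idea only and makes no claim about how PyMC tracks names internally; `NameRegistry` and its `register` method are hypothetical.

```python
class NameRegistry:
    """Toy namespace shared by variables, data containers, and dims."""

    def __init__(self):
        self._kinds = {}  # name -> kind that first claimed it

    def register(self, name, kind):
        # Reject any reuse, whatever kinds are involved.
        if name in self._kinds:
            raise ValueError(
                f"{kind} {name!r} would shadow existing {self._kinds[name]} {name!r}"
            )
        self._kinds[name] = kind


registry = NameRegistry()
registry.register("group", "dim")
registry.register("mu", "variable")

try:
    registry.register("group", "data")  # case 4: Data - Dim
except ValueError as err:
    print(err)  # data 'group' would shadow existing dim 'group'
```

Because every kind shares one dictionary, the same check covers all six pairings in the list above.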
