Describe the issue:
When using `coords` in conjunction with `Data`, unexpected behavior can occur if one of the data containers is given the same name as one of the existing coords dimensions. The symptom was difficult to diagnose: when sampling posterior predictive checks, some intermediate variables that used one of the coord dimensions in their `dims` were being resampled for no apparent reason, unlike in identically constructed models in a separate notebook. I eventually realized that the cause was the name conflict. I think it led the model to infer a shared dependency on unknown data, so it drew those variables from their priors in the PPC. All of this would have been avoided by a simple warning when a `Data` container's name conflicts with a coord dimension.
In the example below, changing the data container name to anything other than "group" is sufficient to restore normal behavior.
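Because renaming is enough to avoid the problem, one way to make it systematic (for example, when container names come from data frame columns) is to derive each container name from the coords. The sketch below uses a hypothetical helper, `non_conflicting_name`, which is not part of PyMC. It appends a suffix until the name no longer matches any key in `coords`.

```python
def non_conflicting_name(name: str, coords: dict, suffix: str = "_idx") -> str:
    """Return `name`, extended with `suffix` as needed, so it is not a key of `coords`.

    Hypothetical helper for illustration only; not part of PyMC.
    """
    candidate = name
    # Keep appending the suffix until the candidate no longer matches a coord dimension
    while candidate in coords:
        candidate += suffix
    return candidate


coords = {"group": [0, 1, 2]}
print(non_conflicting_name("group", coords))  # group_idx
print(non_conflicting_name("y", coords))  # y
```

In the reproduction below, this would amount to `pm.Data(non_conflicting_name("group", coords), group_codes, dims="obs")`, which registers the container as `"group_idx"` and leaves the `"group"` dimension untouched.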
Reproducible code example:

```python
import numpy as np
import pymc as pm
import arviz as az

# Generate synthetic data
n_per_group = 100
groups = np.array([0, 1, 2])
group_means = np.array([0, 10, 20])

# Generate data for each group
data = np.concatenate([np.random.normal(loc=mean, scale=1, size=n_per_group) for mean in group_means])
group_codes = np.concatenate([[group] * n_per_group for group in groups])

# Define coordinates
coords = {
    "group": groups,
}

with pm.Model(coords=coords) as model:
    # Data containers
    # NOTE: "group" is also the name of a coord dimension; this is the conflict described above
    group = pm.Data("group", group_codes, dims="obs")
    y = pm.Data("y", data, dims="obs")

    # Priors
    mu = pm.Normal("mu", mu=0, sigma=10, dims="group")  # One mu for each group
    sigma = pm.HalfNormal("sigma", sigma=1)

    # Likelihood
    y_obs = pm.Normal("y_obs", mu=mu[group], sigma=sigma, observed=y, dims="obs")

    # Sampling
    trace = pm.sample(1000, tune=1000, chains=1, return_inferencedata=True)
    ppc = pm.sample_posterior_predictive(trace)
```
Error message:
No response
PyMC version information:
5.18.2
Context for the issue:
It’s a footgun. It’s natural to want to use the same name for the dimension in coords and for the data that stores the assigned values, especially when you are dynamically constructing your model (e.g., from data frames that may have an unknown number of columns).
A simple warning (or a hard error) raised when this conflict occurs would be much appreciated.
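The requested check could be as simple as intersecting the coord dimension names with the data container names. This is a minimal sketch built on a hypothetical function, `check_data_coord_conflicts`, which is not a PyMC API. It takes the `coords` dict passed to `pm.Model` and the names given to `pm.Data`. Setting `strict=True` raises instead of warning, which covers the "warning (or hard error)" options mentioned above.

```python
import warnings


def check_data_coord_conflicts(coords, data_names, strict=False):
    """Warn (or raise, if strict) when a data container name equals a coord dimension.

    Hypothetical helper, not a PyMC API. Returns the sorted list of conflicting names.
    """
    # Names that appear both as coord dimensions and as data container names
    conflicts = sorted(set(coords) & set(data_names))
    if conflicts:
        msg = (
            f"Data container name(s) {conflicts} are also coord dimension(s); "
            "this can cause unexpected resampling in sample_posterior_predictive."
        )
        if strict:
            raise ValueError(msg)
        warnings.warn(msg, UserWarning, stacklevel=2)
    return conflicts


coords = {"group": [0, 1, 2]}
conflicts = check_data_coord_conflicts(coords, ["group", "y"])  # emits a UserWarning
print(conflicts)  # ['group']
```

This only illustrates the name comparison. Where such a check would live inside PyMC (for example, at the point a `Data` container is registered on the model) is left open.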