
Sampling for factored algorithms #733

Open
bruttenberg opened this issue Dec 6, 2017 · 10 comments

@bruttenberg
Collaborator

Working with Alex Ihler and Rina Dechter, we've come up with a new way to do sampling of continuous elements for factored algorithms. Previously, we would take N samples from the prior of each continuous element, then propagate those samples through the model. Since there was no consistency of samples between elements, this would often result in factor explosion as sampled elements were combined (e.g., a mixture of Gaussians would have support of size k*N, where k is the number of Gaussians).

We changed this procedure to always enforce that no more than N samples are propagated through the model. This means that, in the mixture case, we squeeze the k*N samples back down to N samples. We now do this by uniformly sampling from the support of the variable that needs paring down. This is equivalent to particle BP under a uniform proposal.

We only do this if a continuous element is detected along the dependency path of an element. When there is a mix of continuous and discrete, we still do this, and it's OK because we still retain the density factor for the discrete values (i.e., if the discrete values are subsampled, they are renormalized according to the selected samples).

So, do we want to push this change back into the main branch? It seems like a much better way to do sampling than before.
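For concreteness, here is a rough sketch of the subsampling step in plain Scala. This is not the actual Figaro implementation; the helper name and the choice of sampling without replacement are my own illustration of the idea.

```scala
import scala.util.Random

// Illustrative only: cap a variable's support at maxSamples values by sampling
// uniformly from it, so no more than maxSamples values propagate downstream.
def limitSupport[T](support: Seq[T], maxSamples: Int, rng: Random): Seq[T] =
  if (support.size <= maxSamples) support
  else rng.shuffle(support).take(maxSamples)

// e.g. a mixture of k Gaussians sampled N times each yields k * N candidate
// values, but only N of them are kept as the support of the mixture element.
val rng = new Random(42)
val k = 3; val n = 10
val candidates = Seq.fill(k * n)(rng.nextGaussian())
val kept = limitSupport(candidates, n, rng)   // kept.size == n
```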

@apfeffer

@gtakata
Contributor

gtakata commented Dec 6, 2017

I think the size grows as N**k, which is faster than N*k. More importantly, what do you mean by "the support of the variable that needs paring down", and how do you "squeeze it back down"?

Does the procedure only work on Gaussians or any mixture of continuous variables?

@bruttenberg
Collaborator Author

Yes, the size of the factor grows exponentially in k; I was just saying that the support of the variable grows linearly in k. This (in theory) works on any mixture of continuous and discrete variables. I have not fully tested it, though.

To your question about squeezing, say you have this code:

If(Flip(0.1), Normal(0, 1), Normal(10, 1))

The way we do it now, we'd take, say, 10 samples from each Normal. The support of the If statement would then be 10*2 = 20, since the probability that any two samples from the two Normals are the same is ~0. Now, we fix the support of the If statement to 10 samples. Since there are 20 possible values from the two Normals, we subsample those 20 values down to 10. This process repeats itself any time the number of samples grows beyond 10 (in this example).
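Roughly, the step looks like this (a plain-Scala sketch of the idea, not the actual implementation; the variable names are made up):

```scala
import scala.util.Random

val rng = new Random(0)
val n = 10

val fromN0  = Seq.fill(n)(rng.nextGaussian())          // 10 samples from Normal(0, 1)
val fromN10 = Seq.fill(n)(10.0 + rng.nextGaussian())   // 10 samples from Normal(10, 1)

// Candidate support of the If is the union of the two sample sets: ~20 values.
val candidates = fromN0 ++ fromN10

// Squeeze: uniformly subsample back down to n values for the If's support.
val ifSupport = rng.shuffle(candidates).take(n)
```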

@gtakata
Contributor

gtakata commented Dec 6, 2017

Still not sure what you are doing. Are you:

  1. Sampling 10 from N(0,1) and 10 from N(10, 1), then sampling 10 from the result? or
  2. Sampling 1 from N(0,1) and 9 from N(10,1), because of the Flip?

The latter makes more intuitive sense, but I'm not certain either is correct.

@apfeffer
Contributor

apfeffer commented Dec 6, 2017 via email

@bruttenberg
Collaborator Author

Glenn: Let X be 10 samples from N(0,1) and Y be 10 samples from N(10,1). The support of the If is now 10 samples taken from X ∪ Y.

Avi: Good question on LSFI. I can't give you an answer right now; I have to look into it. For continuous elements themselves, this will have no effect, as this change is only implemented at Chains.

@gtakata
Contributor

gtakata commented Dec 6, 2017 via email

@bruttenberg
Collaborator Author

No, you're not going to ignore the Flip, because you'll still make a factor for the Flip, which will then weight the output samples accordingly (i.e., samples that came from N(10,1) will have more weight).
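As a hedged illustration of that weighting (plain Scala, not the factor code itself; the tagging of each value by its originating branch and the sample values are made up, and the 0.1 / 0.9 come from Flip(0.1)):

```scala
val pFirstBranch = 0.1   // P(Flip(0.1) = true), i.e. the Normal(0, 1) branch

// Suppose these values survived the subsampling, tagged by which Normal produced them.
val kept: Seq[(Double, Boolean)] =
  Seq((-0.3, true), (0.8, true), (9.7, false), (10.4, false), (11.1, false))

// The Flip's factor contributes P(branch) for each retained value, so values
// that came from Normal(10, 1) carry the larger weight of 0.9.
val weighted = kept.map { case (v, fromFirst) =>
  (v, if (fromFirst) pFirstBranch else 1.0 - pFirstBranch)
}
```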

@bruttenberg
Collaborator Author

Note, I'm adding this for Applys as well. The point Alex made was that this way we have total control over the size of the model that is sampled. (Again, only if a continuous variable is detected.)

@wkretschmer
Contributor

In your example, does the support of the Normals change? Or do, e.g., the 10 values originally sampled from Normal(0,1) get mapped onto the 10 values of the Chain by some sort of approximation? If it's the latter, you might vaguely recall that I once implemented something like this for Gibbs sampling (see here), but I never had the chance to thoroughly test it.

@bruttenberg
Collaborator Author

No, the support of the Normals does not change; they are sampled once. The values of the Chain are subsampled from the union of the two Normals, so the output of the Chain is always consistent with some of the samples.
