
Weighted data matrix #12

Open
rgiordan opened this issue May 25, 2015 · 11 comments

Comments

@rgiordan
Contributor

Is there any support in GaussianMixtures for weighted rows in a data matrix? For example, if I have a dataset with many repeated observations, can I pass in a matrix of distinct points and a vector of multiplicities?
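To make the question concrete, here is a small NumPy sketch (illustrative only, not GaussianMixtures code; all names are ours) showing the equivalence being asked for: statistics computed on a data matrix with repeated rows equal statistics computed on the distinct rows weighted by their multiplicities.

```python
import numpy as np

# Distinct observations and how many times each one occurs
distinct = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
counts = np.array([3, 1, 2])  # multiplicity of each distinct row

# The full data matrix, with each row repeated `counts` times
expanded = np.repeat(distinct, counts, axis=0)

# Weighted mean over distinct rows vs. plain mean over expanded rows
w = counts / counts.sum()
weighted_mean = w @ distinct
plain_mean = expanded.mean(axis=0)
assert np.allclose(weighted_mean, plain_mean)

# Same check for the (biased) covariance
centered = distinct - weighted_mean
weighted_cov = (w[:, None] * centered).T @ centered
plain_cov = np.cov(expanded, rowvar=False, bias=True)
assert np.allclose(weighted_cov, plain_cov)
```

The same identity carries through EM for a mixture model, which is why passing distinct points plus a multiplicity vector can be much cheaper than expanding the data.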

@davidavdav
Owner

I suppose this is possible. It would probably require a fair bit of rewriting, and thinking of how to keep the interface clean.

@rgiordan
Contributor Author

It might be a nice feature request.

In the meantime I got something working by hand using the output of gmmposterior (which was very helpful) so I'm good to go.

@eford
Contributor

eford commented Jun 13, 2015

I'm also interested in training using weighted datapoints. Could you share your code for that? Thanks.

@davidavdav
Owner

I would think we have to add weight support for the stats() functions in stats.jl. I haven't looked at the math yet, but I suspect it will probably boil down to a broadcasting multiply of γ with the (normalized) weights.

We could add a parameter weights everywhere, but I wouldn't find that a particularly elegant interface. A nicer solution might be to include a possible weight vector in the Data type, since the weights really belong to the data. Is this indeed the use case, that the weights are fixed with the data points?
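As a sanity check on that suggestion, here is a minimal NumPy sketch (not the package's actual stats() code; variable names are ours) of how per-point weights fold into the EM sufficient statistics via a single broadcast multiply of the responsibility matrix γ with the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, d = 6, 2, 3
x = rng.normal(size=(n, d))                 # data matrix, n points in d dims
gamma = rng.random(size=(n, k))
gamma /= gamma.sum(axis=1, keepdims=True)   # responsibilities: rows sum to 1
w = rng.random(n)
w *= n / w.sum()                            # normalize weights to sum to n

gw = gamma * w[:, None]                     # the broadcast multiply

# Zeroth-, first-, and second-order statistics per component
N = gw.sum(axis=0)                          # effective counts, shape (k,)
F = gw.T @ x                                # first-order stats, shape (k, d)
S = gw.T @ (x * x)                          # second-order (diagonal), (k, d)

mu = F / N[:, None]                         # M-step means
var = S / N[:, None] - mu**2                # M-step diagonal variances
```

With all weights equal to 1 this reduces exactly to the unweighted statistics, so the change is confined to one extra elementwise product before accumulation.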

@eford
Contributor

eford commented Jun 14, 2015

Yes. For my application, I'm using importance sampling, so each data point has an associated weight.

Adding a weights parameter seems like the natural way to do it to me.

If you want to group the data and weights, then I think it would be better to use a structure of arrays rather than an array of structures. For some applications there could be multiple sets of weights for one set of data, e.g. different weights for different choices of priors, or different temperatures when using tempering/annealing. I'd propose that those applications are probably best handled by multiple function calls. As long as the weights can be swapped out efficiently, I don't see a problem.

Using a structure of arrays also makes it easier and more efficient to combine different packages/libraries.

@rgiordan
Contributor Author

In the meantime, if you're interested, you can find my hand-rolled version with weights in
Celeste.jl/blob/master/src/PSF.jl:fit_psf_gaussians

@eford
Contributor

eford commented Jun 23, 2015

@rgiordan, thanks. My application was different enough that I ended up writing my own em! replacement that allows for weighted data (and also training a mixture of t-distributions rather than Gaussians). I've only written what I need for my project (e.g., full covariance matrices, data in memory). If anyone's interested, those additions are at https://github.com/eford/GaussianMixtures.jl in src/eford_extensions.jl.

@rgiordan
Contributor Author

I think it would be useful for GaussianMixtures.jl to expose a set method for the sigma matrix. In my opinion, that was the only fiddly part of rolling my own model.

@davidavdav
Owner

What do you mean by "expose a set"? You can always store your covariances in gmm.Σ, but don't forget to store them as GaussianMixtures.invcovar().

I've gone through your diffs---it seems like quite a rewrite of the code. Was there not a way to include weighting in the existing code?

@eford
Contributor

eford commented Jun 24, 2015

There might be a way to incorporate weighting; I tried that at first. But after struggling to understand how your code was working, I decided it would be easier to rewrite the training function. Feel free to add the functionality in a more general way.

@davidavdav
Owner

Ah---that sounds like the code isn't very transparent, which is not great. I suppose it could do with some cleanup and rewrites here and there.

Anyway---once I add weighting to the original code, we'll have independent implementations that can be used for verification.
