-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added Multinomial distribution #1478
base: master
Are you sure you want to change the base?
Added Multinomial distribution #1478
Conversation
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 | |||
### Added | |||
- Add plots for `rand_distr` distributions to documentation (#1434) | |||
- Add `PertBuilder`, fix case where mode ≅ mean (#1452) | |||
- Add `Multinomail` distribution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo here "ail" vs "ial"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution and for imagining what multi-dimensional distributions should look like.
Adding another distribution is not my top priority (before rand v0.9 is released), but some quick review points are below.
rand_distr/src/multinomial.rs
Outdated
pub struct Multinomial {} | ||
|
||
impl Multinomial { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This struct is just a stub used to construct the const/dyn variants? Then I'd prefer you just move Multinomial::new_const
→ MultinomialConst::new
etc.; it's more predictable and one less import for common usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thought was that each distribution would have one common struct to construct all its variants.
Taking #494 into account, we might end up with a lot of distribution structs. Most people would never care about the exact type returned.
Having all the structs in one common doc page would probably do the same.
rand_distr/src/multinomial.rs
Outdated
pub struct MultinomialConst<'a, const K: usize, I> { | ||
/// number of draws | ||
n: I, | ||
/// weights for the multinomial distribution | ||
weights: &'a [f64; K], | ||
/// sum of the weights | ||
sum: f64, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the major advantages of "distribution objects" is that they can be stored in structs; usage of a lifetime parameter here makes that difficult (except in the case 'static
) since Rust doesn't properly support self-referential structs (yes, I know about Ouroboros).
In other words, I think we should usually prefer copying parameters into the struct implementing the distribution (no lifetime parameter), especially in this case where we don't need alloc
. It does depend on expected usage, but in this case K
should normally be reasonably small.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the problem. In my usecase I would only sample from each distribution object once, so copying might matter, but probably not significantly so to make up for the additional lifetime.
This also allows us to take an iterator and make it more general
rand_distr/src/multinomial.rs
Outdated
pub struct MultinomialDyn<'a, I> { | ||
/// number of draws | ||
n: I, | ||
/// weights for the multinomial distribution | ||
weights: &'a [f64], | ||
/// sum of the weights | ||
sum: f64, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest we just use weights: Vec<f64>
or Box<[f64]>
, though we could use tinyvec
.
Motivation
The distribution can often be useful and it's not straight forward to sample from it
Details
It is a multidimensional distribution, so there are some design considerations.
I decided to have two structs
MultinomialConst
andMultinomialDyn
for compile time known number of categories and for runtime known number of categories.There is a struct
Multinomial
with two functions which construct the distributions.The dynamic version is only available with the
alloc
feature.I am happy to receive some feedback regarding this decision. I guess this will influence how we will deal with other multidimensional distributions like the Dirichlet which only supports the const generic version at the moment.