RFC: remove dependency on rand ecosystem #241
One thing you could do is have your own … Edit: that actually sounds like a good thin crate, and if it's explicitly for non-cryptographic uses, then it should be pretty easy.
@elichai For the purposes of the quickcheck crate (and honestly, probably many other simple uses), something like the … Also, …
I'm writing a thin crate now that just provides a replacement for the Rng/RngCore traits. I hope to publish a first release soon, and if you like it, you can use it :)
FYI, ThreadLocal is also a secure Rng. https://lemire.me/blog/2019/03/19/the-fastest-conventional-random-number-generator-that-can-pass-big-crush
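The generator described in the linked post (`wyhash64`) is small enough to write out in full. Below is a std-only Rust sketch of it; the type name `WyHash64` and its methods are hypothetical, and the three constants are the ones published in the post's C code. Like every generator of this kind, it is not cryptographically secure.

```rust
/// Sketch of the `wyhash64` generator from the linked post.
/// Not cryptographically secure; `WyHash64` is an illustrative name only.
pub struct WyHash64 {
    state: u64,
}

impl WyHash64 {
    /// Any 64-bit value is a valid seed.
    pub fn new(seed: u64) -> Self {
        WyHash64 { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        // Advance the state by a fixed odd constant (a Weyl sequence).
        self.state = self.state.wrapping_add(0x60be_e2be_e120_fc15);
        // Mix: full 64x64 -> 128-bit product (cannot overflow u128),
        // then fold the high half into the low half.
        let tmp = (self.state as u128) * 0xa3b1_9535_4a39_b70d_u128;
        let m1 = ((tmp >> 64) ^ tmp) as u64;
        // Second round of the same mixing with a different multiplier.
        let tmp = (m1 as u128) * 0x1b03_7387_12fa_d5c9_u128;
        ((tmp >> 64) ^ tmp) as u64
    }
}
```

Seeding is just choosing the initial counter, so reproducing a sequence only requires recording a single `u64`.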
Can you say what your long-term maintenance plan is?
I'm not sure I grok this sentence. Could you elaborate?
About … About my plans, my hope is to write it in a way that requires little maintenance, as the whole point is to keep the code very thin. You can see my current work here: https://github.com/elichai/random-rs
That's …
Thanks! So I just want to be crystal clear: my standards for bringing in another crate for this are going to be very high, in particular because it will likely be a public dependency of … See here for some words I've written about how I evaluate dependencies.
Right, sorry, I confused the names. As I said, I'm planning to release today. I'd appreciate it if you could look at it, play with it a bit, and tell me your thoughts. But as for actually depending on it, as you said in your post, waiting a bit is good advice: we're all human, and things tend to come up that I might need to fix. Hopefully that will be a very short period. I'll reiterate: my plan is to have a stable API that doesn't require any changes except adding support for new primitives in the future.
Sorry if this comes across as somewhat bitter; I'll try not to be. So many people have expectations of what a random number crate should be; the …
For what it's worth, I have re-opened rust-random/rand#850. Opinions differ about what it's worth supporting, but perhaps I see some value in this now. I won't tell people to use rand or not to use rand, but if people wish to pitch in with their views on the project, we will try to listen. Whether it is a good fit for …
See the rand book for a summary of the RNGs we supply: there are both small, fast PRNGs and crypto-RNGs. Some are very easy to re-implement, e.g. …
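To illustrate how small such a re-implementation can be, here is a sketch of PCG32 (the XSH RR variant), one of the small PRNG families the rand book covers. The struct and method names are hypothetical and are not `rand_pcg`'s API; the seeding mirrors the reference C implementation's `pcg32_srandom_r`, and `below` follows the reference bounded-output approach.

```rust
/// Minimal PCG32 (XSH RR) sketch: 64-bit LCG state, 32-bit permuted output.
pub struct Pcg32 {
    state: u64,
    inc: u64, // stream selector; always odd
}

impl Pcg32 {
    const MULTIPLIER: u64 = 6364136223846793005;

    /// Seeding follows the reference `pcg32_srandom_r` procedure.
    pub fn new(init_state: u64, init_seq: u64) -> Self {
        let mut rng = Pcg32 { state: 0, inc: (init_seq << 1) | 1 };
        rng.next_u32();
        rng.state = rng.state.wrapping_add(init_state);
        rng.next_u32();
        rng
    }

    pub fn next_u32(&mut self) -> u32 {
        let old = self.state;
        // Advance the underlying LCG.
        self.state = old.wrapping_mul(Self::MULTIPLIER).wrapping_add(self.inc);
        // Output permutation: xorshift the high bits down, then rotate by the top 5 bits.
        let xorshifted = (((old >> 18) ^ old) >> 27) as u32;
        let rot = (old >> 59) as u32;
        xorshifted.rotate_right(rot)
    }

    /// Uniform value in `0..bound` via rejection sampling (no modulo bias).
    pub fn below(&mut self, bound: u32) -> u32 {
        assert!(bound > 0, "bound must be non-zero");
        // threshold = 2^32 mod bound; values below it are rejected.
        let threshold = bound.wrapping_neg() % bound;
        loop {
            let r = self.next_u32();
            if r >= threshold {
                return r % bound;
            }
        }
    }
}
```

With the reference seed `(42, 54)`, the first output is `0xa15c02b7`. Rejecting the first `2^32 mod bound` values is what keeps `below` free of the bias a bare `%` would introduce.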
This is a very good point, and it doesn't help that to use Rand's … So again, I don't know what your best option is here.
@dhardy Thanks! I appreciate your response. The minimal version check is probably my most pressing concern. I have looked into using only rand_core in quickcheck before, but couldn't see how to do it (although I don't remember why).
Yeah, it definitely isn't. These are hard trade-offs to balance. I think the most common complaints I hear about … Churn is hard to fix, because it requires settling on an API that is fixed for a potentially long period of time. I don't know what …
Right. This is a tough one to balance too. I personally think the ecosystem has swung too far in the direction of micro-crates, but that's just my opinion. And even if everybody agreed with that opinion, the path to fixing it is not clear. But as an example, there probably exists a design in which … Of course, I am only representing the benefits. There are of course benefits to splitting things across crates, and in particular, there are downsides to using Cargo features. So most of what I'm saying here is an opinion based on my own sensibilities.
I definitely have a very strong desire to depend on the random crate that everyone else uses. That is a huge benefit that can't really be replicated by depending on something that is less commonly used. Therefore, I'd like to impress upon you that I did not open this issue lightly, and I only did so after a large amount of frustration on my end began to bubble up and boil over.
@BurntSushi thanks for your response!
There isn't actually a lot more planned. There are some open issues regarding API tweaks, but it may be better to minimise churn in these cases. The big issue is really stabilising the …
Compile time isn't even a big motivation; as I understand it, it's more about having access to the APIs while having less dependency code to review. The current design may have been influenced too much by crypto-nerds. I half wonder if it would be for the best to re-assimilate …
If your only public API dependencies are on the …
Here's another point on the topic from a quant :) What many people don't realize is that stacking multiple uniform RNGs for a bunch of independent variables does not yield good coverage of a high-dimensional space. This is a well-known fact, e.g. in mathematical finance when performing Monte Carlo simulations; to achieve good coverage in a sparse space, you would typically use a low-discrepancy sequence. Even in 2-D space, discrepancy is already visible; as dimensionality increases, things get progressively worse. For instance, you have a tuple … Just a (quasi-)random thought: given the purpose of this crate (to cover as much of the search space as possible, leaving no gaps), would it make sense to consider using QRNGs such as the Sobol sequence? As for other distributions like normal, AFAIK it is possible to make a (quasi-)normal QRNG out of a uniform QRNG using the Box-Muller transform. In reality, would people use anything more sophisticated than uniform/normal in this crate? Here's how it works; note it's 2-D, where things are not that bad. /* removed the image so as not to clutter this thread */
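To make the low-discrepancy idea concrete: the comment suggests Sobol sequences, but the Halton sequence is simpler to write out, so the sketch below uses it as a stand-in, together with the Box-Muller transform for turning uniform points into normal ones. All function names are hypothetical, and nothing here reflects how quickcheck actually generates values.

```rust
use std::f64::consts::PI;

/// Radical inverse of `index` in `base`: mirror its base-`base` digits
/// around the radix point (the van der Corput sequence).
pub fn radical_inverse(mut index: u64, base: u64) -> f64 {
    let mut result = 0.0;
    let mut fraction = 1.0 / base as f64;
    while index > 0 {
        result += (index % base) as f64 * fraction;
        index /= base;
        fraction /= base as f64;
    }
    result
}

/// 2-D Halton point: radical inverses in the coprime bases 2 and 3.
pub fn halton_2d(index: u64) -> (f64, f64) {
    (radical_inverse(index, 2), radical_inverse(index, 3))
}

/// Box-Muller: map two uniforms with u1 in (0, 1) to two standard normals.
pub fn box_muller(u1: f64, u2: f64) -> (f64, f64) {
    let r = (-2.0 * u1.ln()).sqrt();
    let theta = 2.0 * PI * u2;
    (r * theta.cos(), r * theta.sin())
}
```

Indexing starts at 1 because the point for index 0 is `(0, 0)`, and `ln(0)` would make Box-Muller return infinities. The sequence carries no state beyond the index itself, so it is fully deterministic.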
@aldanor Thanks for the insight, but I don't think that's relevant to this specific issue? Maybe you could open a new issue? This issue isn't about switching RNGs, but rather about considering which crates to use. Also, if you're looking for changes to be made, it would be helpful to leverage your expertise and explain in simpler terms the changes that would result from switching to a different RNG. Also, note that quickcheck does not strictly use a uniform distribution. It specifically also tries to pick out problem values for specific types.
My point was that QRNGs don't require an RNG at all :) (hence a comment in this thread). That is, they are essentially deterministic and behave better in higher-dimensional spaces (at least the floating-point ones) when you have tons of parameters. If this topic is of interest to anyone, I can open a separate issue and list a few thoughts there.
@aldanor Yes, a separate issue please. The details of how quickcheck generates values are way off topic for this thread, I think.
As probably one of the main advocates of the current micro-crate approach employed by …
For me it's a very clean and logical separation, which allows incremental stabilization of some … I would like to argue that most of the observed churn is not caused directly by the micro-crate design. I think the "explosion" of …
@aldanor has a point: the optimal distribution of values for … Thanks @newpavlov for summarising the status of Rand crates. Personally, I am very happy that …
@newpavlov Thanks for the thoughts. This is why this problem is hard: reasonable people can disagree. The design you laid out is perfectly defensible. But there are other designs, hinted at by @dhardy. For example, I would definitely agree with you that …
Right. The size of a dependency tree is typically just a signal. But I'd like to reiterate my point above about cohesion. I don't think we can really evaluate crate hierarchies in a technical vacuum. We also need to consider the actual interaction folks have with crates. This includes everything from trying to understand the aggregate APIs provided by those crates to managing expectations when they see a much larger number of crates added to their tree than they would otherwise expect. I'm in the process of writing a blog post about dependency selection that will hopefully explain my thoughts more clearly, assuming I get it done. It isn't just about …
Well, in my opinion it's just sweeping complexity under the rug. Increasing the number of features also makes it harder to comprehensively test a crate, since the number of possible feature combinations grows exponentially (n independent features yield 2^n combinations). And instead of a clean dependency graph, you get a potentially messy grey box. This is why I believe that instead of introducing a ton of features to … One argument with which I agree is that the micro-crate design (BTW, I think "micro" is a bit too strong in our case) makes the life of Linux package maintainers more difficult, but I think in this case a better approach would be to develop a solution for packaging a Rust application into a single package with all its upstream dependencies listed in its … I think it's somewhat funny how the number of crates became the main complexity metric in such discussions, instead of, for example, total LoC or the number of groups that maintain your dependencies. I guess the main reason is that it's the only metric you always observe when compiling your project, so people tend to dramatically overestimate its importance (although there are certain issues with Cargo that make the situation a bit worse than it should be, e.g. downloading unused optional dependencies).
Yes, those are some of the downsides. I mentioned that Cargo features had downsides, so I wasn't trying to sweep them under the rug.
It's not just distros. It's also folks that need to review dependencies, i.e., something like …
Sounds like a non-starter to me? Some distros, like Arch Linux, do this. But it's not going to fly for Debian, as far as I understand things.
Yes, I mostly agree. But it doesn't seem funny to me, I guess. It makes perfect sense. As the maintainer of Rust applications, I try to keep my dependency trees under control. Whenever I run … A layer of abstraction over crates (like "maintenance groups") is perhaps a good idea. Certainly, in some cases, a maintenance group more closely approximates the maintenance burden assumed by relying on dependencies. But I'm pretty sure that will require significant tooling to pull off correctly, never mind the social work required to do it. I'm not much of a visionary, and I don't have a lot of time to burn on this stuff, so I'm more or less looking for ways of making things better today, using the tools we have. I can't spend too much of my time on what the "ideal" scenario is, divorced from its feasibility. I am on the receiving end of this too. I very often hear from folks that they don't want to use …
Yes, it would be very nice to fix this.
See BurntSushi/quickcheck#241 : tempfile depends on an antiquated rand Closes: #717 Approved by: mimoo
@BurntSushi do you think this thread has served its purpose by now? I don't care much for the Libra project, and minimising dependencies is potentially a laudable goal in its own right, but the fact that others are claiming rand is "antiquated" and using this thread as evidence is rather odd. My recommendations: …
Yes, I don't understand the commentary from the Libra people. Although, they are just doing what I've already done in a few places: ripping out … I'm not sure I've quite decided what to do, but I haven't thought about this too much lately. When I circle back around to this, I'll update this ticket with a decision and close it out.
A couple of thoughts here (admittedly, I've spent only about 5 minutes catching up on the thread and design): …
So, it seems to me that we can provide the following simplified API:
```rust
pub trait Arbitrary: Clone + Send + 'static {
    fn arbitrary(seed: &mut Seed) -> Self;
    fn shrink(&self) -> Box<dyn Iterator<Item = Self>> {
        empty_shrinker()
    }
}

pub struct Seed {
    rng: fastrand::Rng,
    size: usize,
}

impl Seed {
    fn rng(&mut self) -> &mut fastrand::Rng { &mut self.rng }
    fn gen<T: Arbitrary>(&mut self) -> T { T::arbitrary(self) }
    fn size(&self) -> usize { self.size }
}
```
@matklad That looks plausible. I think the main issue is that it is probably occasionally useful to have access to the RNG within an … But otherwise, yes, I like the idea.
Yeah, that's why my strawman has …
Yeah, the naming is half-baked. The idea behind … Additional minor observation: it's nice that …
Hm, re-reading this, I feel that there might be some misunderstanding.
So looking at the API, I think all we'd probably want is:
```rust
pub trait Arbitrary: Clone + Send + 'static {
    fn arbitrary(gen: &mut Gen) -> Self;
    fn shrink(&self) -> Box<dyn Iterator<Item = Self>> {
        empty_shrinker()
    }
}

pub struct Gen { // was 'Seed' in your original comment
    rng: fastrand::Rng,
    size: usize,
}

impl Gen {
    pub fn gen<T: Arbitrary>(&mut self) -> T { T::arbitrary(self) }
    pub fn size(&self) -> usize { self.size }
    // `&mut self` also works with RNG types whose methods need mutable access.
    pub fn shuffle<T>(&mut self, slice: &mut [T]) { self.rng.shuffle(slice); }
}
```
And I think that should do it? Many of the … I chose …
I think we're on the same page? What I meant was, …
That's clever, and that's the bit I didn't understand! It makes the API surface smaller and hides the public dependency! And yeah, I agree that …
Aye. And if we need to bring in more methods from …
Well, …
And it looks like the …
This removes the use of the rand_core crate as a public dependency. It is now an implementation detail. We achieve this primarily by turning the `Gen` trait into a concrete type and fixing the fallout. This does make it impossible for callers to use their own `Gen` implementations, but it's unclear how often this was being used (if at all). This does also limit the number of RNG utility routines that callers have easy access to. However, it should be possible to use rand's `SeedableRng::from_{rng,seed}` routines to get access to more general RNGs. Closes #241
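A minimal sketch of the design the commit message describes, assuming a concrete `Gen` whose RNG lives in a private field. A std-only SplitMix64 stands in for whatever RNG the real crate uses internally; `Gen::new`, `choose`, and the `Arbitrary` impls below are hypothetical illustrations and omit shrinking.

```rust
// Private generator: an implementation detail that never appears in a public signature.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Concrete generator type: callers never see which RNG backs it.
pub struct Gen {
    rng: SplitMix64,
    size: usize,
}

impl Gen {
    pub fn new(size: usize, seed: u64) -> Gen {
        Gen { rng: SplitMix64 { state: seed }, size }
    }

    pub fn size(&self) -> usize {
        self.size
    }

    /// Pick a random element, or `None` for an empty slice.
    /// (Plain modulo reduction: slightly biased, acceptable for a sketch.)
    pub fn choose<'a, T>(&mut self, slice: &'a [T]) -> Option<&'a T> {
        if slice.is_empty() {
            return None;
        }
        let i = (self.rng.next_u64() % slice.len() as u64) as usize;
        slice.get(i)
    }
}

pub trait Arbitrary: Sized {
    fn arbitrary(g: &mut Gen) -> Self;
}

impl Arbitrary for u64 {
    fn arbitrary(g: &mut Gen) -> u64 {
        g.rng.next_u64()
    }
}

impl Arbitrary for bool {
    fn arbitrary(g: &mut Gen) -> bool {
        (g.rng.next_u64() & 1) == 1
    }
}

// Composite types are built from other `Arbitrary` impls; `size` bounds the length.
impl<T: Arbitrary> Arbitrary for Vec<T> {
    fn arbitrary(g: &mut Gen) -> Vec<T> {
        let len = (u64::arbitrary(g) % (g.size() as u64 + 1)) as usize;
        (0..len).map(|_| T::arbitrary(g)).collect()
    }
}
```

Because the RNG type appears only in a private field, swapping it out (or upgrading the crate that provides it) changes no public signature, which is what removes the public dependency. The trade-off noted in the commit applies here too: callers can no longer plug in their own generator.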
Before quickcheck v1.0, rand and quickcheck had to be updated in lockstep, given that the latter makes use of traits from the former. This commit decouples the two, only depending on quickcheck directly for randomness. This will ease the transition to quickcheck 1.0; see BurntSushi/quickcheck#241.
I am no longer happy about depending on the `rand` crates. There is too much churn, too many crates, and IMO, worst of all, there is no desire to add a minimal version check to their CI. This means anything that depends on `quickcheck` in turn cannot reliably have its own minimal version check.
Because I am tired of depending on `rand`, I have started removing it completely where possible. For example, in `walkdir`, I've removed quickcheck as a dependency. In ripgrep, I've removed `tempfile` as a dependency, because it in turn was the only thing bringing `rand` into ripgrep's dependency tree.
I don't see any other path forward here. I can either continue to grin and bear `rand`, drop everything that depends on randomness, or figure out how to generate randomness without `rand`. Specifically, I'd very much like to add a minimal version check back to the `regex` crate, which catches bugs that happen in practice. (See here and here.) My sense is that there is some design space in the ecosystem for a simple source of randomness that doesn't need to be cryptographically secure, and an API that does not experience significant churn. Certainly, quickcheck does not need a cryptographic random number generator.
With that said, there is some infrastructure in the `rand` API that is incredibly useful. For example, quickcheck makes heavy use of the `Rng::gen` method for generating values based on type.
So it seems like if we have something like the `Rng` trait with a non-cryptographic RNG, then we'd probably be good to go.
Are there other avenues here? What have I missed? My experience in building infrastructure for randomness is pretty limited, so am I underestimating the difficulty involved here?
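The request above can be sketched with the standard library alone: a core trait with one required method plus a provided, type-directed `random::<T>()` method in the spirit of `Rng::gen`. Everything here (`SimpleRng`, `FromRng`, `Xorshift64`) is a hypothetical illustration rather than an existing crate, and Marsaglia's xorshift64 is used only because it fits in a few lines.

```rust
/// Minimal core trait: the one method an RNG must supply.
pub trait SimpleRng {
    fn next_u64(&mut self) -> u64;

    /// Generate a value chosen by its type, similar in spirit to `rand::Rng::gen`.
    fn random<T: FromRng>(&mut self) -> T
    where
        Self: Sized,
    {
        T::from_rng(self)
    }
}

/// Types that know how to build themselves from raw random bits.
pub trait FromRng: Sized {
    fn from_rng<R: SimpleRng>(rng: &mut R) -> Self;
}

impl FromRng for u64 {
    fn from_rng<R: SimpleRng>(rng: &mut R) -> Self {
        rng.next_u64()
    }
}

impl FromRng for u32 {
    fn from_rng<R: SimpleRng>(rng: &mut R) -> Self {
        (rng.next_u64() >> 32) as u32 // keep the high bits
    }
}

impl FromRng for bool {
    fn from_rng<R: SimpleRng>(rng: &mut R) -> Self {
        (rng.next_u64() >> 63) == 1
    }
}

impl FromRng for f64 {
    /// Uniform in [0, 1): 53 random bits scaled by 2^-53.
    fn from_rng<R: SimpleRng>(rng: &mut R) -> Self {
        (rng.next_u64() >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
    }
}

impl<A: FromRng, B: FromRng> FromRng for (A, B) {
    fn from_rng<R: SimpleRng>(rng: &mut R) -> Self {
        (A::from_rng(rng), B::from_rng(rng))
    }
}

/// Marsaglia's xorshift64 as a tiny non-cryptographic backend.
pub struct Xorshift64 {
    state: u64,
}

impl Xorshift64 {
    pub fn new(seed: u64) -> Self {
        // An all-zero state would stay zero forever, so replace it.
        Xorshift64 { state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed } }
    }
}

impl SimpleRng for Xorshift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }
}
```

Composite types such as tuples get impls built from their components, which is the part of `rand`'s infrastructure that generating values by type relies on; the backing generator can be swapped without touching any `FromRng` impl.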
Another side to this question is whether any users of quickcheck are leveraging parts of the `rand` ecosystem that would be difficult or impossible to keep using if we broke ties with `rand`.