-
-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PDS: Adding new leaf to MST is O(n²) #22
Comments
So just to clarify, adding n nodes is costing O(n²) overall, meaning each individual add is O(n)? Versus the ideal expectation of O(n log n) overall, and O(log n) for each add? |
I think this is your issue right here: rsky/rsky-pds/src/repo/mst/mod.rs Line 584 in f6b1b26
I'm not a rustacean so forgive me if I'm misunderstanding, but if you're copying the whole MST storage then that's an O(n) operation (where n is the number of nodes currently in the tree) Edit: ah yeah I'm probably mistaken, I see |
Yes, I think that is exactly right. I appreciate you taking a look at this! |
Sorry for the late response here; I've glanced at this a few times, and nothing really sticks out to me. I don't have time this exact minute to try and dig in even further, but rather than look it over and go "nope, not immediately seeing it" and not leaving a comment like the last two or three times, figured I'd say something at least :) |
I just came across this https://github.com/domodwyer/merkle-search-tree - which is a rust MST implementation with very impressive perf numbers. It doesn't look like it was written with atproto in mind though, so I'm not sure it'd be easy to drop in, but it might be useful for reference |
@rudyfraser are you still working on this? Are you happy with the implementation of the MST - at the time of asking this, c948af2? |
I'm happy to take a look if you'd like. any guidance on what you'd like to see would be appreciated |
@afbase still need help with this. Ultimately I would expect for unit tests in rsky-pds::repo::mst::tests to perform similarly to the canonical TS implementation. Currently they take significantly longer to run. High level outcome would be to mirror the unit tests in https://github.com/bluesky-social/atproto/blob/main/packages/repo/tests/mst.test.ts (especially ones like "adds records") and pass the expected outcomes with similar performance. |
Testing, Benchmarking, (and Fuzzing)I'll prioritize this first as I can work with the implementation as-is. Canonical Implementationfrom the README.md:
rsky's MST storage is a SqlRepoReader whereas the typescript's MST storage is a ReadableBlockStore. Is there a desire to implement a ReadableBlockStore? Cloning, Reference Counting, or NeitherFor when we need to insert a Cid in a higher layer of the MST, and reset the root, my thoughts are that we might want to have some kind of The other merkle search tree implementationHowever, when referencing the domodwyer/merkle-search-tree, I don't see any reference counting, or locking on the Page (i.e. layer). There is one caveat: a Node (NodeEntry) can dereference a Page (layer) that is "less than"/"lower". The implementation upsert (add) starts from the root and then traverses down to the corresponding Page (layer) whereas the typescript and rsky implementation - i believe (please correct me if i am wrong) - it starts from the bottom bottom and traverses upward. I'm hesitant to look into the other MST too much further as that MST and the ATProto implementation might be too far apart in implementation to draw ideas from meaningfully. |
I made a branch on my fork that does some benchmarking of the add at sizes 100, 500, and 1000. It uses Criterion and here is a screenshot of the report summary on second benchmark's 1000 keys sample size: you can take a look at the others by opening up those also see my notes on the metric i made up. I think it is useful to give some idea of the sample's "tree depth" which depending on insert order, can make that I'm not super familiar with Typescript and so the canonical implementations add tests to me are not exactly clear to me, but I could dig into them more. Granted, this benchmark is very rustic (i don't know what the equivalent is for "pythonic" for rust) and doesn't conform to the canonical typscript tests nor am I sure if we want to mirror that as that sounds like creating an entire test library for canonical'ness. Some Questions
|
millipds by @DavidBuchanan314 might be far enough along now that there’s prior art to draw from there as well? |
I've been playing around with tweaking the From the README.md:
I'd like tweak the MST struct and a couple of other things to make things more reference count friendly that could work in place of cloning. This might drift the implementation a bit from the canonical Typescript implementation. but my aim would be to have the unit tests work. |
Documentation of the benchmark process would be fine but personally think it's lower priority at the moment. Most of the unit tests already exist here https://github.com/blacksky-algorithms/rsky/blob/main/rsky-pds/src/repo/mst/mod.rs#L1274 For this case moving away from the canonical implementation would be fine as long as changes are directly related to the mst, existing tests pass, and there are improvements to performance. I have more availability now and will try to revisit this as well |
See graph for sample size of 254 leafs being added.
Discovered this issue when testing adding records in bulk (f6b1b26). Should be able to add 1000 records in ms. Canonical code for the same test runs in 0.717 s, estimated 2 s
The text was updated successfully, but these errors were encountered: