APIs for creating incrementally modified nodes #320
I did some more thinking and prototyping, and realized that @warpfork's patch PR provides a much better base to start from than what I'd been thinking. In hindsight, perhaps that should have been the logical sequence of steps anyway: first a patch spec, then an update API to optimize patch application. The IPLD Here are some tweaks off of the patch PR and here's a very crude implementation in The latter PR currently uses the At some point, I have a strong hunch that there's a path to a one-time copy-on-write amend (for any number of amendments) possible here between |
Just to describe the space and range of options a bit: Giving responsibilities to the Encode step towards the end there is possible. However, it would have high implementation cost: it would have to be reimplemented for every codec, and that'd be really gnarly. High volumes of code to produce, maintain, and benchmark. (Being codec agnostic has a high toll for the project as a whole here -- it can make a lot of work in areas like this.) On the upside, shifting responsibilities to the Encode step is (almost certainly?) the only way to reach ultimate zerocopy all the way back to the original byte slices (and thus reach absolute maximum fast). So: I think we should probably design for a pitstop in the middle that gives us a codec agnostic solution, even if slightly less ultimate zerocopy. Otherwise the implementation work volume is just gonna be incredible, and I'd worry it might land out of practical reach. If we can figure out yet another feature-detect API that lets us make encoders and nodes that can TRY to do the superfastpath, though, while having a natural fallback to the codec agnostic logic, that'd be a 🚀🚀🚀 significant victory. |
I otherwise agree with the strong hunch about the possibilities :) |
I made some comments in the PR exploring using Patch for this -- summary: something in this direction could be very cool, but I wonder if we should make a separate set of types for it just not to get things too tightly coupled. |
One other thought on where this API should attach: I think it should probably be on the Not the Not the If we're doing a large walk through some data, and trying to build a new tree with updates efficiently, in the middle of that process, we're almost entirely handling (The implementation of some functions in the It's probably possible to do this on the (I think I've myself speculated that it might feel clean to put some of these amend feature detections on the NodePrototype, previously. I think I'm updating away from that idea, based on the above reasoning. Doublechecks, of course, welcome.) |
Excellent points about I'm going to study |
Prototyping a different (and better, I feel) approach in my draft PR. Deprecated the notes above and added fresh ones directly in the PR. I've kept all changes isolated in the I've also only tested against There are 3 main aspects of this approach:
This feels closer to the copy-on-write ideal we've been aiming for - Currently, only "add" is implemented, and with some assumptions - the Given that |
Another advantage of accumulating updates this way is that expensive operations like recalculating hashes can be applied directly on the end result of a series of updates instead of on each update. |
Relevant PR: smrz2001#1 |
@smrz2001 : just checking in to see if this is something you're planning to drive forward. |
Hey @BigLep! Yes, I've been waiting on input on my PR for a little bit. After discussion, I re-implemented my code in a different way that's cleaner but, at the same time, more intrusive so I would understand hesitation to put that in without a lot more review. The code from the previous version would have been simpler and standalone but not super generalized. I can do either but was hoping to have someone chime in on the PR so I can pick a direction and finalize it. At this point, my preference would be to switch back to the previous approach - the only open question there was around the cleanest way to package the code. |
We need APIs for creating new nodes which include content from a previous node, along with some additions or subtractions.
The first step is figuring out what pattern we use when placing these APIs. Critical check: the API should be possible to implement while using COW (copy-on-write) semantics internally.
There might be more than one form of API matching this general description (depending on whether it's mostly additions; mostly subtractions; insertions-with-opinions-on-ordering; etc).
I suspect that, as has been a recurring theme lately, we should both introduce the feature itself via feature detection, and then probably accompany it with helper functions which expose the desired API to end users, try to detect and use the feature, and quietly fall back otherwise. (This might turn out a little tricky, though, considering actual objects must be returned for assembly to continue working on, and we don't want to have to wrap them and end up with more allocations.)
It should be possible to apply an "IPLD Patch" operation (e.g., something very comparable to RFC 6902) onto whatever APIs we come up with. (The library API can focus on single elements, rather than any of the pathing parts of the Patch declarations -- but any of the semantics about e.g. order of insertion should be alignable.)