-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
relaxed reification #27
base: main
Are you sure you want to change the base?
Conversation
If a node is not an immediate pbnode, but can be built as such, then reifiy appropriately. This allows a dag-pb node to have been serialized to cbor or json, and still be interpreted as unixfs data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whacky, but neat
what's the use-case for this, and the car convert
method? I'm intrigued!
I needed both of these for ipld/ipld#184 - if i want to represent blocks of data in dag-json format for specs, but still demonstrate a partial ADL'd traversal over them i needed |
// see if the node has the right structure anyway | ||
pbb := dagpb.Type.PBNode.NewBuilder() | ||
if err := pbb.AssignNode(maybePBNodeRoot); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willscott this will this change the behavior of Reify such that it's no longer spec compliant? i.e. UnixFSv1 is only defined over DagPB nodes, will this allow me to create DagCBOR objects that will render as long as they are convertable into DagPB (i.e. they contain the fields in https://ipld.io/specs/codecs/dag-pb/spec/#serial-format)?
If so, you may want to define some LooseReify
or something that can be used for testing and some conversion work but is called out as non spec-compliant.
I may have misinterpreted how this works though and if so please correct me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is used in a selector where the requester explicitly signals interpretAs: unixfs
. If the data at the point they signal fits the data structure of unixfs (e.g. a data field in a struct that can itself be unmarshaled with dagpb) that seems like a pretty reasonable thing to do, rather than block on the concrete implementation.
Note that this isn't DagPB
the serialization, but rather DagPB
the concrete schema types. I could have valid dagpb data, but if it's been re-built into a basicnode
wrapper, e.g. due to a WalkTransforming
or other ipld internal process, you would also have this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so this effectively is a UnixFS spec change. It means that any applications that rely on UnixFS behavior, and this package, (e.g. ipfs cat
, ipfs.io/ipfs/bafyfoobar
, ...) will have their behavior changed to be more relaxed.
I don't think modifying a spec implicitly for the purposes of testing is a particularly good idea. IIUC we ran into this problem last year where originally /ipfs/bafydagcbor/myunixfsfield
was not supposed to resolve but did due to lax implementation which as a result of users building structures like this has effectively forced us to be stuck with this and cause problems for our UnixFS upgrade path.
Would it be problematic to just define a new ADL "UnixFSLoose" and we can use that without requiring a UnixFS spec change (which you could propose, but might take some time to accumulate and integrate feedback).
It seems like maintaining the UnixFS spec within the go-ipld-prime context and ADLs might start to get more complicated since UnixFS is only defined over DagPB.
Should we be considering a UnixFS spec change that redefines them in terms of either ADL "interfaces" or data model structures going forward here? Mostly I'd like us to not make spec changes in code without ecosystem involvement of the various people that depend on the behavior of UnixFS (i.e. a huge portion of the IPFS ecosystem).
I was thinking this might be a bit complicated and so a new ADL would allow us to side-step this.
cc @mvdan @rvagg @ribasushi in case you have any thoughts or advice here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the spec you link does not specify that the outer structure that the unixfs protobuf is encoded within (e.g a struct with a 'data' and 'links' field) needs to be serialized using the dag pb codec. If i took that same data and happened to transfer it over the wire in a cbor codec, would it stop being unixfs data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@willscott it's a little obscure but the spec does mandate pb-in-pb:
https://github.com/ipfs/specs/blame/master/UNIXFS.md#L81
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even that can be interpreted with some flexibility: it specifies that the outer container of the data follows the merkeldag protocol buffer structure of 'Data' and 'Links', but does not express that it must be encoded using the 'dagpb' codec vs other codecs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This data is serialized and placed inside the 'Data' field of the outer merkledag protobuf, which also contains the actual links to the child nodes of this object.
"Placed inside of the outer merkledag protobuf" seems pretty clear that it's a protobuf, not to mention all of the existing tooling that insists it's a protobuf.
Basically, I keep seeing perceived ambiguity in our specs cause implementation issues (dag-cbor in /ipfs, Bitswap protobuf ambiguity, /api/v0 on gateways, etc.). I am strongly opposed to changing specs without changing the specs, especially when a written spec already exists.
If we want to change the specs then let's open a spec PR and figure it out (changing specs is fine, but doing so in some Go repo is IMO not), otherwise I've been told that multiple ADLs are supposed to be allowed to exist so why not just make a new one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
➕ on above - no spec changes without spec changes please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can continue in ipfs/specs#271
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like @aschmahmann has requested changes.
My first response is somewhat similar -- while this may not be be breaking the DagPB spec cause it's not a serialization, I still feel like it's a change to UnixFSv1 itself to accept non-protobuf formats. https://github.com/ipfs/specs/blob/master/UNIXFS.md
I can't decide whether I care, though if as this comment says, it's all for the spec tests, well, it seems like a not great reason to introduce potentially unexpected behavior, and further evidence that the spec tests are non-ideal to work with.
Anyway, net net, I think I'm fine either way?
@aschmahmann is my response sufficient to address your concerns? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very enthusiastic about this, and it seems directionally correct to me.
There's no doubt in my mind that if we re-did unixfsv1, today, with everything we know now, it would be fully codec agnostic -- like this code is now becoming. The discussion about the history and what the unixfs specs say is true and also valid, but I'm definitely in favor of expanding those definitions. As this code proves, it is both possible and well defined, and also in fact easy to do.
It may or may not be worth adding explicit conditional branches in downstream tools that want to be strict about dag-pb codecs for some reason. (I don't have a very strong opinion about this. I don't immediately think of any reasons I would want to do that, as a downstream tool maintainer, but I certainly haven't thought about it long enough to rule anything out, either.) This seems to me like something worth discussing in this repo's changelog, but then leaving to downstream choice.
The utility of this for sheer testing and fixtures and communicatability when used with e.g. dag-json, as already discussed in earlier comments, is just impeccable. 🤩
@warpfork please see ipfs/specs#271 (comment) |
Given the unwillingness to change unixfs at a spec level, are there alternative approaches we can take here to support the kind of testing that this is attempting to enable? |
A couple I can think of at the moment are:
|
Can I poke a bit more at the notion that this is an implicit spec change, because I'm not really sure it is:
Further to that last point and re other paths to
I think the only place this matters is here: https://github.com/ipfs/go-ipfs/blob/67fdb6efcdd1d4c0c0ee6fef7190b92c13184dbd/core/node/core.go#L102 Which is where we could be strict about protobuf-on-protobuf, it could just become: unixFSFetcher := ipldFetcher.WithReifier(func(lc linking.LinkContext, n ipld.Node, ls *linking.LinkSystem) (ipld.Node, error) {
_, ok := n.(dagpb.PBNode)
if ok {
return unixfsnode.Reify(lc, n, ls)
}
return n, nil
}) Then we'd have the best of both worlds - this library would remain a lens and not be tied to codecs and we get to do all of the novel things that lenses should be able to do, but we'd also get protobuf-on-protobuf where it matters. This library remains our most mature ADL and it would be nice to push it forward a bit more because it's the best example we have and currently our best path to maturing the concept of ADLs .. even if UnixFS comes with considerable baggage. |
Entirely separate thought that might trigger some folks ... we could make a bunch of UnixFS layering mess go away if we just reinterpret dag-pb to include UnixFS decoding where it exists. So dag-pb is still |
This was the exact reason I pushed back on the change originally. There is no standalone UnixFS spec, it is directly tied to the underlying/outer PB. Aside from the LinkName being significant in UnixFs-Type-1, this part of the inner PB refer to a very strictly (albeit implicitly) defined portion of the outer dag-pb: https://github.com/ipfs/go-unixfs/blob/master/pb/unixfs.proto#L18 I am 💯 supportive rewriting the UnixFS spec to reflect what is in fact in the wild. |
In my experience with IPLD thus far, every time we say "popular system X is loose, but you do you" it inevitably comes up a little while later as "can we change the spec to reflect reality and say that X is allowed to be loose". For example: ipld/ipld#196. Given that go-unixfs is strict and is the largest producer and consumer of this data it seems like we're intentionally setting ourselves up for a spec change by both advocating for the use of this library over go-unixfs and making it looser.
As long as it's clear to people which they should use this seems fine. However, saying that go-ipfs follows the UnixFS spec but we recommend people use selectors and the UnixFS ADL which does not... seems bad.
I thought a bit about this and while it could be nice there are two issues I see:
For example: go-unixfs has https://github.com/ipfs/go-unixfs/blob/5cd23312773df8e0503fffe8238a163a58168f71/file/unixfile.go#L155 whereas this repo has Line 15 in 62f7ce2
Line 10 in 62f7ce2
|
Well, to get to the core of my main point - I'd really like this library to have a very limited scope of concerns, as an ADL, which, if possible would mean being agnostic to the codec layer. There's multiple points of difficulty with that, but the one that concerns us here is on the read side, where this library is essentially handling Nodes, pass it a Node, get a Node-interface thing back that does fancier things than the original Node did. Pushing codec concerns to an outer layer would be ideal to try and achieve that, some options:
|
The strict coupling poses some problems for IPLD implementations that: 1. do not retain codec information between the serialization and UnixFS (ADL) reification; and/or 2. do not have a mechanism to strictly mandate that UnixFS data be _only_ encoded in a particular codec. A mandate in the specification that strictly defines the layering on top of the codec makes it difficult to implement it as an ADL, which also presents difficulty for using tooling that builds on IPLD lens-style layering. Ref: ipfs/go-unixfsnode#27 Ref: #271 Ref: ipld/go-car#304
I've pushed an update to this which adds tests and also a After considering ipld/go-car#304 and some of the discussion in ipfs/specs#271 I'm less convinced this is even helpful. It does retain a status-quo option though and we can switch go-ipfs to use I think we either need to merge ipfs/specs#294 to slightly loosen the spec a bit (my preferred option at the moment), or we're going to need to add some awkward wiring into go-ipld-prime that can tell us what the codec was when decoding the data or build some other adjacent mechanism that can retain this (which will still likely require some go-ipld-prime changes to deal with an external conditional that intercepts the reification). |
If a node is not an immediate pbnode, but can be built as such, then reifiy appropriately.
This allows a dag-pb node to have been serialized to cbor or json, and still be interpreted as unixfs data.