Content addressable contexts #9
Comments
I think the question to be answered for this issue is what features are needed, and what affordances in the syntax are then required to make those features possible. All of the suggested patterns in the original issue work out of the box, as the context IRI is not required to be HTTP/S.
Sub-resource integrity is an interesting proposition, as would metadata for the context to declare it "frozen", which seems related to #20. |
From the previous bug, I mentioned magnet URIs and @gkellogg said:
Honestly I think magnet URIs do half of what we want in a general and widely used way and can be used with a variety of p2p networks. Maybe it's time to actually standardize them? |
I wonder whether this is for us to deal with. If the value of The only one so far that seems to be different is the separate
"@context" : {
    "@id" : "https://...",
    "@fetch-attribute" : {
        "integrity" : "sha384-Li9vy3...",
        "crossorigin": "anonymous",
        "....."
    }
}
My worry is that we would cast in concrete keywords that reflect the state of today and not be future-proof enough in an area that is evolving fast. (Caveat: I am not an expert in all these various fetch attributes...) |
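To make the integrity idea in the comment above concrete, here is a minimal sketch of how a processor might check fetched context bytes against an `integrity` value in the Subresource Integrity style. It assumes a single `algorithm-base64digest` value (real SRI attributes can list several). The function names `sri_digest` and `verify_integrity` are hypothetical, and `@fetch-attribute` itself is only the strawman keyword from the comment, not part of JSON-LD.

```python
import base64
import hashlib
import hmac

# Hash algorithms that Subresource Integrity defines.
SUPPORTED_ALGORITHMS = {"sha256", "sha384", "sha512"}


def sri_digest(body: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI-style string such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, body).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify_integrity(body: bytes, integrity: str) -> bool:
    """Check raw context bytes against one 'algorithm-base64' integrity value."""
    algorithm, separator, _ = integrity.partition("-")
    if not separator or algorithm not in SUPPORTED_ALGORITHMS:
        return False
    actual = sri_digest(body, algorithm)
    # Constant-time comparison of the two encoded strings.
    return hmac.compare_digest(actual.encode("utf-8"), integrity.encode("utf-8"))


if __name__ == "__main__":
    # Hypothetical context document, used only for illustration.
    context_bytes = b'{"@context": {"@vocab": "http://example.com/"}}'
    expected = sri_digest(context_bytes)
    print(verify_integrity(context_bytes, expected))
```

A document loader could run such a check after retrieval and refuse to use a context whose bytes do not match, whichever syntax ends up carrying the expected digest.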
so a) I don't think this is really something that needs to be standardized in the json-ld spec |
Fully agree. But we should be sure that existing standards can be used within JSON-LD. In this sense, it may be more than just best practice (see again the way integrity could be handled). |
This issue was discussed in a meeting.
Rob Sanderson: the issue from my POV is that it would be nice to have a feature that lets a context be “sealed” or “frozen” … this lets people identify an immutable context … this could prevent a huge number of people retrieving the context over the wire
Dan Brickley: cf https://www.w3.org/RDF/Group/Schema/openissues.html#c12 “Closed 19980707 - deferred until 1.1; M+S to supply a mandatory extensions mechanism” (“We would like to define a mechanism for ‘sealing’ an RDF Class, so that it becomes illegal to make certain RDF statements involving it. This is loosly analagous to the notion of ‘final’ classes in Java / OO programming. …”)
Rob Sanderson: as well as preventing people from overriding term expansion … since choosing an expansion is “last in wins”
Ivan Herman and Gregg Kellogg: Those are two different issues!
Rob Sanderson: #9 is the first part of what I was describing … immutability
Benjamin Young: Immutable header https://tools.ietf.org/html/rfc8246
Rob Sanderson: I believe there is no argument about this being a good functionality; the question is just, how do we do this in practice? … is this an HTTP thing?
Ivan Herman: We are overloading the term “frozen”.
Dan Brickley: Similar metaphors, but not the same.
Benjamin Young: There is an identifier-based approach, and for schema.org the semantics have changed … over time
Dan Brickley: well, it’s always meant schema.org, but we have resolved it in different ways, because of the history … we have just kept adding terms, and the context just keeps growing
Gregg Kellogg: You might be able to pare it down with @vocab
Dan Brickley: It’s rarely been the case that we don’t have to touch it for a long period of time … and there’s constantly things to fix
Benjamin Young: That’s the POV that this identifier will always get you the most up-to-date, “best” version of the resource … but for the way I work, which uses separate context documents for different versions, would behave completely differently … so one proposal is to bake the version into the identifier, and another is to seal the context
Dan Brickley: Not too worried about that part of the problem: It’s not inconceivable that we might maintain a couple of high-value vocabs, but much of schema.org isn’t used now … and for the sealed context question, another approach: perhaps allow to specify import order and thereby prevent inappropriate overrides
Gregg Kellogg: looking at the mechanism by which a processor can be encouraged to not constantly be fetching context, … HTTP has cache-control, and we suggest using it, but that doesn’t seem to be enough. … so can we add some of the HTTP semantics right inside the context? … e.g. “this is valid for this time, do not refetch within that time” … processing tools could even throw on an inappropriate retrieval … we’d also like to be able to “preload” contexts (prefill a cache).
Dan Brickley: We are reliving the XML Catalog discussions from a couple of decades back, https://en.wikipedia.org/wiki/XML_catalog
Ivan Herman: I am afraid of conneg and HTTP headers, which are great, BUT many users cannot control them … relying on that is attractive to a spec, but may not be a good solution … so gkellogg’s suggestion is uglier than relying on HTTP headers, but way more secure … the schema.org example shows us that many contexts cannot be sealed. … the “freezing” (no bad overrides) thing: I’m guilty of that myself … we do that on top of schema.org in publishing
Dan Brickley: We’re fine with that!
Ivan Herman: sealing a popular context could create problems for that
Dan Brickley: Sure, we can do conneg. But it’s frustrating as a vocab provider. For the rest of the site we are moving to pure static hosting. … we would like our work as vocab providers to end with creating files
Leonard Rosenthol: Other vocabulary providers have a different approach. In our defns and vocabs, we have a policy that says … once this is stable, any future version must be backwards-compatible … can’t remove fields, can’t change value types, etc. … in part because we didn’t want to deal with versioning … but we also maintain a registry, so at any time anyone can go and get a schema … so we get some of the benefits (e.g. we don’t have to do unneeded fetches) … we don’t have to worry about the versioning because of the policy
Dan Brickley: two things where I’ve been raising concerns around @context … when you’re a search engine, you may not have all the resources in hand at once … also, @context doesn’t work for IoT, where devices are resource-constrained … they don’t want to be fetching stuff across the network … not sure whether “frozen” or “sealed” is the right metaphor … but the assumption that fetching a @context is trivial isn’t necessarily a good one … 1.1 should not force many fetches, e.g. because of privacy concerns
Ivan Herman: understood, but how do we avoid the problem?
Dan Brickley: We shouldn’t allow the difficulty of the problem to prevent us from facing it.
Rob Sanderson: We’ll be talking to the IoT folks this afternoon.
Leonard Rosenthol: Why (other than parsing) do you need to fetch contexts?
others: Can’t interpret the data at all w/o the context
Leonard Rosenthol: Doesn’t mean that you have to fetch it, does it?
Dan Brickley: We ran into this with RDF/XML: we forced you to put all the semantics into the instance. Made the syntax much disliked.
Benjamin Young: https://json-ld.org/spec/latest/json-ld-api-best-practices/#cache-context
Dan Brickley: JSON-LD went the other way. It isn’t even the semantics (in terms of the rdfs/owl model), it is very basic graph structure issues. Does "foo": "bar" expand to “bar” or http://..../bar URI
Benjamin Young: There is to be a best-practice document, which can discuss these issues. … the docs may say you can fetch all the time, but the tools understand how bad an idea that is.
Benjamin Young: I manually solve this problem, and it’s sort of like what we would want … but we haven’t encouraged people to create workflows that are safe in that way … users don’t necessarily even know that they should know HTTP at the depth needed to control caching
Luc Audrain: in Publishing, we are using JSON-LD to build metadata about books, and push it to webpages. We have, today, standards like ONIX, which are in use all over the world. But we know we have qualifications on our metadata. E.g. we might send a link to a cover image. When the book goes on sale, we might change the image to include the new price. We assume people refetch to get the updated image. It’s a similar problem. It would be nice if we could provide this.
Rob Sanderson: That’s at a different level, the level of instances. They will always require refetching. That’s not quite within our work.
paul: Can someone define the term “sealed”? … who makes that decision?
Rob Sanderson: the publisher would decide that.
… that’s the feature that prevents overrides. … for the other feature (content-addressable contexts)… … to the extent that we want to include HTTP-type features to control/prevent fetching … What is the extent to which we should go, to make that easy for devs?
Dan Brickley: In RDFS, W3C Working Draft 14 August 1998 https://www.w3.org/TR/1998/WD-rdf-schema-19980814/ we wrote: “Since an RDF Schema URI unambiguously identifies a single version of a schema, RDF processors (and Web caches) should be able to safely store copies of RDF schema graphs for an indefinite period. The problems of RDF schema evolution share many characteristics with XML DTD version management and the general problem of Web resource versioning.”
Dan Brickley: “Is is expected that a general approach to these issues will presented in a future version of this document, in co-ordination with other W3C activities. Future versions of this document may also offer RDF specific guidelines: for example, describing how a schema could document its relationship to preceding versions.”
Ivan Herman: there is an analogy that might inspire us … it’s like the way that CSS handles fonts … they have an indirection. I define a font symbol, to which I can attach URIs or filenames … the processor picks the first that is available. … so e.g. I could perhaps make a list, with my file of schema.org first, then the network address. … that’s what CSS does: they have the same problem (in their case huge font files) … how could we shoehorn this into the syntax?
Leonard Rosenthol: You’d have to change the meaning of @context
Gregg Kellogg: The fact that there is a practice in CSS is good. … piggybacking on an existing pattern in CSS will make arch review easier
Dan Brickley: it may be worth talking to the CSS folks to see how that feature is working out in practice
Rob Sanderson: Seems like something we could put into the API spec.
… if cache management was part of the API … as opposed to reimplementing another document loader
Ivan Herman: -> https://www.w3.org/TR/css-fonts-3/#font-face-rule Font face rule in CSS
Leonard Rosenthol: The problem w/ putting it into the API is that you’ve got to think of the various devices on which it might run: what about embedding devices, e.g.? … the API must be flexible enough to let clients control the amount of caching that takes place.
Simon Steyskal: https://drafts.csswg.org/css-font-loading-3/ ?
Gregg Kellogg: The API spec does permit that; we only spec the relevant behavior
Ivan Herman: There are some nice examples from the CSS doc linked above
Dan Brickley: Is anyone here tracking the WebPackaging work? … seems to address the problem of leaking URLs … i.e. queries from Google … it’s also about what happens when you don’t retrieve from the resource’s identifier, but from somewhere else, which is the same problem … different timescales to cultural heritage, but the same underlying problem!
Ivan Herman: Yes, but IIUC not something to change in the spec
Benjamin Young: the thing that is “hiding” in the WebPackage is that you don’t get the content from the content owner. … there is no way to discover the webpackage from the live URL
Dan Brickley: There are many ways to get content. How you get content is a different question from JSON-LD; we don’t have to solve it. … we might look at this in terms of parser APIs. How do parsers report the timeliness of their operations? … we might want to say that JSON-LD doesn’t mandate the only parser APIs, to create some space in which to expand/explore … I want the schema.org contexts to be available into the future…
Harold Solbrig: The ShEx spec states that to be valid JSON ShEx, the context must have a specific URI … that muddles the identity of the operation with the identity of a resource involved therewith … what do we do with contexts that may never resolve?
Gregg Kellogg: For ShEx on the web, we established a media type. Then we gave rules for that media type.
Rob Sanderson: Seems like there’s a general feeling that there are many possible solutions to this issue, and baking something into the specs might be shortsighted.
Dan Brickley: the more I think about it, the more I like the idea of the poor parser just reporting what it did, weaknesses and all
Ivan Herman: We could almost verbatim copy what CSS did
Gregg Kellogg and Leonard Rosenthol: but that would break extant parsers
Ivan Herman: No, we could add more syntax for this purpose, not change @context
Dan Brickley: Who uses this?
Ivan Herman: Whoever publishes data.
Dan Brickley: That’s tens of millions of sites, thousands of which will be hacked….
Benjamin Young: it’s different from fonts: you have fallbacks in that case, and @context is more central to JSON-LD than fonts are to CSS
BREAK
Benjamin Young: it would be good to have a decision around the approach … cache loading into the instance (eg. font-face), http-level info (in the context doc)
Gregg Kellogg: there is some precedent on http changes
Rob Sanderson: or we can change things at the API level (since we have doc loading there) … so that library writers bear the cost
Gregg Kellogg: one thing we could do is change the doc loader about how it looks for the context - eg. prescribe the specifics - as well as specific support for “side lookups” … that would lead to greater consistency … doc loading is in the specs
Benjamin Young: API spec area about Document Loading https://w3c.github.io/json-ld-api/#remote-document-and-context-retrieval
Rob Sanderson: can we promote (make more obvious) the doc loader in the API doc? … perhaps make it a top level item?
Gregg Kellogg: this comes back to best practice. Algos described in API. … our practice has been to put a thin veneer over the APIs but didn’t do that for doc loader. But we could … there is a non-normative features section that we could also use.
(eg. “do it this way”) … we could describe that sources of contexts may require excessive loading (etc.) … what does WOT do?
Rob Sanderson: don’t forget privacy/security
Gregg Kellogg: there are things we expect them to do (from a syntax perspective), and they won’t read the API doc
Benjamin Young: the section “remote doc…” could easily be renamed. (and I like everything y’all were saying) … also want to make sure we highlight the security/privacy stuff (not just about “remote”) … also a bunch of work in best practices to explain this. (and nothing in syntax)
Adam Soroka: do we want to talk about other situations where we retrieve docs? or is it just context?
Gregg Kellogg: there is also a section on expansion which talks about remote IRIs and documents.
Adam Soroka: so maybe we should not just put this in terms of context
Gregg Kellogg: but you might not want the entire JSON-LD doc reloaded each time either, so cache-control would be useful there too. (does this belong in the body)
Benjamin Young: my data shouldn’t be that smart (about caching)
Rob Sanderson: the data carries ontology and not necessarily processing … to ivan’s point about fallbacks. Today there is a callback (which shouldn’t need to change) and the list of things to retrieve (which we would) … unless there is a way for the doc loader to get a list of URLs to work from (in order of precedence)
Ivan Herman: I produce my data and want to publish it, and I have a local context file and will point to it first and then a remote version.
Gregg Kellogg: context-path/reference, it could interact with the doc loader
Ivan Herman: but we need a syntax too
Adam Soroka: do the things go to an existing doc loader or do they find a doc loader?
… the doc loader is smarter
Gregg Kellogg: an array of strings is fairly simple to understand
Adam Soroka: but what if there are other policies that control the order the array is processed
Gregg Kellogg: we still have a year+ to figure out details
Benjamin Young: we need a proposal to work from
Leonard Rosenthol: azaroth : question for ivan
Benjamin Young: in what situations would the data provider want to provide the “context set:” rather than the processor?
Ivan Herman: if I set up a WoT in my home, I have to manage them. I might include a local copy of schema.org there to avoid them all having to reach outside the network.
Gregg Kellogg: but that’s an http proxy
Ivan Herman: but it might also fall into caching model
Gregg Kellogg: webpackaging and activity streams could also help to improve caching
Adam Soroka: even a refrigerator might have a filesystem where I know what the URLs on it are, so I can point to those as local cached copies
Rob Sanderson: 15m done. … there is agreement that we want to promote the doc loader in the API spec into something more visible and point to it from other docs (esp. about privacy/security) … we have discussed the idea of “context list” (ordered lists of contexts) where the loader can try to find things from these multiple items … but we need requirements and use case to help understand what to specify and where to put it
Adam Soroka: also be sure to include the horizontal bits too
Rob Sanderson: do we need a resolution?
Ivan Herman: just leave it open
Benjamin Young: can we take an action?
Gregg Kellogg: let’s use some issue(s) to track ideas
Action #2: Rob Sanderson to create issues for ContextList and DocumentLoader editorial changes |
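The “context list” and “preload” ideas from the transcript above could be sketched roughly as follows. This is only an illustration under stated assumptions: the names `preloaded_fetcher` and `load_context_list` are hypothetical, the returned dictionary only loosely mirrors the shape of the API's remote document result, and no network access is attempted. The loader walks an ordered list of locations and returns the first one that any configured fetcher can supply.

```python
import json
from typing import Callable, Dict, List, Optional

# A fetcher returns raw bytes for a location, or None if it cannot supply them.
Fetcher = Callable[[str], Optional[bytes]]


def preloaded_fetcher(cache: Dict[str, bytes]) -> Fetcher:
    """Serve contexts from an in-memory cache that was filled ahead of time."""
    return lambda location: cache.get(location)


def load_context_list(locations: List[str], fetchers: List[Fetcher]) -> dict:
    """Try each location in order of precedence; return the first that resolves."""
    for location in locations:
        for fetch in fetchers:
            body = fetch(location)
            if body is not None:
                return {
                    "contextUrl": None,
                    "documentUrl": location,
                    "document": json.loads(body),
                }
    raise LookupError(f"no context could be loaded from {locations!r}")


if __name__ == "__main__":
    # Hypothetical locations: a local copy that is absent, then a preloaded remote one.
    cache = {
        "https://vocab.example/context.jsonld":
            b'{"@context": {"@vocab": "https://vocab.example/"}}',
    }
    result = load_context_list(
        ["file:///var/contexts/vocab.jsonld", "https://vocab.example/context.jsonld"],
        [preloaded_fetcher(cache)],
    )
    print(result["documentUrl"])
```

Keeping the ordering policy in the loader rather than in the data matches the point raised in the meeting that a loader may apply its own policies to the list; a filesystem or HTTP fetcher could be added to `fetchers` without changing the calling code.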
This issue was discussed in a meeting.
Rob Sanderson: equally or more cross-working-group to the last one… VC et al have security concerns around contexts that must be loaded by a server before the server can understand the JSON-LD
Benjamin Young: you don’t have to fetch this every time you need it, you will cache contexts, etc. … which comes around to how contexts (as web resources) change with time … naming on the Web: easy? … each of the communities using the tech can resolve these things. we need the TAG to advise on the nuances of things … like content hashes, etc.
Rob Sanderson: you can solve it at 3 diff layers: HTTP, where you can use last-modified, etags, etc. … in the doc itself, which can declare versioning … or in the URI, e.g. http://example.com/v2
Dan Brickley: but do you need a context at all?
Gregg Kellogg: you can put it into the instance doc itself, but it still exists.
Benjamin Young: {"@context": {"@vocab": "http://example.com/"}, "name": "made up vocab"}
Gregg Kellogg: The notion is that JSON-LD provides the context within which to interpret your JSON … strings sometimes are dates sometimes IRIs, sometimes something else … you need to both distinguish between these things but also allow for idiomatic JSON
Benjamin Young: which creates _:doc-id <http://example.com/name> "made up vocab" (in triples, where “doc-id” is completely random)
Gregg Kellogg: the context explains some of the things that API docs might explain
Dan Brickley: I had thought that the context could be derived in some other way
Benjamin Young: you can.
Adam Soroka: (from an HTTP header)
Dan Brickley: is this wrong then?
Dan Brickley: { "@vocab": "https://schema.org/", "@type": "Volcano", "sameAs": "https://www.wikidata.org/wiki/Q2586153", "name": "Zuidwal volcano", "description": "The Zuidwal volcano is an extinct volcano in the Netherlands at more than 2 km (6,600 ft) below ground ..."}
Benjamin Young: latest syntax spec https://w3c.github.io/json-ld-syntax/
Rob Sanderson: the TAG issue for guidance is: this caching question is much broader than JSON-LD. … many systems do this kind of “apply one resource to another to clarify” … e.g a CSS stylesheet: if it changes over time the rendered HTML will look very different … in a knowledge graph context, that kind of change is much more dangerous … guidance for mitigating those concerns in the spec rather than leaving it up to broader discussion about change over time on the web
Benjamin Young: Link header looks like: Link: ; rel="http://www.w3.org/ns/json-ld#context"; type="application/ld+json"
Hadley Beeman: so remote change can affect local things. How is this different from old concerns about linking?
Dan Brickley: the WoT folks chose a syntax that requires this additional context resource; that’s what causes this to be different … as long as you need the context doc to interpret and act on the data, you have opened this security hole
Hadley Beeman: so what about caching?
Adam Soroka: WoT operates on much longer timescales, years in many cases.
Dan Brickley: one thing we care about is being a planet-scale search engine, the other is about lightbulbs in your home
Benjamin Young: you can’t change what happens in the lightbulb without changing the context used for data there … you can update when you update the code … the URI of the context really is a URI (an identifier), not a location, but it ends up being used as one
Hadley Beeman: I still hear this either being “we have to solve the problems of building the IoT” or “the network knows something about me because of what I looked at” … which are both larger problems
Dan Brickley: no, it’s different, because now the lightbulb is not a user agent, but it is broadcasting info about you
Benjamin Young: our API spec says you can implement document loading any way you want, yay! … but that doesn’t really solve the problem … we have this elsewhere, e.g. clickjacking … the answer is Single Domain policies, which is utterly inflexible and unwebby … we want to have ids that can be safely resolved … we would like a webby trust model
Hadley Beeman: if we do that, we say this is no longer a user agent (the light bulb) … you are making an explicit decision that has privacy ramifications; you need to be aware of that
Dan Brickley: (mitigation sketches being: parser reports, and web packaging for out-of-band context bundles with integrity checks, e.g. via homehubs etc)
Dan Brickley: I am unaware of any context that is useful and has gone unchanged for more than a few weeks … we’re stuck between refreshing and getting the best new stuff … and broadcasting your interests on the web … but if not, you get out of date … one thing would be for parsers to report what they’ve done
Rob Sanderson: (and we carry on)
Rob Sanderson: (everyone disagrees about the weeks lifecycle, e.g. activitystreams, annotations, ldp, etc)
Dan Brickley: there are a lot of use cases for web packaging, this might connect with that … your home env might have gotten this through some kind of bundled thing at lower frequencies.
Benjamin Young: you could have an API for updating the lightbulb … but that can’t work from a pull side, because you can’t guarantee that the identifier will resolve the same way on the open web over time … bringing us to blockchains, hashing, etc. … there are lots of potential solutions on the list … web packaging is one potential solution
Rob Sanderson: do you understand the issue well enough for us to send a ticket to TAG?
Hadley Beeman: what’s the header for this?
Benjamin Young: maybe “integrity”
Hadley Beeman: please write this up with use cases … because otherwise we’re staying hypothetical
Benjamin Young: there is no requirement to do remote context dereferencing … but people will
Dan Brickley: there are the specs, then there are ways you really operate … with JSON-LD, if you do it wrong, people flame you on the mailing list … social pressure has forced us to become a component in a larger system. Not cool!
Dan Brickley: we’d like people to get what they are expecting when they get contexts
Benjamin Young: many communities have this same problem … with many different solutions … that’s what makes it a TAG deal … another big JSON-LD deployment is Mastodon (uses Activity Streams)
Adam Soroka: … they haven’t updated their context for a long time
Benjamin Young: but it still works … which is what makes this a Publishing WG problem: publishing that is broken because a system went down somewhere isn’t publishing as publishers understand it |
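The inline `@vocab` example quoted in the transcript above shows that a context does not have to be fetched at all. As a rough illustration of what such a context does, here is a deliberately tiny sketch that turns string-valued properties into triples using only `@vocab`. It is not a conformant JSON-LD processor: term definitions, `@type`, nested nodes, compact IRIs and remote contexts are all ignored, and the function name and the default blank node label are assumptions made for this example.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def expand_with_vocab(doc: dict, blank_node: str = "_:b0") -> List[Triple]:
    """Map plain string properties to (subject, predicate IRI, literal) triples."""
    context = doc.get("@context")
    vocab = context.get("@vocab") if isinstance(context, dict) else None
    if vocab is None:
        # This toy model has nothing to expand against without an inline @vocab.
        return []
    subject = doc.get("@id", blank_node)
    triples: List[Triple] = []
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # keywords are not properties
        if isinstance(value, str):
            triples.append((subject, vocab + key, value))
    return triples


if __name__ == "__main__":
    doc = {"@context": {"@vocab": "http://example.com/"}, "name": "made up vocab"}
    for subject, predicate, literal in expand_with_vocab(doc):
        print(f'{subject} <{predicate}> "{literal}"')
```

Because the vocabulary mapping travels inside the document, nothing is retrieved over the network, which is the trade-off discussed in the meeting: no fetch and no privacy leak, at the cost of repeating the context in every instance.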
FWIW, I wrote a demo document loader for I'd argue that things like immutable resource integrity are problems with HTTP, not with JSON-LD's syntax. If it's important for your application, you should be using a different URI scheme that fits your needs. |
I'd like to surface @msporny's "hashlink" email in relation to this discussion: Our current approach to document loader configuration makes implementing that a possibility. However, the approach taken in the hashlink proposal may also address some of our other concerns such as #86 and #108 which sit alongside this topic. Worth some time, regardless. 😃 |
This issue was discussed in a meeting.
Ivan Herman: it is interesting, no doubt… (puts admin hat on) … these are very early drafts … Manu made this proposal in December, very early days … we’re fine, though. … if the technique becomes a standard form for URIs, we can use it as such … I’m happy to have us say that we are interested … we should try to see whether this tech really can be used to annotate links while we also pursue other avenues
Gregg Kellogg: really like the idea, could be very useful
Gregg Kellogg: but not clear that it really affects our docs at all … what would we change?
Ivan Herman: we shouldn’t close issues just because this new thing is a new thing
Gregg Kellogg: unless we think that we can do that via a reference in the best practice docs
Adam Soroka: if hashlink solves for problems we have, then we may not want to recreate alternative features that solve the same issues
Benjamin Young: although this new hash URI tech isn’t standardized, it has been put into use |
I believe we can close this issue with no changes necessary. Perhaps a discussion in the Best Practices doc. |
Resolved to close on 2019-07-12 call - https://www.w3.org/2018/json-ld-wg/Meetings/Minutes/2019/2019-07-12-json-ld |
@azaroth42 I'd like to read the meeting minutes, but the link above is a 404 for me. I also can't find them in https://github.com/w3c/json-ld-wg/tree/master/_minutes. |
@vmx I'll be generating last Friday's minutes today. Our normal minute generating super hero is out of the office, atm, so I get to feed the bots while they're away. 😁 I'll update this issue once they're generated. Raw logs are available (if you're in a hurry): https://www.w3.org/2019/07/12-json-ld-irc |
@vmx here are the logs as promised! https://www.w3.org/2018/json-ld-wg/Meetings/Minutes/2019/2019-07-12-json-ld If you have specific questions or ideas about this issue at this point, it's probably best to file a new issue or send a mailing list post to our public mailing list. Cheers! |
Provide a means for referring to a remote context without requiring it to be downloaded
duri or tdb schemes
Original issue: Content addressable contexts #547