Deploy ga4gh/data-repository-service-schemas to github.com/ga4gh/data-repository-service-schemas.git:gh-pages
traviscibot committed Sep 11, 2024
1 parent 2883f45 commit ef4b612
Showing 7 changed files with 494 additions and 24 deletions.
3 changes: 2 additions & 1 deletion CONTRIBUTING.md
@@ -43,7 +43,8 @@ branch in each repository forms the master branch set.
 Some general rules to follow:
 - Create an issue in Github to track your work and start a conversation. Make a note of the number, you'll
 need it when naming your feature branch below.
-- We follow [HubFlow](https://datasift.github.io/gitflow/) which means we use
+- We follow [HubFlow](https://datasift.github.io/gitflow/)
+(the HubFlow repo is deprecated, see this forked repo for the preserved [instructions](https://github.com/wpsharks/hubflow)) which means we use
 a feature branch strategy with pull requests always going to `develop`
 and releases happening from `master`. **Please read the HubFlow guide linked above, it's a quick read and will give you a really good idea of how our branches work. Do not make pull requests to `master`!**
 - If you are a core developer with write access to the repo, make a feature
21 changes: 17 additions & 4 deletions docs/more-background-on-compact-identifiers.html

Large diffs are not rendered by default.

46 changes: 46 additions & 0 deletions more-background-on-compact-identifiers.json
@@ -0,0 +1,46 @@
{
"openapi": "3.0.3",
"info": {
"title": "More Background on Compact Identifiers",
"version": "1.4.0",
"x-logo": {
"url": "https://www.ga4gh.org/wp-content/themes/ga4gh/dist/assets/svg/logos/logo-full-color.svg"
},
"termsOfService": "https://www.ga4gh.org/terms-and-conditions/",
"contact": {
"name": "GA4GH Cloud Work Stream",
"email": "[email protected]"
},
"license": {
"name": "Apache 2.0",
"url": "https://raw.githubusercontent.com/ga4gh/data-repository-service-schemas/master/LICENSE"
}
},
"tags": [
{
"name": "About",
"description": "This document contains more examples of resolving compact identifier-based DRS URIs than we could fit in the DRS specification or appendix. It’s provided here for your reference as a supplement to the specification.\n"
},
{
"name": "Background on Compact Identifier-Based URIs",
"description": "Compact identifiers refer to locally-unique persistent identifiers that have been namespaced to provide global uniqueness. See [\"Uniform resolution of compact identifiers for biomedical data\"](https://www.biorxiv.org/content/10.1101/101279v3) for an excellent introduction to this topic. By using compact identifiers in DRS URIs, along with a resolver registry (identifiers.org/n2t.net), systems can identify the current resolver when they need to translate a DRS URI into a fetchable URL. This allows a project to issue compact identifiers in DRS URIs and not be concerned if the project name or DRS hostname changes in the future, the current resolver can always be found through the identifiers.org/n2t.net registries. Together the identifiers.org/n2t.net systems support the resolver lookup for over 700 compact identifiers formats used in the research community, making it possible for a DRS server to use any of these as DRS IDs (or to register a new compact identifier type and resolver service of their own).\n\nWe use a DRS URI scheme rather than [Compact URIs (CURIEs)](https://en.wikipedia.org/wiki/CURIE) directly since we feel that systems consuming DRS objects will be able to better differentiate a DRS URI. CURIEs are widely used in the research community, and we feel the fact that they can point to a wide variety of entities (HTML documents, PDFs, identities in data models, etc) makes it more difficult for systems to unambiguously identify entities as DRS objects.\n\nStill, to make compact identifiers work in DRS URIs we leverage the CURIE format used by identifiers.org/n2t.net. Compact identifiers have the form:\n\n```\nprefix:accession\n```\n\nThe prefix can be divided into a `provider_code` (optional) and `namespace`. 
The `accession` here is an Ark, DOI, Data GUID, or another issuer's local ID for the object being pointed to:\n\n```\n[provider_code/]namespace:accession\n```\n\nBoth the `provider_code` and `namespace` disallow spaces or punctuation, only lowercase alphanumerical characters, underscores and dots are allowed.\n\n[Examples](https://n2t.net/e/compact_ids.html) include (from n2t.net):\n\n```\nPDB:2gc4\nTaxon:9606\nDOI:10.5281/ZENODO.1289856\nark:/47881/m6g15z54\nIGSN:SSH000SUA\n```\n\nTip:\n> DRS URIs using compact identifiers with resolvers registered in identifiers.org/n2t.net can be distinguished from the hostname-based DRS URIs below based on the required \":\" which is not allowed in hostname-based URI.\n\nSee the documentation on [n2t.net](https://n2t.net/e/compact_ids.html) and [identifiers.org](https://docs.identifiers.org/) for much more information on the compact identifiers used there and details about the resolution process.\n"
},
{
"name": "Registering a DRS Server on a Meta-Resolver",
"description": "See the documentation on the [n2t.net](https://n2t.net/e/compact_ids.html) and [identifiers.org](https://docs.identifiers.org/) meta-resolvers for adding your own compact identifier type and registering your DRS server as a resolver. You can register new prefixes (or mirrors by adding resource provider codes) for free using a simple online form.\n\nKeep in mind, while anyone can register prefixes, the identifiers.org/n2t.net sites do basic hand curation to verify new prefix and resource (provider code) requests. See those sites for more details on their security practices. For more information see\n\nStarting with the prefix for our new compact identifier, let’s register the namespace `mydrsprefix` on identifiers.org/n2t.net and use 5-digit numeric IDs as our accessions. We will then link this to the DRS server at https://mydrs.server.org/ga4gh/drs/v1/ by filling in the provider details. Here’s what that the registration for our new namespace looks like on [identifiers.org](https://registry.identifiers.org/prefixregistrationrequest):\n\n![Prefix Register 1](/data-repository-service-schemas/public/img/prefix_register_1.png)\n\n![Prefix Register 2](/data-repository-service-schemas/public/img/prefix_register_2.png)\n"
},
{
"name": "Example DRS Client Compact Identifier-Based URI Resolution Process - Existing Compact Identifier Provider",
"description": "A DRS client identifies the DRS URI compact identifier components using the first occurrence of \"/\" (optional) and \":\" characters. These are not allowed inside the provider_code (optional) or the namespace. The \":\" character is not allowed in a Hostname-based DRS URI, providing a convenient mechanism to differentiate them. Once the provider_code (optional) and namespace are extracted from a DRS compact identifier-based URI, a client can use services on identifiers.org to identify available resolvers.\n\n*Let’s look at a specific example DRS compact identifier-based URI that uses DOIs, a popular compact identifier, and walk through the process that a client would use to resolve it. Keep in mind, the resolution process is the same from the client perspective if a given DRS server is using an existing compact identifier type (DOIs, ARKs, Data GUIDs) or creating their own compact identifier type for their DRS server and registering it on identifiers.org/n2t.net.*\n\nStarting with the DRS URI:\n\n```\ndrs://doi:10.5072/FK2805660V\n```\n\nwith a namespace of \"doi\", the following GET request will return information about the namespace:\n\n```\nGET https://registry.api.identifiers.org/restApi/namespaces/search/findByPrefix?prefix=doi\n```\n\nThis information then points to resolvers for the \"doi\" namespace. This \"doi\" namespace was assigned a namespace ID of 75 by identifiers.org. This \"id\" has nothing to do with compact identifier accessions (which are used in the URL pattern as `{$id}` below) or DRS IDs. This namespace ID (75 below) is purely an identifiers.org internal ID for use with their APIs:\n\n```\nGET https://registry.api.identifiers.org/restApi/resources/search/findAllByNamespaceId?id=75\n```\n\nThis returns enough information to, ultimately, identify one or more resolvers and each have a URL pattern that, for DRS-supporting systems, provides a URL template for making a successful DRS GET request. 
For example, the DOI urlPattern is:\n\n```\nurlPattern: \"https://doi.org/{$id}\"\n```\n\nAnd the `{$id}` here refers to the accession from the compact identifier (in this example the accession is `10.5072/FK2805660V`). If applicable, a provider code can be supplied in the above requests to specify a particular mirror if there are multiple resolvers for this namespace. In the case of DOIs, you only get a single resolver.\n\nGiven this information you now know you can make a GET on the URL:\n\n```\nGET https://doi.org/10.5072/FK2805660V\n```\n\n*The URL above is valid for a DOI object but it is not actually a DRS server! Instead, it redirects to a DRS server through a series of HTTPS redirects. This is likely to be common when working with existing compact identifiers like DOIs or ARKs. Regardless, the redirect should eventually lead to a DRS URL that percent-encodes the accession as a DRS ID in a DRS object API call. For a **hypothetical** example, here’s what a redirect to a DRS API URL might ultimately look like. A client doesn't have to do anything other than follow the HTTPS redirects. The link between the DOI resolver on doi.org and the DRS server URL below is the result of the DRS server registering their data objects with a DOI issuer.*\n\n```\nGET https://drs.example.org/ga4gh/drs/v1/objects/10.5072%2FFK2805660V\n```\n\nIDs in DRS hostname-based URIs/URLs are always percent-encoded to eliminate ambiguity even though the DRS compact identifier-based URIs and the identifier.org's API do not percent-encode accessions. This was done in order to 1) follow the CURIE conventions of identifiers.org/n2t.net for compact identifier-based DRS URIs and 2) to aid in readability for users who understand they are working with compact identifiers. **The general rule of thumb, when using a compact identifier accession as a DRS ID in a DRS API call, make sure to percent-encode it. 
An easy way for a DRS client to handle this is to get the initial DRS object JSON response from whatever redirects the compact identifier resolves to, then look for the** `self_uri` **in the JSON, which will give you the correctly percent-encoded DRS ID for subsequent DRS API calls such as the** `access` **method.**\n"
},
{
"name": "Example DRS Client Compact Identifier-Based URI Resolution Process - Registering a new Compact Identifier for Your DRS Server",
"description": "See the documentation on [n2t.net](https://n2t.net/e/compact_ids.html) and [identifiers.org](https://docs.identifiers.org/) for adding your own compact identifier type and registering your DRS server as a resolver. We document this in more detail in the [main specification document](./index.html).\n\nNow the question is how does a client resolve your newly registered compact identifier for your DRS server? *It turns out, whether specific to a DRS implementation or using existing compact identifiers like ARKs or DOIs, the DRS client resolution process for compact identifier-based URIs is exactly the same.* We briefly run through process below for a new compact identifier as an example but, again, a client will not need to do anything different from the resolution process documented in \"DRS Client Compact Identifier-Based URI Resolution Process - Existing Compact Identifier Provider\".\n\nNow we can issue DRS URI for our data objects like:\n\n```\ndrs://mydrsprefix:12345\n```\n\nThis is a little simpler than working with DOIs or other existing compact identifier issuers out there since we can create our own IDs and not have to allocate them through a third-party service (see \"Issuing Existing Compact Identifiers for Use with Your DRS Server\" below).\n\nWith a namespace of \"mydrsprefix\", the following GET request will return information about the namespace:\n\n```\nGET https://registry.api.identifiers.org/restApi/namespaces/search/findByPrefix?prefix=mydrsprefix\n```\n\n*Of course, this is a hypothetical example so the actual API call won’t work, but you can see the GET request is identical to \"DRS Client Compact Identifier-Based URI Resolution Process - Existing Compact Identifier Provider\".*\n\nThis information then points to resolvers for the \"mydrsprefix\" namespace. Hypothetically, this \"mydrsprefix\" namespace was assigned a namespace ID of 1829 by identifiers.org. 
This \"id\" has nothing to do with compact identifier accessions (which are used in the URL pattern as `{$id}` below) or DRS IDs. This namespace ID (1829 below) is purely an identifiers.org internal ID for use with their APIs:\n\n```\nGET https://registry.api.identifiers.org/restApi/resources/search/findAllByNamespaceId?id=1829\n```\n\n*Like the previous GET request this URL won’t work but you can see the GET request is identical to \"DRS Client Compact Identifier-Based URI Resolution Process - Existing Compact Identifier Provider\".*\n\nThis returns enough information to, ultimately, identify one or more resolvers and each have a URL pattern that, for DRS-supporting systems, provides a URL template for making a successful DRS GET request. For example, the \"mydrsprefix\" urlPattern is:\n\n```\nurlPattern: \"https://mydrs.server.org/ga4gh/drs/v1/objects/{$id}\"\n```\n\nAnd the `{$id}` here refers to the accession from the compact identifier (in this example the accession is `12345`). If applicable, a provider code can be supplied in the above requests to specify a particular mirror if there are multiple resolvers for this namespace.\n\nGiven this information you now know you can make a GET on the URL:\n\n```\nGET https://mydrs.server.org/ga4gh/drs/v1/objects/12345\n```\n\nSo, compared to using a third party service like DOIs and ARKs, this would be a direct pointer to a DRS server. 
However, just as with \"DRS Client Compact Identifier-Based URI Resolution Process - Existing Compact Identifier Provider\", the client should always be prepared to follow HTTPS redirects.\n\n*To summarize, a client resolving a custom compact identifier registered for a single DRS server is actually the same as resolving using a third-party compact identifier service like ARKs or DOIs with a DRS server, just make sure to follow redirects in all cases.*\n\n**Note: Issuing Existing Compact Identifiers for Use with Your DRS Server**\n\nSee the documentation on [n2t.net](https://n2t.net/e/compact_ids.html) and [identifiers.org](https://docs.identifiers.org/) for information about all the compact identifiers that are supported. You can choose to use an existing compact identifier provider for your DRS server, as we did in the example above using DOIs (\"DRS Client Compact Identifier-Based URI Resolution Process - Existing Compact Identifier Provider\"). Just keep in mind, each provider will have their own approach for generating compact identifiers and associating them with a DRS data object URL. Some compact identifier providers, like DOIs, provide a method whereby you can register in their network and get your own prefix, allowing you to mint your own accessions. Other services, like the University of California’s [EZID](https://ezid.cdlib.org/) service, provide accounts and a mechanism to mint accessions centrally for each of your data objects. For experimentation we recommend you take a look at the EZID website that allows you to create DOIs and ARKs and associate them with your data object URLs on your DRS server for testing purposes.\n"
},
{
"name": "Example How To Handle Extra Metadata for DRS Objects",
"description": "## DRS and Data Connect\n\nWith DRS objects it may be necessary to attach additional metadata to your objects. We believe that a change to the API of DRS to include metadata is not in the spirit of the DRS spec and in general DRS should have no knowledge of the metadata associated with the objects. We have found that a good GA4GH standard to support this is Data Connect (https://github.com/ga4gh-discovery/data-connect). The general approach would be to have a Data Connect service on your platform and to include \"tables\" with the ID matching your DRS ID for the same object. This means that if you have metadata associated with an object id `abcd` (ex. additional information about Compound Objects) all you need to do is request the information from the Data Connect client at `/tables/abcd/info`. There are optional functionalities of Data Connect, such as querying of tables, but we do not explore them or give any recommendations here.\n\nHere is an example of using Data Connect with DRS in the fasp-scripts repository (https://github.com/ga4gh/fasp-scripts/blob/master/notebooks/drs/DRS%20File%20Data.ipynb). In this notebook we can see that data connect is used to get DRS IDs from a platform. Those DRS IDs are then used to gather aditional information about the file that might be necessary for analysis. This is just one example of how DRS and Data Connect can interact with each other to gather information about data on a platform."
}
],
"components": {}
}
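
The client-side string handling described in the tag descriptions above (splitting a compact identifier-based DRS URI at the first ":" and optional "/", filling `{$id}` in a resolver `urlPattern`, and percent-encoding the accession when it is used as a DRS ID) can be sketched in a few lines of Python. This is a minimal illustration rather than a reference client: the function names are hypothetical, no network calls to identifiers.org are made, and the base URL in the usage note is the hypothetical `drs.example.org` server from the description.

```python
from urllib.parse import quote


def parse_compact_drs_uri(uri):
    """Split drs://[provider_code/]namespace:accession into its parts.

    Returns (provider_code or None, namespace, accession).
    Hypothetical helper; the name is not part of the DRS specification.
    """
    scheme = "drs://"
    if not uri.startswith(scheme):
        raise ValueError("not a DRS URI: " + uri)
    rest = uri[len(scheme):]
    # The first ":" ends the prefix; the accession may itself contain ":" or "/".
    prefix, colon, accession = rest.partition(":")
    if not colon or not accession:
        # Hostname-based DRS URIs do not contain ":".
        raise ValueError("not a compact identifier-based DRS URI: " + uri)
    # An optional provider code is separated from the namespace by the first "/".
    head, slash, tail = prefix.partition("/")
    if slash:
        return head, tail, accession
    return None, head, accession


def expand_url_pattern(url_pattern, accession):
    """Fill the {$id} placeholder of a resolver urlPattern with the raw accession."""
    return url_pattern.replace("{$id}", accession)


def drs_object_url(base_url, accession):
    """Build a DRS object URL, percent-encoding the accession as the DRS ID."""
    # safe="" makes quote() encode "/" as well, giving e.g. 10.5072%2FFK2805660V.
    return base_url.rstrip("/") + "/objects/" + quote(accession, safe="")
```

For the DOI example, `parse_compact_drs_uri("drs://doi:10.5072/FK2805660V")` yields the namespace `doi` and the accession `10.5072/FK2805660V`; expanding `https://doi.org/{$id}` leaves the accession unencoded, while `drs_object_url("https://drs.example.org/ga4gh/drs/v1", ...)` percent-encodes it. A real client would still follow HTTPS redirects and prefer the `self_uri` returned by the server.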