Review API-Guidelines: UUID as identifier #45

Open
cgendreau opened this issue Mar 30, 2020 · 8 comments

Comments

@cgendreau
Contributor

cgendreau commented Mar 30, 2020

All database identifiers should be numeric, since numeric keys are efficient and easier to manage at the database level. Even where leaking business information is not a concern, the API should still expose a UUID instead of the database key. This gives us more flexibility at the database level while reducing potential issues with API users iterating over ids or using the wrong set of ids. Based on the “Inter-module foreign key” module, UUIDs will help detect wrong linkages by making it almost impossible to reuse a key from the wrong resource. Numeric ids can repeat across different resources (although a global sequence would solve that), while UUIDs are far more likely to be unique across the system.
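A minimal sketch of that split, assuming a hypothetical `Specimen` record: the numeric key stays internal and only the UUID is serialized for API users. The class, field, and function names are illustrative and not taken from any DINA module.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Specimen:
    """Hypothetical resource used only for illustration."""
    db_id: int   # numeric database key, kept internal
    name: str
    # Public identifier generated once when the record is created.
    public_id: uuid.UUID = field(default_factory=uuid.uuid4)


def to_api(record: Specimen) -> dict:
    """Serialize for the API: expose the UUID, never the database key."""
    return {"id": str(record.public_id), "name": record.name}


if __name__ == "__main__":
    print(to_api(Specimen(db_id=1, name="example specimen")))
```

Because the exposed id carries no ordering, a client cannot walk the collection by incrementing it.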

This should not prevent resources from being assigned a DOI.

@cgendreau
Contributor Author

Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects:
https://dx.doi.org/10.1093/database/bax003

@cgendreau
Contributor Author

The suggestion from the "Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects" paper to use HTTP URIs looks effective and quite simple to explain and implement.

The web domains owned and controlled by the respective institutions ensure the global uniqueness of the identifiers and thus, the local object identifiers only need to be unique within each institution's scope which all Institutions are capable to achieve.

Anton Güntsch, Roger Hyam, Gregor Hagedorn, Simon Chagnoux, Dominik Röpert, Ana Casino, Gabi Droege, Falko Glöckler, Karsten Gödderz, Quentin Groom, Jana Hoffmann, Ayco Holleman, Matúš Kempa, Hanna Koivula, Karol Marhold, Nicky Nicolson, Vincent S. Smith, Dagmar Triebel, Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects, Database, Volume 2017, 2017, bax003, https://doi.org/10.1093/database/bax003

@cgendreau
Contributor Author

Suggestion for the API guidelines:

  • Use HTTPS URIs as described in the paper. A record should get one even if it can't be publicly resolved (e.g. because it holds sensitive information).
  • Specify that the local object identifier must be unique, and suggest UUIDs for it, listing their advantages.
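The two points above could combine roughly as follows. This is only a sketch: the base URL `https://collections.example.org`, the path layout, and the function name `object_uri` are assumptions for illustration, not something the paper or the guidelines prescribe.

```python
import uuid
from urllib.parse import quote


def object_uri(base_url: str, resource_type: str, local_id: str) -> str:
    """Build an HTTPS URI from an institution-controlled base URL
    and an identifier that is unique within that institution."""
    base = base_url.rstrip("/")
    # Percent-encode both path segments so the result is a valid URI.
    return f"{base}/{quote(resource_type, safe='')}/{quote(local_id, safe='')}"


if __name__ == "__main__":
    local_id = str(uuid.uuid4())  # UUID as the local object identifier
    print(object_uri("https://collections.example.org", "specimen", local_id))
```

The domain provides global uniqueness, so the local part only has to be unique within the institution.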

@jdobber

jdobber commented Apr 7, 2020

Just for clarification: if we use UUIDs, then we should use version 1 (date-time and MAC address).
In Python you can get a new UUID like this: `import uuid; uuid.uuid1()`.

See also: https://en.wikipedia.org/wiki/Universally_unique_identifier
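For reference, a short sketch of how Python's standard `uuid` module reports these. The number 1, 4 or 5 is what RFC 4122 calls the version; the variant is a separate field and is the same for all three. The name `example.org` is just a placeholder.

```python
import uuid

v1 = uuid.uuid1()  # timestamp plus node id (usually the MAC address)
v4 = uuid.uuid4()  # random
v5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")  # SHA-1 of namespace + name

if __name__ == "__main__":
    for u in (v1, v4, v5):
        # .version differs per generator; .variant is the RFC 4122 layout for all
        print(u.version, u.variant, u)
```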

@cgendreau
Contributor Author

Yes, that should be clarified. Actually, the current suggestion was version 5.
https://github.com/DINA-Web/collection-specs/issues/13

@dshorthouse

dshorthouse commented Apr 14, 2020

Some background from @dimus on the decision to use UUIDv5: http://globalnames.org/news/2015/05/31/gn-uuid-0-5-0/
I agree with the decision to use v5 because the identifiers are independently repeatable so long as all parties agree on the namespace. I suggest dina-project.net as that namespace.
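A minimal sketch of what "independently repeatable" means in practice. It assumes the shared namespace UUID is itself derived from the DNS name `dina-project.net`, which is one possible convention and not an agreed one; `name_uuid` is a hypothetical helper.

```python
import uuid

# Assumption: the shared namespace is derived from the DNS name.
DINA_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "dina-project.net")


def name_uuid(name: str) -> uuid.UUID:
    """Derive a version 5 UUID for a string within the shared namespace."""
    return uuid.uuid5(DINA_NAMESPACE, name)


if __name__ == "__main__":
    # Any party computing this with the same namespace gets the same UUID.
    print(name_uuid("Homo sapiens"))
```

No registry or lookup is needed: the identifier is a pure function of the namespace and the string.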

@cgendreau
Contributor Author

cgendreau commented Apr 14, 2020

I see the advantages of v5 when assigning UUIDs to scientific names, but it's unclear to me what the advantage of v5 over v4 is when the UUID is derived from a local identifier. Maybe I missed something, but it could go wrong if two implementations decided to use a primary key as the local identifier: since we would share the same namespace, we would end up with the same identifier for two completely different entries. Unless the idea is to use a different namespace per institution?
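To make the concern concrete, here is a small sketch. The institution domains are placeholders, and deriving the namespaces from DNS names is an assumption for illustration.

```python
import uuid

SHARED_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "dina-project.net")

# Two institutions both use the primary key "42" as the local identifier.
shared_a = uuid.uuid5(SHARED_NS, "42")
shared_b = uuid.uuid5(SHARED_NS, "42")

# One namespace per institution (hypothetical domains) keeps them apart.
ns_a = uuid.uuid5(uuid.NAMESPACE_DNS, "institution-a.example.org")
ns_b = uuid.uuid5(uuid.NAMESPACE_DNS, "institution-b.example.org")
scoped_a = uuid.uuid5(ns_a, "42")
scoped_b = uuid.uuid5(ns_b, "42")

if __name__ == "__main__":
    print("shared namespace collides:", shared_a == shared_b)
    print("per-institution namespaces collide:", scoped_a == scoped_b)
```

With v4 the question does not arise, because the UUID does not depend on the local identifier at all.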

@dshorthouse

@cgendreau Agree. Will have to put some thought into the reasons why you want to assign a UUID to something. If the primary reasons are to uniquely identify a predominantly string-based entity AND to resolve that identity among and between organizations then a publicly-exposed UUID v5 makes some sense. If these are not the goals, then we'd have to question why we'd want to bother with the overhead when a local, incrementing primary key is far better.
