Review API-Guidelines: UUID as identifier #45

cgendreau · 2020-03-30T12:00:14Z

All database identifiers should be numeric, since numeric keys are efficient and easier to manage at the database level. Even if there is no concern about leaking business information, we will still expose a UUID instead of the database key in the API. This gives us more flexibility at the database level while reducing potential issues with API users iterating over ids or using the wrong set of ids. Based on the “Inter-module foreign key” module, UUIDs will help detect wrong linkages by making it almost impossible to reuse a key from the wrong resource. Numeric ids can be reused across different resources (although this can be solved by using a global sequence), while UUIDs are far more likely to be unique across the system.

This should not prevent resources from being assigned a DOI.
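
As a rough illustration of the split described above (numeric key inside the database, UUID in the API), here is a minimal in-memory sketch. The `Record` and `Store` names and the `"id"` field are hypothetical, chosen for this example only, and are not taken from the DINA guidelines.

```python
import itertools
import uuid
from dataclasses import dataclass


@dataclass
class Record:
    pk: int                # internal numeric key, never exposed by the API
    public_id: uuid.UUID   # identifier exposed to API users
    payload: dict


class Store:
    """Hypothetical in-memory store standing in for a database table."""

    def __init__(self) -> None:
        self._next_pk = itertools.count(1)
        self._by_uuid: dict[uuid.UUID, Record] = {}

    def create(self, payload: dict) -> uuid.UUID:
        # The numeric key comes from a sequence; the public id is a random UUID.
        record = Record(next(self._next_pk), uuid.uuid4(), payload)
        self._by_uuid[record.public_id] = record
        return record.public_id

    def to_api(self, public_id: uuid.UUID) -> dict:
        # Only the UUID leaves the service; the numeric key stays internal.
        record = self._by_uuid[public_id]
        return {"id": str(record.public_id), **record.payload}


store = Store()
new_id = store.create({"name": "example record"})
api_view = store.to_api(new_id)
print(api_view)
```

Because the public identifier is not sequential, API users cannot walk through records by incrementing an id.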

cgendreau · 2020-04-02T13:14:15Z

Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects:
https://dx.doi.org/10.1093/database/bax003

cgendreau · 2020-04-03T14:42:53Z

The suggestion from the "Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects" paper to use HTTP URIs looks effective and quite simple to explain and implement.

The web domains owned and controlled by the respective institutions ensure the global uniqueness of the identifiers and thus, the local object identifiers only need to be unique within each institution's scope which all Institutions are capable to achieve.

Anton Güntsch, Roger Hyam, Gregor Hagedorn, Simon Chagnoux, Dominik Röpert, Ana Casino, Gabi Droege, Falko Glöckler, Karsten Gödderz, Quentin Groom, Jana Hoffmann, Ayco Holleman, Matúš Kempa, Hanna Koivula, Karol Marhold, Nicky Nicolson, Vincent S. Smith, Dagmar Triebel, Actionable, long-term stable and semantic web compatible identifiers for access to biological collection objects, Database, Volume 2017, 2017, bax003, https://doi.org/10.1093/database/bax003

cgendreau · 2020-04-03T14:47:44Z

Suggestion for the API guidelines:

- Use HTTPS URIs as described in the paper. Even a record that can't be publicly resolvable (e.g. because it holds sensitive information) should get one.
- Specify that the local object identifier must be unique, and suggest UUID for it, listing its advantages.
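
A minimal sketch of how the two points could combine: the institution-controlled domain gives global uniqueness, and a UUID serves as the local object identifier. The domain `collections.example.org`, the `specimen` path segment and the `object_uri` helper are assumptions for illustration, not part of the guidelines or the paper.

```python
import uuid


def object_uri(institution_domain: str, resource_type: str, local_id: uuid.UUID) -> str:
    """Build an HTTPS URI from an institution-controlled domain and a local identifier."""
    return f"https://{institution_domain}/{resource_type}/{local_id}"


# Hypothetical domain and resource type, for illustration only.
local_id = uuid.uuid4()
uri = object_uri("collections.example.org", "specimen", local_id)
print(uri)
```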

jdobber · 2020-04-07T07:37:02Z

Just for clarification: when using UUIDs, we should use version 1 (date-time and MAC address).
In Python you can get a new UUID like this: `import uuid; uuid.uuid1()`.

See also: https://en.wikipedia.org/wiki/Universally_unique_identifier
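
For comparison, a short sketch using Python's standard `uuid` module to generate a version 1 and a version 4 UUID side by side. It only shows what each call produces and is not a recommendation for either version.

```python
import uuid

# Version 1: built from a timestamp and a node id. Python uses the hardware
# address when it can determine one, otherwise a random 48-bit number.
v1 = uuid.uuid1()

# Version 4: built from random bits, so it carries no time or host information.
v4 = uuid.uuid4()

print(v1, "version", v1.version)
print(v4, "version", v4.version)
```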

cgendreau · 2020-04-07T12:22:33Z

Yes, that should be clarified. Actually, the current suggestion was version 5.
https://github.com/DINA-Web/collection-specs/issues/13

dshorthouse · 2020-04-14T14:00:15Z

Some background on the decision to use UUIDv5 from @dimus: http://globalnames.org/news/2015/05/31/gn-uuid-0-5-0/. I agree with the decision to use v5 because the identifiers are independently repeatable so long as all parties agree on the namespace. I suggest dina-project.net as that namespace.
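
To make "independently repeatable" concrete, here is a minimal sketch with Python's `uuid.uuid5`. Deriving the namespace UUID from `dina-project.net` under the standard DNS namespace is an assumption about how the suggestion might be implemented. The `name_uuid` helper and the sample names are hypothetical.

```python
import uuid

# Assumption: the shared namespace is derived from the suggested domain name
# under the standard DNS namespace.
DINA_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "dina-project.net")


def name_uuid(name: str) -> uuid.UUID:
    """Return the version 5 UUID for a name within the shared namespace."""
    return uuid.uuid5(DINA_NAMESPACE, name)


first = name_uuid("Homo sapiens")
second = name_uuid("Homo sapiens")      # computed independently, same inputs
other = name_uuid("Pan troglodytes")

print(first == second)  # same namespace and name give the same UUID
print(first == other)   # a different name gives a different UUID
```

Any party that knows the namespace and the exact string can recompute the identifier without contacting a central service. Note that the result depends on the exact string, so whitespace or spelling differences produce different UUIDs.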

cgendreau · 2020-04-14T19:06:06Z

I see the advantages of v5 when assigning UUIDs to scientific names, but the advantage of v5 over v4 is unclear to me when the UUID is derived from a local identifier. Maybe I missed something, but it could go wrong if two implementations decided to use a primary key as the local identifier: since we would share the same namespace, we would end up with the same identifier for two completely different entries. Unless the idea is to use a different namespace per institution?
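
The concern can be reproduced in a few lines. This sketch assumes both implementations hash their primary key (here `"42"`) under one shared namespace, then repeats the exercise with one namespace per institution. The institution domains are hypothetical.

```python
import uuid

# Assumption: a single namespace shared by every implementation.
SHARED_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "dina-project.net")

# Two unrelated institutions both use primary key 42 as the local identifier.
shared_a = uuid.uuid5(SHARED_NAMESPACE, "42")
shared_b = uuid.uuid5(SHARED_NAMESPACE, "42")
print(shared_a == shared_b)  # identical UUIDs for two different entries

# Hypothetical per-institution namespaces avoid the clash.
namespace_a = uuid.uuid5(uuid.NAMESPACE_DNS, "institution-a.example.org")
namespace_b = uuid.uuid5(uuid.NAMESPACE_DNS, "institution-b.example.org")
scoped_a = uuid.uuid5(namespace_a, "42")
scoped_b = uuid.uuid5(namespace_b, "42")
print(scoped_a == scoped_b)  # distinct UUIDs
```

A version 4 UUID sidesteps the question, since it does not depend on any input, at the cost of not being recomputable from the local identifier.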

dshorthouse · 2020-04-14T19:20:45Z

@cgendreau Agree. Will have to put some thought into the reasons why you want to assign a UUID to something. If the primary reasons are to uniquely identify a predominantly string-based entity AND to resolve that identity among and between organizations then a publicly-exposed UUID v5 makes some sense. If these are not the goals, then we'd have to question why we'd want to bother with the overhead when a local, incrementing primary key is far better.

cgendreau added the workshop-2020 label Mar 30, 2020

cgendreau added the API-Guidelines-Review label Apr 3, 2020
