Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comparison to ULID #8

Open
nicpottier opened this issue Jun 7, 2017 · 11 comments
Open

Comparison to ULID #8

nicpottier opened this issue Jun 7, 2017 · 11 comments
Assignees

Comments

@nicpottier
Copy link

Might be useful to document the reasoning as to why this is better than ULID (or perhaps use cases where it is better and where not)

https://github.com/alizain/ulid/

@rbranson rbranson self-assigned this Jun 7, 2017
@rbranson
Copy link
Contributor

rbranson commented Jun 7, 2017

We've had a couple of people approach us about ULID after publishing. ULID is very similar in aim to KSUID, so it's an apt comparison. The biggest difference I see is the length of the timestamp and random payload. KSUID does not attempt to be length-compatible with standard UUIDs, where-as ULID does.

I think the timestamp bit is mostly self-explanatory (more runway, more precision), but I would be at least cautious about using only 80 bits of randomness. In terms of numbers, each bit roughly cuts the chance of a collision in half, so those extra 48 bits in KSUID matter quite a bit more in terms of collision-safety than the first 80 bits used in ULID.

The 122 random bits used in UUIDv4 has been very widely deployed and battle-tested in production systems for over a decade. Given this empirical evidence we chose 122 bits as our floor, and extended to 128 bits to simplify the implementation. In essence we decided to go with the safest approach that is most widely accepted at the sacrifice of 6 extra bytes per ID.

@thinxer
Copy link

thinxer commented Jun 8, 2017

However, the collision space is reduced to one millisecond. I'm not a math guru but it sounds like 80 bits is practically enough.

Here is some math.

Based on wiki, here's a script for calculating the collision probability.

>>> birthday_probability(10**9, 2**80)
4.135902207380582e-07

I believe 80 bit of randomness if enough if you are not generating more than a gig of IDs in one millisecond.

@nicpottier
Copy link
Author

Thanks Rick, that's a useful breakdown and I think would perhaps make a good addition to the README for future wanderings of people evaluating different solutions.

@rbranson
Copy link
Contributor

@thinxer the 128-bits of random data is used to avoid dependency on the wall-clock for collision safety. If one is concerned about the extra 4 bytes of data or if millisecond precision is needed, then ULID is probably a better choice.

@victornpb
Copy link

I believe 80 bit of randomness if enough if you are not generating more than a gig of IDs in one millisecond.

In theory yes but if two computers are on the same millisecond tick, they are more likely to generate the same random sequence.

Of course this is implementation dependent, but I’m assuming this being used to generate ids on untrusted clients

@cben
Copy link

cben commented Apr 24, 2019

One point to add is that ksuid timestamps are not just lower precision, but also smaller future range:

  • ksuid: "The timestamp epoch is adjusted to March 5th, 2014, providing over 100 years of useful life" (specifically, 2^32sec ~ 136 years)
  • ULID: "Won't run out of space till the year 10889 AD".

@cben
Copy link

cben commented Apr 24, 2019

Another difference is encoding:

  • ULID specifies base32 encoding, friendlier to humans "reading on the phone" (avoids similar chars like 1IiJj, case insensitive)
  • ksuid uses denser base62, resulting in almost same string length.

@mlevkov
Copy link

mlevkov commented Jul 5, 2019

@thinxer the 128-bits of random data is used to avoid dependency on the wall-clock for collision safety. If one is concerned about the extra 4 bytes of data or if millisecond precision is needed, then ULID is probably a better choice.

Am I correct that this library will not generate a millisecond level of precision timestamp, though it uses one to generate its randomness? Hence, I would not be able to get the timestamp down to a millisecond when extracting timestamp data?

@achille-roussel
Copy link
Contributor

That is correct, only 1 second of precision is offered by the timestamp embedded in KSUIDs.

@netxixi
Copy link

netxixi commented Mar 10, 2021

How to make sortable KSUID Base62

@ericbn
Copy link

ericbn commented May 31, 2023

I'm trying to summarize the differences regarding the variant bits:

The 122 random bits used in UUIDv4 has been very widely deployed and battle-tested in production systems for over a decade.

UUIDv4 has (all random) 122 variant bits. ULID has (48 timestamp till-10889AD-millisecond-precision + 80 random) 128 variant bits base32-encoded. KSUID has (32 timestamp till-2150AD-seconds-precision + 128 random) 160 variant bits base62-encoded. I might be debatable if only the random bits should count towards the "collision safe" variant part. rbranson already answered KSUID's point of view on that:

the 128-bits of random data is used to avoid dependency on the wall-clock for collision safety.

I understand this makes sense for systems where the entropy for the pseudo-random numbers generator depends on the system clock, since this dependency might yield redundancy between the timestamp bits and the random bits. Then, it makes sense to also choose to be free from the dependency or not between the system clock and the system pseudo-random numbers generator.

EDIT: The A brief history of the UUID article by @rbranson (a great read!) also mentions:

The timestamp provides 1-second resolution, which we found to be acceptable for a broad range of use cases. If a higher resolution timestamp is desired, payload bits can be traded for more timestamp bits. While high-resolution timestamp support is not included in our implementation, it is backwards compatible. Any implementation which uses 32-bit timestamps can safely work with KSUIDs that use higher resolution timestamps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants