-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comparison to ULID #8
Comments
We've had a couple of people approach us about ULID after publishing. ULID is very similar in aim to KSUID, so it's an apt comparison. The biggest difference I see is the length of the timestamp and random payload. KSUID does not attempt to be length-compatible with standard UUIDs, where-as ULID does. I think the timestamp bit is mostly self-explanatory (more runway, more precision), but I would be at least cautious about using only 80 bits of randomness. In terms of numbers, each bit roughly cuts the chance of a collision in half, so those extra 48 bits in KSUID matter quite a bit more in terms of collision-safety than the first 80 bits used in ULID. The 122 random bits used in UUIDv4 has been very widely deployed and battle-tested in production systems for over a decade. Given this empirical evidence we chose 122 bits as our floor, and extended to 128 bits to simplify the implementation. In essence we decided to go with the safest approach that is most widely accepted at the sacrifice of 6 extra bytes per ID. |
However, the collision space is reduced to one millisecond. I'm not a math guru but it sounds like 80 bits is practically enough. Here is some math. Based on wiki, here's a script for calculating the collision probability.
I believe 80 bit of randomness if enough if you are not generating more than a gig of IDs in one millisecond. |
Thanks Rick, that's a useful breakdown and I think would perhaps make a good addition to the README for future wanderings of people evaluating different solutions. |
@thinxer the 128-bits of random data is used to avoid dependency on the wall-clock for collision safety. If one is concerned about the extra 4 bytes of data or if millisecond precision is needed, then ULID is probably a better choice. |
In theory yes but if two computers are on the same millisecond tick, they are more likely to generate the same random sequence. Of course this is implementation dependent, but I’m assuming this being used to generate ids on untrusted clients |
One point to add is that ksuid timestamps are not just lower precision, but also smaller future range:
|
Another difference is encoding:
|
Am I correct that this library will not generate a millisecond level of precision timestamp, though it uses one to generate its randomness? Hence, I would not be able to get the timestamp down to a millisecond when extracting timestamp data? |
That is correct, only 1 second of precision is offered by the timestamp embedded in KSUIDs. |
How to make sortable KSUID Base62 |
I'm trying to summarize the differences regarding the variant bits:
UUIDv4 has (all random) 122 variant bits. ULID has (48 timestamp till-10889AD-millisecond-precision + 80 random) 128 variant bits base32-encoded. KSUID has (32 timestamp till-2150AD-seconds-precision + 128 random) 160 variant bits base62-encoded. I might be debatable if only the random bits should count towards the "collision safe" variant part. rbranson already answered KSUID's point of view on that:
I understand this makes sense for systems where the entropy for the pseudo-random numbers generator depends on the system clock, since this dependency might yield redundancy between the timestamp bits and the random bits. Then, it makes sense to also choose to be free from the dependency or not between the system clock and the system pseudo-random numbers generator. EDIT: The A brief history of the UUID article by @rbranson (a great read!) also mentions:
|
Might be useful to document the reasoning as to why this is better than ULID (or perhaps use cases where it is better and where not)
https://github.com/alizain/ulid/
The text was updated successfully, but these errors were encountered: