gnasher729

If your goal is to make collisions extremely unlikely, there is a simple strategy that achieves this, at the expense of getting many collisions if you ever get any:

You have a large number of processes that generate UUIDs; my computer, for example, might be one such process. Each process generates one random UUID and from then on returns the successor of the previous UUID each time one is requested. If two processes each generate a million UUIDs, they collide only if their initial UUIDs are less than a million apart. However, if you get one collision, you will get many. The probability of having any collision at all is much smaller than with purely random UUIDs, while the expected number of collisions is exactly the same. What is very important is that the initial number is truly random, with 128 bits or more.
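The per-process scheme above can be sketched as follows. This is a hypothetical illustration, not a standard UUID version; the class name and method are made up for this example.

```python
import secrets

class SequentialUUIDSource:
    """One process's UUID source: a truly random 128-bit starting
    point, after which each call returns the next value in sequence."""

    def __init__(self):
        # The truly random start is the step the text calls
        # "very important" - it must have the full 128 bits of entropy.
        self._next = secrets.randbits(128)

    def next_uuid(self):
        value = self._next
        # Increment modulo 2^128 so the value stays 128 bits wide.
        self._next = (self._next + 1) % (1 << 128)
        return f"{value:032x}"
```

Two such sources collide only if their random starting points happen to lie closer together than the number of UUIDs each one generates.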

But "guaranteed unique UUIDs" are only possible if one database keeps track of every UUID ever created and rejects duplicate entries. What would be good enough is a central database that hands out ranges of, say, one million UUIDs to processes that want to create UUIDs; each process is responsible for not creating duplicates within its range, and no process is ever allowed to create a UUID outside its range. Of course there is no way to enforce this.
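A minimal sketch of such a central range allocator, assuming a single in-process authority (the class and block size are illustrative, not part of any standard):

```python
import threading

class RangeAllocator:
    """Hypothetical central authority: hands out disjoint blocks of
    one million IDs. Each client is trusted to generate UUIDs only
    inside its own block - which, as the text notes, cannot actually
    be enforced."""

    BLOCK = 1_000_000

    def __init__(self):
        self._next_start = 0
        self._lock = threading.Lock()

    def allocate(self):
        # Returns a half-open range [start, end) that no other
        # client will ever receive.
        with self._lock:
            start = self._next_start
            self._next_start += self.BLOCK
        return start, start + self.BLOCK
```

As long as every client stays inside its block, no two clients can ever produce the same value.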

But then, this is all pointless. If people go on creating proper random UUIDs forever, without anyone checking for duplicates, whether they are unique only matters if you can actually find two identical UUIDs. And that is infeasible if they are random, because you would need to compare about $2^{62}$ UUIDs against another $2^{62}$ UUIDs to find a duplicate.
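The figure above is birthday-bound arithmetic: with $k$ random bits, a collision only becomes likely around $2^{k/2}$ values. A version-4 UUID has 122 random bits, so that threshold is about $2^{61}$, the same ballpark as the $2^{62}$ in the text. A quick check with the standard approximation:

```python
import math

def collision_probability(n, bits):
    """Approximate birthday-bound probability of at least one
    collision among n uniformly random values of `bits` bits:
    p ~= 1 - exp(-n^2 / 2^(bits+1))."""
    return 1.0 - math.exp(-(n * n) / float(2 ** (bits + 1)))

# A version-4 UUID has 122 random bits.
print(collision_probability(2 ** 61, 122))   # ~0.39: collisions likely
print(collision_probability(10 ** 9, 122))   # vanishingly small
                                             # (rounds to 0.0 in doubles)
```

So a billion random UUIDs are essentially guaranteed collision-free, while $2^{61}$ of them are not.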

Of course, if someone acts in an utterly stupid way and, for example, creates UUIDs based on the current time in seconds (fewer than 32 million per year), then you will have lots of duplicates within that source. But given two UUIDs, one created by such a stupid process and one created in a proper random way, you will never manage to find identical UUIDs across the two sources.

Say the developers at Apple went completely mad and the UUID creator on iOS started creating UUIDs based only on the current time in seconds: far fewer than a billion different UUIDs over the next 30 years. There would be plenty of collisions within that set, but you would still have to check about $2^{94}$ properly random UUIDs to find a collision between the two sets.
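The rough arithmetic behind a figure like $2^{94}$ can be checked directly. The exact exponent depends on assumptions; this sketch assumes the broken generator emits about $2^{30}$ distinct values (roughly 30 years of seconds) and the good one draws uniformly from the 122 random bits of a version-4 UUID, which lands in the same ballpark:

```python
# Expected number of random UUIDs you must draw before hitting
# a fixed "bad" set of second-based UUIDs.
bad_values = 2 ** 30    # ~30 years of seconds, one UUID per second
space = 2 ** 122        # random bits in a version-4 UUID

# Each random draw hits the bad set with probability bad_values/space,
# so the expected number of draws is the reciprocal of that.
expected_draws = space // bad_values
print(expected_draws == 2 ** 92)  # prints True - same order as 2^94
```

Either way, the number of draws needed is far beyond anything physically feasible.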

And most of the time, collisions between UUIDs don't actually matter. Say my music player uses UUIDs to identify songs, in my library and in everyone else's all over the world, and my photo app uses UUIDs to identify photos. Neither my music player nor yours, nor any of our photo apps, will ever compare the UUID of a song with the UUID of a photo. So if there is a collision between those two groups, nobody will ever know and nobody will ever be affected by it.
