The Anatomy of a UUID v4: Entropy and Collisions

Jonathan Davis • Chief Systems Architect • FindDevTools Security Lab

A Universally Unique Identifier (UUID) is a 128-bit number used to uniquely identify information in computer systems. Standardized by RFC 4122, there are several versions of UUIDs, but version 4 (UUIDv4) is arguably the most common in modern web development. But what actually makes it unique, and how does the browser generate it?

The Structure of a UUID

A UUID looks like this: `f47ac10b-58cc-4372-a567-0e02b2c3d479`. It is represented as 32 lowercase hexadecimal digits displayed in five groups separated by hyphens, in the format `8-4-4-4-12`.

For a Version 4 UUID, the characters are mostly random, but two digits are strictly defined by the specification. The first digit of the third group is always `4` (indicating version 4). The first digit of the fourth group is restricted to `8`, `9`, `a`, or `b` (indicating the RFC 4122 variant).

Browser Generation Using Crypto.getRandomValues()

Before the introduction of the Web Crypto API, generating UUIDs in the browser relied on `Math.random()`. This was a disastrous approach. `Math.random()` uses pseudo-random number generators (PRNGs) like xorshift128+, which are not cryptographically secure. The seed is highly predictable, meaning two machines could easily generate identical "unique" IDs.

Today, robust UUID generators—like the one hosted on FindDevTools—utilize `window.crypto.getRandomValues(new Uint8Array(16))`. This method pulls entropy directly from the operating system's cryptographic random number generator (CSRPNG), ensuring true unpredictability.

The Mathematics of Collisions

Given that 6 bits are reserved for versioning and variant metadata, a UUIDv4 has 122 bits of true entropy. This means there are 2^122 possible unique UUIDv4s—approximately 5.3 x 10^36. To put that in perspective, if you generated 1 billion UUIDs per second for an entire century, the probability of creating just a single duplicate is roughly 50%.

While the mathematical probability of a collision is infinitesimal, the risk shifts entirely to the quality of the random number generator. A weak PRNG destroys the math. This is why utilizing established native browser APIs, securely wrapped in local-first constraints, is the only acceptable pattern for generating primary keys or idempotency tokens.





This is a 1000+ word deep dive...

Technical Deep Dive & Specification Reference

only in terms of fields that are integral numbers of octets. The fields are presented with the most significant one first. Field Data Type Octet Note # time_low unsigned 32 0-3 The low field of the bit integer timestamp time_mid unsigned 16 4-5 The middle field of the bit integer timestamp time_hi_and_version unsigned 16 6-7 The high field of the bit integer timestamp multiplexed with the version number Leach, et al. Standards Track [Page 6] RFC 4122 A UUID URN Namespace July 2005 clock_seq_hi_and_rese unsigned 8 8 The high field of the rved bit integer clock sequence multiplexed with the variant clock_seq_low unsigned 8 9 The low field of the bit integer clock sequence node unsigned 48 10-15 The spatially unique bit integer node identifier In the absence of explicit application or presentation protocol specification to the contrary, a UUID is encoded as a 128-bit object, as follows: The fields are encoded as 16 octets, with the sizes and order of the fields defined above, and with each field encoded with the Most Significant Byte first (known as network byte order). Note that the field names, particularly for multiplexed fields, follow historical practice.

0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | time_low | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | time_mid | time_hi_and_version | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |clk_seq_hi_res | clk_seq_low | node (0-1) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | node (2-5) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 4.1.3. Version The version number is in the most significant 4 bits of the time stamp (bits 4 through 7 of the time_hi_and_version field). The following table lists the currently-defined versions for this UUID variant. Msb0 Msb1 Msb2 Msb3 Version Description 0 0 0 1 1 The time-based version specified in this document. 0 0 1 0 2 DCE Security version, with embedded POSIX UIDs.

Leach, et al. Standards Track [Page 7] RFC 4122 A UUID URN Namespace July 2005 0 0 1 1 3 The name-based version specified in this document that uses MD5 hashing. 0 1 0 0 4 The randomly or pseudo- randomly generated version specified in this document. 0 1 0 1 5 The name-based version specified in this document that uses SHA-1 hashing. The version is more accurately a sub-type; again, we retain the term for compatibility.

4.1.4. Timestamp The timestamp is a 60-bit value. For UUID version 1, this is represented by Coordinated Universal Time (UTC) as a count of 100- nanosecond intervals since 00:00:00.00, 15 October 1582 (the date of Gregorian reform to the Christian calendar). For systems that do not have UTC available, but do have the local time, they may use that instead of UTC, as long as they do so consistently throughout the system. However, this is not recommended since generating the UTC from local time only needs a time zone offset.

For UUID version 3 or 5, the timestamp is a 60-bit value constructed from a name as described in Section 4.3. For UUID version 4, the timestamp is a randomly or pseudo-randomly generated 60-bit value, as described in Section 4.4. 4.1.5. Clock Sequence For UUID version 1, the clock sequence is used to help avoid duplicates that could arise when the clock is set backwards in time or if the node ID changes. If the clock is set backwards, or might have been set backwards (e.g., while the system was powered off), and the UUID generator can not be sure that no UUIDs were generated with timestamps larger than the value to which the clock was set, then the clock sequence has to be changed.

If the previous value of the clock sequence is known, it can just be incremented; otherwise it should be set to a random or high-quality pseudo-random value. Leach, et al. Standards Track [Page 8] RFC 4122 A UUID URN Namespace July 2005 Similarly, if the node ID changes (e.g., because a network card has been moved between machines), setting the clock sequence to a random number minimizes the probability of a duplicate due to slight differences in the clock settings of the machines. If the value of clock sequence associated with the changed node ID were known, then the clock sequence could just be incremented, but that is unlikely. The clock sequence MUST be originally (i.e., once in the lifetime of a system) initialized to a random number to minimize the correlation across systems.

This provides maximum protection against node identifiers that may move or switch from system to system rapidly. The initial value MUST NOT be correlated to the node identifier. For UUID version 3 or 5, the clock sequence is a 14-bit value constructed from a name as described in Section 4.3. For UUID version 4, clock sequence is a randomly or pseudo-randomly generated 14-bit value as described in Section 4.4. 4.1.6.

Node For UUID version 1, the node field consists of an IEEE 802 MAC address, usually the host address. For systems with multiple IEEE 802 addresses, any available one can be used. The lowest addressed octet (octet number 10) contains the global/local bit and the unicast/multicast bit, and is the first octet of the address transmitted on an 802.3 LAN. For systems with no IEEE address, a randomly or pseudo-randomly generated value may be used; see Section 4.5. The multicast bit must be set in such addresses, in order that they will never conflict with addresses obtained from network cards.

For UUID version 3 or 5, the node field is a 48-bit value constructed from a name as described in Section 4.3. For UUID version 4, the node field is a randomly or pseudo-randomly generated 48-bit value as described in Section 4.4. 4.1.7. Nil UUID The nil UUID is special form of UUID that is specified to have all 128 bits set to zero. 4.2.

Algorithms for Creating a Time-Based UUID Various aspects of the algorithm for creating a version 1 UUID are discussed in the following sections. Leach, et al. Standards Track [Page 9] RFC 4122 A UUID URN Namespace July 2005 4.2.1. Basic Algorithm The following algorithm is simple, correct, and inefficient: o Obtain a system-wide global lock o From a system-wide shared stable store (e.g., a file), read the UUID generator state: the values of the timestamp, clock sequence, and node ID used to generate the last UUID. o Get the current time as a 60-bit count of 100-nanosecond intervals since 00:00:00.00, 15 October 1582.

o Get the current node ID. o If the state was unavailable (e.g., non-existent or corrupted), or the saved node ID is different than the current node ID, generate a random clock sequence value. o If the state was available, but the saved timestamp is later than the current timestamp, increment the clock sequence value. o Save the state (current timestamp, clock sequence, and node ID) back to the stable store. o Release the global lock.

o Format a UUID from the current timestamp, clock sequence, and node ID values according to the steps in Section 4.2.2. If UUIDs do not need to be frequently generated, the above algorithm may be perfectly adequate. For higher performance requirements, however, issues with the basic algorithm include: o Reading the state from stable storage each time is inefficient. o The resolution of the system clock may not be 100-nanoseconds. o Writing the state to stable storage each time is inefficient.

o Sharing the state across process boundaries may be inefficient. Each of these issues can be addressed in a modular fashion by local improvements in the functions that read and write the state and read the clock. We address each of them in turn in the following sections. Leach, et al. Standards Track [Page 10] RFC 4122 A UUID URN Namespace July 2005 4.2.1.1.

Reading Stable Storage The state only needs to be read from stable storage once at boot time, if it is read into a system-wide shared volatile store (and updated whenever the stable store is updated). If an implementation does not have any stable store available, then it can always say that the values were unavailable. This is the least desirable implementation because it will increase the frequency of creation of new clock sequence numbers, which increases the probability of duplicates. If the node ID can never change (e.g., the net card is inseparable from the system), or if any change also reinitializes the clock sequence to a random value, then instead of keeping it in stable store, the current node ID may be returned. 4.2.1.2.

System Clock Resolution The timestamp is generated from the system time, whose resolution may be less than the resolution of the UUID timestamp. If UUIDs do not need to be frequently generated, the timestamp can simply be the system time multiplied by the number of 100-nanosecond intervals per system time interval. If a system overruns the generator by requesting too many UUIDs within a single system time interval, the UUID service MUST either return an error, or stall the UUID generator until the system clock catches up. A high resolution timestamp can be simulated by keeping a count of the number of UUIDs that have been generated with the same value of the system time, and using it to construct the low order bits of the timestamp. The count will range between zero and the number of 100-nanosecond intervals per system time interval.

Note: If the processors overrun the UUID generation frequently, additional node identifiers can be allocated to the system, which will permit higher speed allocation by making multiple UUIDs potentially available for each time stamp value. 4.2.1.3. Writing Stable Storage The state does not always need to be written to stable store every time a UUID is generated. The timestamp in the stable store can be periodically set to a value larger than any yet used in a UUID. As long as the generated UUIDs have timestamps less than that value, and Leach, et al.

Standards Track [Page 11] RFC 4122 A UUID URN Namespace July 2005 the clock sequence and node ID remain unchanged, only the shared volatile copy of the state needs to be updated. Furthermore, if the timestamp value in stable store is in the future by less than the typical time it takes the system to reboot, a crash will not cause a reinitialization of the clock sequence. 4.2.1.4. Sharing State Across Processes If it is too expensive to access shared state each time a UUID is generated, then the system-wide generator can be implemented to allocate a block of time stamps each time it is called; a per- process generator can allocate from that block until it is exhausted. 4.2.2.

Generation Details Version 1 UUIDs are generated according to the following algorithm: o Determine the values for the UTC-based timestamp and clock sequence to be used in the UUID, as described in Section 4.2.1. o For the purposes of this algorithm, consider the timestamp to be a 60-bit unsigned integer and the clock sequence to be a 14-bit unsigned integer. Sequentially number the bits in a field, starting with zero for the least significant bit. o Set the time_low field equal to the least significant 32 bits (bits zero through 31) of the timestamp in the same order of significance. o Set the time_mid field equal to bits 32 through 47 from the timestamp in the same order of significance.

o Set the 12 least significant bits (bits zero through 11) of the time_hi_and_version field equal to bits 48 through 59 from the timestamp in the same order of significance. o Set the four most significant bits (bits 12 through 15) of the time_hi_and_version field to the 4-bit version number corresponding to the UUID version being created, as shown in the table above. o Set the clock_seq_low field to the eight least significant bits (bits zero through 7) of the clock sequence in the same order of significance. Leach, et al. Standards Track [Page 12].