UUID Generator

Explanation

The UUIDGenerator class creates instances that generate IDs guaranteed to be unique, provided no two instances share the same machineAddress. The default way to distinguish instances is by MAC address, so you must ensure that MAC addresses are unique across all instances of this class in use by your system.

Comparison to UUID V1

The strategy largely follows UUID V1, a time-based UUID scheme built from the same (timestamp, machine address, sequence number) data.

We do not use the string format or the 128-bit internal representation of UUID V1. This was done for simplicity; adopting them would be a useful optimization if space efficiency and a standardized string length were important.

ID Ordering/Sorting

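The Comparable implementation of UniqueId sorts first on timestamp, then machine address, then sequence number. This guarantees that if idA < idB, then timestampA <= timestampB; that is, assuming clocks are synchronized, idA was generated no later than idB. Two IDs with the same timestamp are ordered by machine address and then sequence number, so their relative order says nothing about which was generated first.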
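
The ordering described above can be sketched as follows. This is a minimal illustration, not the library's code: the record name SketchId and its field names and types (three long values) are assumptions, and records require Java 16 or later.

```java
import java.util.Comparator;

// Hypothetical stand-in for UniqueId; field names and types are assumptions.
record SketchId(long timestamp, long machineAddress, long sequence)
        implements Comparable<SketchId> {

    // Compare on timestamp first, then machine address, then sequence number.
    private static final Comparator<SketchId> ORDER =
            Comparator.comparingLong(SketchId::timestamp)
                    .thenComparingLong(SketchId::machineAddress)
                    .thenComparingLong(SketchId::sequence);

    @Override
    public int compareTo(SketchId other) {
        return ORDER.compare(this, other);
    }
}
```

With this ordering, an ID with a smaller timestamp always sorts first regardless of its machine address or sequence number, while IDs sharing a timestamp fall back to the later fields.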

Usage

import org.example.UUIDGenerator;

class Example {
    public static void main(String[] args) {
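        var kafkaClient = new KafkaClient();
        // Forward every generated ID to the audit topic via the listener.
        var generator = UUIDGenerator.builder()
                .listener(i -> kafkaClient.publishMessage("uuid-audit-log", i.toString()))
                .build();
        var id1 = generator.generate();
        var id2 = generator.generate();
        // Note: assert statements only run when the JVM is started with -ea.
        assert !id1.equals(id2);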
    }

    static class KafkaClient {
        void publishMessage(String topic, String message) {
            System.out.printf("Sent %s to topic %s%n", message, topic);
        }

    }
}

(Lack of) Thread safety

This class is not thread-safe. Even if its fields were updated atomically, you would still need to make sure that machineAddress is unique across instances created on different threads. We do not solve this issue at the moment, but an additional field tracking the thread ID could be used to make IDs unique across threads.
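
Until the library handles this itself, one possible client-side workaround is to share a single generator and serialize calls to it behind a lock. The sketch below is an assumption, not part of the library: SynchronizedSupplier is a hypothetical helper that wraps any Supplier, so it does not depend on the generator's internals.

```java
import java.util.function.Supplier;

// Hypothetical wrapper: serializes calls to a supplier that is not thread-safe.
final class SynchronizedSupplier<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private final Object lock = new Object();

    SynchronizedSupplier(Supplier<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public T get() {
        // Only one thread at a time may call into the delegate.
        synchronized (lock) {
            return delegate.get();
        }
    }
}
```

Assuming generate takes no arguments, as in the usage example, a shared generator could be wrapped as new SynchronizedSupplier<>(generator::generate). This trades throughput for safety and only covers a single shared instance; separate instances still need distinct machineAddress values.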

Auditing system

To allow auditing of the IDs generated by this system, we allow passing a Listener to UUIDGenerator. All IDs generated will be forwarded to the Listener before returning.

Auditing Backend

Given a write-heavy, read-light workload with no transactional requirements, we propose building a Listener client that sends the IDs over the network to a pub-sub system like Kafka for asynchronous processing. The Listener simply publishes the string representation of the unique ID to a particular topic; consumers can then route the messages to a data warehouse (such as Snowflake or HBase) to allow for easy analysis and querying.

Any UUIDGenerator client with a unique machineAddress can publish to a single Kafka topic and be guaranteed that all published IDs are unique in the topic.

Batching

If there is some tolerance for latency and for dropped IDs (in the case of machine or network failure), the local Kafka client should be configured to batch messages rather than send each ID individually, which improves the throughput of the backend.
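
As a rough sketch of what such a configuration might look like with the Apache Kafka Java producer, the snippet below builds a Properties object using standard producer settings (linger.ms, batch.size, compression.type, acks). The class name BatchingProducerConfig and all values are illustrative assumptions to be tuned for the actual workload; the serializer settings a real producer also requires are omitted.

```java
import java.util.Properties;

// Sketch of producer settings that favor batching; all values are illustrative assumptions.
final class BatchingProducerConfig {
    private BatchingProducerConfig() {}

    static Properties batchingProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Wait up to 50 ms for more records to accumulate before sending a batch.
        props.setProperty("linger.ms", "50");
        // Upper bound, in bytes, on the size of a single batch per partition.
        props.setProperty("batch.size", "65536");
        // Compress whole batches to reduce network traffic.
        props.setProperty("compression.type", "lz4");
        // Wait only for the partition leader's acknowledgement, accepting some risk of loss.
        props.setProperty("acks", "1");
        return props;
    }
}
```

A larger linger.ms or batch.size increases throughput at the cost of latency and of more IDs being lost if the machine fails before a batch is sent.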

Moving off the critical path

Because we want the UUIDGenerator::generate method to be fast, we do not want it to block on the auditing Listener. There are two options:

1. Run the listener in its own thread so that it can do its work without blocking the return.
2. Offload the problem to the client and let the client apply its own multi-threading approach.

Our library has adopted approach 2 for the time being, which we believe is the least-surprising implementation strategy.
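
For clients taking on this responsibility, one possible approach is to wrap the real listener so that each ID is handed to a background thread. The sketch below is an assumption rather than library code: AsyncListener is a hypothetical helper built on java.util.function.Consumer, since the exact shape of the Listener interface is not shown here.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical client-side wrapper: hands each ID to a background thread.
final class AsyncListener<T> implements Consumer<T>, AutoCloseable {
    private final Consumer<T> delegate;
    // A single worker thread preserves the order in which IDs were generated.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    AsyncListener(Consumer<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void accept(T id) {
        // Returns immediately; the delegate runs later on the executor thread.
        executor.execute(() -> delegate.accept(id));
    }

    @Override
    public void close() {
        // Stop accepting work and give queued IDs a few seconds to drain.
        executor.shutdown();
        try {
            if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

If Listener is a functional interface, an instance could probably be passed to the builder as asyncListener::accept. The executor's queue is unbounded in this sketch, so a slow backend would cause memory to grow; a bounded queue with a drop or back-pressure policy may be preferable in production.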

jackdreilly / uuid-generator