The UUIDGenerator class creates instances that generate IDs guaranteed to be
unique, provided no two instances share the same machineAddress. A standard
way to disambiguate instances (and the default setting) is to use the MAC
address, which requires that all MAC addresses be unique across the set of
instances of this class in use by your system.
The strategy followed is largely in line with that employed by UUID V1, which is considered a time-based UUID strategy using the same (timestamp, machine name, sequence number) data.
We do not follow the same string format or 128-bit internal representation that UUID V1 employs. This was done for simplicity; adopting the UUID V1 layout would be a useful optimization if space efficiency and a standardized string-representation length were important.
The Comparable implementation of UniqueId sorts first on timestamp, then
machine address, then sequence number. This gives the guarantee that if
idA < idB, then timestampA <= timestampB; i.e., assuming clocks are
synchronized, idA was generated no later than idB. When the timestamps are
equal, the order is decided by machine address and then sequence number.
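The ordering described above can be sketched as follows. This is a minimal illustration, not the library's actual UniqueId: the class name UniqueIdSketch and the field names and types are assumptions made for the example.

```java
// Hypothetical stand-in for UniqueId; field names and types are assumed.
final class UniqueIdSketch implements Comparable<UniqueIdSketch> {
    final long timestamp;
    final long machineAddress;
    final int sequenceNumber;

    UniqueIdSketch(long timestamp, long machineAddress, int sequenceNumber) {
        this.timestamp = timestamp;
        this.machineAddress = machineAddress;
        this.sequenceNumber = sequenceNumber;
    }

    // Compare on timestamp first, then machine address, then sequence number.
    @Override
    public int compareTo(UniqueIdSketch other) {
        int byTime = Long.compare(timestamp, other.timestamp);
        if (byTime != 0) {
            return byTime;
        }
        int byMachine = Long.compare(machineAddress, other.machineAddress);
        if (byMachine != 0) {
            return byMachine;
        }
        return Integer.compare(sequenceNumber, other.sequenceNumber);
    }
}

class OrderingExample {
    public static void main(String[] args) {
        var first = new UniqueIdSketch(1_000L, 42L, 7);
        var laterTimestamp = new UniqueIdSketch(1_001L, 1L, 0);
        var sameTimestamp = new UniqueIdSketch(1_000L, 42L, 8);
        // A later timestamp wins even though its machine address is smaller.
        System.out.println(first.compareTo(laterTimestamp) < 0);
        // With equal timestamp and machine address, the sequence number decides.
        System.out.println(first.compareTo(sameTimestamp) < 0);
    }
}
```

Because the timestamp is compared first, two IDs with different timestamps are always ordered by time; the other two fields only break ties.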
import org.example.UUIDGenerator;

class Example {
    public static void main(String[] args) {
        var kafkaClient = new KafkaClient();
        // Forward every generated ID to the audit topic.
        var generator = UUIDGenerator.builder()
                .listener(i -> kafkaClient.publishMessage("uuid-audit-log", i.toString()))
                .build();
        var id1 = generator.generate();
        var id2 = generator.generate();
        assert !id1.equals(id2); // only checked when the JVM runs with -ea
    }

    // Stand-in client that prints instead of talking to Kafka.
    static class KafkaClient {
        void publishMessage(String topic, String message) {
            System.out.printf("Sent %s to topic %s%n", message, topic);
        }
    }
}
This class is not thread-safe. Moreover, even if updates to its fields were
made atomic, you would still need to make sure that machineAddress is unique
across instances created on different threads. We do not solve this issue at
the moment, but an additional field could be used to track thread IDs and so
make IDs unique across threads.
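One way the extra per-thread field could look is sketched below. This is not part of the library: PerThreadGeneratorSketch and ThreadLocalGenerators are hypothetical names, the string ID format is invented for the example, and the machine address is a placeholder. Each thread gets its own generator, so the non-thread-safe state is never shared.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in for a non-thread-safe generator.
final class PerThreadGeneratorSketch {
    private final long machineAddress;
    private final long threadSlot; // the additional field that separates threads
    private long lastTimestamp = -1L;
    private int sequence = 0;

    PerThreadGeneratorSketch(long machineAddress, long threadSlot) {
        this.machineAddress = machineAddress;
        this.threadSlot = threadSlot;
    }

    // Only ever called from the owning thread, so no locking is needed.
    String generate() {
        long now = System.currentTimeMillis();
        if (now > lastTimestamp) {
            lastTimestamp = now;
            sequence = 0;
        } else {
            sequence++; // same millisecond, or the clock moved backwards
        }
        return lastTimestamp + "-" + machineAddress + "-" + threadSlot + "-" + sequence;
    }
}

final class ThreadLocalGenerators {
    private static final long MACHINE_ADDRESS = 0x0A1B2C3D4E5FL; // placeholder value
    // Thread.getId() values may be reused after a thread terminates,
    // so this sketch hands out its own never-reused slot numbers instead.
    private static final AtomicLong NEXT_SLOT = new AtomicLong();
    private static final ThreadLocal<PerThreadGeneratorSketch> LOCAL =
            ThreadLocal.withInitial(() ->
                    new PerThreadGeneratorSketch(MACHINE_ADDRESS, NEXT_SLOT.getAndIncrement()));

    private ThreadLocalGenerators() {}

    static String generate() {
        return LOCAL.get().generate();
    }
}
```

Any thread can then call ThreadLocalGenerators.generate(); two threads on the same machine never produce the same ID because their slot numbers differ. The trade-off is a longer ID, since the slot becomes a fourth component.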
To allow auditing of the IDs generated by this system, we allow passing a
Listener to UUIDGenerator. All IDs generated will be forwarded to the Listener
before returning.
Given high writes, low reads, and no transactional requirements, we propose to
build a Listener client that sends the IDs over the network to a pub-sub
system like Kafka for asynchronous processing. The Listener will simply
publish the string representation of the unique ID to a particular topic;
consumers can then route the messages to a data warehouse (such as Snowflake
or HBase) to allow for easy analysis and querying.
Any UUIDGenerator client with a unique machineAddress can publish to a
single Kafka topic and be guaranteed that all published IDs are unique in the
topic.
If some added latency and occasional dropped IDs (in the case of machine or network failure) are tolerable, the local Kafka client should be configured to batch messages rather than send each ID individually, which improves throughput to the backend.
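As a rough sketch of such a batching setup, the snippet below builds producer properties using the standard Kafka producer configuration keys linger.ms, batch.size, and acks. The class and method names are hypothetical, and the values are illustrative starting points rather than recommendations; the broker address is supplied by the caller.

```java
import java.util.Properties;

// Hypothetical helper; only the property keys come from the Kafka producer configuration.
final class AuditProducerConfigSketch {
    private AuditProducerConfigSketch() {}

    static Properties batchingProducerProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Wait up to 50 ms for more records so they can be sent as one batch.
        props.setProperty("linger.ms", "50");
        // Upper bound, in bytes, on a single batch per partition.
        props.setProperty("batch.size", "65536");
        // Only the partition leader acknowledges: faster, but IDs can be lost on failure.
        props.setProperty("acks", "1");
        return props;
    }
}
```

With the kafka-clients library on the classpath, these properties would typically be handed to the KafkaProducer constructor. Raising linger.ms trades latency for larger batches, so the right value depends on how stale the audit log is allowed to be.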
As we want the UUIDGenerator::generate method to be fast, we do not want to
block on the auditing Listener. There are two choices here:
- Run the listener in its own thread to allow the listener to handle its response without blocking the return.
- Offload the problem to the client and allow the client to perform its own multi-threading approaches.
Our library has adopted the second approach for the time being, which we believe is the least surprising implementation strategy.
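A client that wants non-blocking auditing could wrap its own listener along the following lines. This is a sketch under assumptions: java.util.function.Consumer stands in for the library's Listener type, whose exact signature is not shown here, and AsyncListenerSketch is a hypothetical name.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical wrapper: hands each ID to a background thread and returns immediately.
final class AsyncListenerSketch<T> implements Consumer<T>, AutoCloseable {
    private final Consumer<T> delegate;
    // A single worker thread keeps IDs in the order they were generated.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    AsyncListenerSketch(Consumer<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void accept(T id) {
        // Queue the work; the caller does not wait for the delegate to finish.
        executor.submit(() -> delegate.accept(id));
    }

    @Override
    public void close() {
        // Stop accepting new IDs, then give queued ones up to 5 seconds to drain.
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The wrapper would be passed wherever the plain listener was passed before, and closed on shutdown so queued IDs are flushed. Note that the executor's queue is unbounded, so a listener that is persistently slower than ID generation will accumulate memory; a bounded queue with a drop policy is one alternative if losing audit entries is acceptable.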