An Attempt at a More Useful UUID
Late yesterday I was talking with my manager about using endpoint-based UUIDs in our communication with the Broker. In the original implementation, we used a random 128-bit UUID generated with the libuuid library call:
```cpp
#include <uuid/uuid.h>

void UUID::fill()
{
    // libuuid fills all 16 bytes of mBlocks with a random UUID
    uuid_generate((uint8_t *)mBlocks);
}
```
to populate the ivar, which was simply:
```cpp
private:
    uint64_t mBlocks[2];  // 128 bits of UUID storage
```
It worked fine, but then the question came up: can't we make this more useful? So we thought about packing the TCP socket endpoint data (address and port for each side) along with a sequence number into the same 128 bits. The data would look something like this, where byte 0 is the most significant byte and byte 15 the least significant:
| Bytes 0-3 | Bytes 4-5 | Bytes 6-9 | Bytes 10-11 | Bytes 12-15 |
|---|---|---|---|---|
| Local Addr | Local Port | Remote Addr | Remote Port | Counter |
The code for generating this is a little tricky: the socket APIs hand back addresses and ports in network byte order, so that is the most natural order to populate the UUID in, but the machine eventually needs the value in host byte order. So I came up with the idea of writing hton() and ntoh() equivalents for uint128_t values, analogous to htonl() and ntohl() for 32-bit values. Everything worked as we had planned: the endpoints were properly encoded into the UUID, and the counter tried to keep them unique.
But there was a nasty truth about the ephemeral ports: they come from a limited pool and get reused, so an application that restarts can easily end up with the same ports again. Likewise, the "sequence numbers" started over on every restart. Unfortunately, this made it possible to run the application several times in succession and get the exact same UUIDs from the same endpoints. Not good. Not horrible, but the big point was to make it easy to see in the logs who was connecting to whom and all. For that, it was a failure.
Add to that the fact that with the random UUID we could create the UUID before the socket connection was established. With the endpoint-based solution, we had to wait until after, since the local ephemeral port isn't assigned until the socket connects. This made the code a lot more complex, and I'm not a big fan of complexity.
So we decided that it was probably better to forgo the endpoint-based UUIDs and just stick with the random ones. It's not ideal, but it's actually better than the misinformation that the endpoint-based UUIDs might have introduced.