Unique Identity (UID)


How can we obtain a unique identifier which is unique across the various JVMs? This article tries to understand the various Unique identification components that can help us generate a hash code which might be a practically unique number.

The components are:

  1. IP address
  2. Process ID
  3. Sequence number
  4. Current time in seconds

We will discuss each component. Objective is to convert the component into a long value so that it can participate in hashCode generation.

  1. IP Address: Here we will discuss how an IP address can be converted to a long value. An Internet Protocol address (IP address) is a numerical label assigned to each device. Usually it is IPv4 which is a 32 bit number. However, due to the enormous growth of the Internet and the predicted depletion of available addresses, a new addressing system (IPv6) is developed, using 128 bits for the address. Since the machine’s IP address may be represented in IPv6 or IPv4, we will have to assign space assuming it is IPv6 which means we would need 16 bytes to work with. If it is IPv4 we will convert into IPv6 for example ::FFFF: The first 10 bytes would be assigned 0, 11th and 12th byte would be FF, FF, the next 4 bytes would be the IPv4 address. Each byte needs to be converted into its unsigned value and will be assigned the lower 8 bits so the existing lower 8 bits will have to be left shifted by 8 to give way to the current byte.
  2. Process ID:  Process ID is a unique number between processes in the same machine.It can be obtained in different ways. We will discuss each method here.
  • Obtains a unique value to represent the process id via UUID. In Java, we can obtain it using UUID.randomUUID().
  • Obtains a unique value to represent the process id via the file system. Idea is to create a tmp file with a file name that contains a unique pid number. If the file already exists it  should increment the pid and try again to create it. There can be a fixed number of retrials. The current system time can be used to obtain the base pid. The path would then be {tmp path}/pid{time in ms}. For example if tmp path is C:test, current time is 1330921270239, file name in tmp dir would be C:testtmppid1330921270239.  The check for the existence of the file and the creation of the file, if it does not exist, must happen in a single operation that is atomic with respect to all other filesystem activities that might affect the file. When JVM terminates, the file created should automatically get deleted.
  • Obtains a unique value to represent the process id via sockets and ports. Since no two ServerSockets (JVM or otherwise) can exist on the same port they will all be unique. The port number used to create the ServerSocket will become the pid. The ServerSocket has to be a static Object so that there will be only one per JVM.

The third and fourth are obvious so we won’t discuss about them any more. The next main thing is to convert these components to a hashCode. The easiest one would be to apply XOR on each component.

hashCode = (int) hostAddr[0] ^ (int) hostAddr[1] ^ process^ sec ^ seq;

The most obvious problem with the “XOR” all fields approach to hash codes is that any two fields XOR’d together will give you the same value, regardless of the order in which they are XOR’d (i.e. XOR is a commutative operation). So y^x^n = ^x^n^y. This increases the chance that you will get hash code collision, which of course is a bad thing. We should be choosing an algorithm that is non-commutative, and secondly we should be choosing an algorithm that uses the full range of integer-space, rather than limiting a hashcode to the range of its constituent members. Hash code distribution is a complicated topic and has to be dealt separately.



Leave A Reply