1. Overview
In this quick article, we’ll explore ways to generate a unique Integer from a unique String. Java offers several ways to achieve this, and each approach balances speed, simplicity, and uniqueness differently.
2. What Does Unique Mean?
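Uniqueness means that distinct Strings map to distinct ints, ideally with no collisions. However, an int has only 2^32 possible values, while the number of possible Strings is far larger, so no function can map every String to a different int. For a well-distributed 32-bit hash, the birthday paradox makes a collision more likely than not after only about 77,000 inputs.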
Uniqueness isn’t binary – methods like hashCode() offer probabilistic uniqueness with rare collisions, while lookup maps guarantee it.
Each solution should consider how big our input space is and how many collisions in the output may be allowed, if any.
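To get a feel for how the input size affects the collision risk, we can estimate the probability with the birthday-paradox approximation p ≈ 1 - e^(-n(n-1)/(2·2^32)). The following is a minimal sketch that assumes an ideal, uniformly distributed 32-bit hash, which real hash functions only approximate. The class and method names are hypothetical and not part of the article’s code:

```java
class CollisionEstimate {

    // Number of distinct int values: 2^32
    private static final double INT_SPACE = 4_294_967_296.0;

    /**
     * Approximate probability that at least two of n distinct inputs collide,
     * ASSUMING an ideal, uniformly distributed 32-bit hash.
     */
    static double collisionProbability(long n) {
        double pairs = (double) n * (n - 1) / 2.0;
        // 1 - e^(-x), computed via expm1 for better accuracy with small x
        return -Math.expm1(-pairs / INT_SPACE);
    }

    public static void main(String[] args) {
        System.out.printf("1,000 inputs:  %.6f%n", collisionProbability(1_000));
        System.out.printf("77,163 inputs: %.6f%n", collisionProbability(77_163));
    }
}
```

Under this assumption, 1,000 inputs collide with a probability of roughly 0.01%, while about 77,000 inputs already give a 50% chance.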
2.1. Validation
To ensure our implementations behave as expected, we test them for uniqueness using a parameterized JUnit test:
private static Stream<Arguments> implementations() {
    return Stream.of(
      Arguments.of(Named.<Function<String, Integer>> of("toIntByHashCode", StringToUniqueInt::toIntByHashCode)),
      Arguments.of(Named.<Function<String, Integer>> of("toIntByCR32", StringToUniqueInt::toIntByCR32)),
      Arguments.of(Named.<Function<String, Integer>> of("toIntByCharFormula", StringToUniqueInt::toIntByCharFormula)),
      Arguments.of(Named.<Function<String, Integer>> of("toIntByMD5", StringToUniqueInt::toIntByMD5)),
      Arguments.of(Named.<Function<String, Integer>> of("toIntByLookup", StringToUniqueInt::toIntByLookup))
    );
}

@ParameterizedTest
@MethodSource("implementations")
public void given1kElements_whenMappedToInt_thenItShouldHaveNoDuplicates(Function<String, Integer> implementation) {
    Stream<String> strings = uniqueStringsOfSize(1_000); // may be increased for better guarantees
    List<Integer> integers = strings.map(implementation)
      .toList();
    assertThat(integers).doesNotHaveDuplicates();
}
For each implementation we provide, the test generates a Stream of 1,000 unique Strings and maps each value to an Integer. The assertion checks whether there are any duplicates in the resulting List. A passing test shows that no collision occurred for this sample; it doesn’t prove that an implementation is collision-free in general.
3. Solutions
Let’s dive into five practical approaches to generate a unique int from a String.
3.1. Using String.hashCode()
Our first solution is perhaps the most obvious one, calling hashCode():
public static int toIntByHashCode(String value) {
    return value.hashCode();
}
It’s fast and built into Java, making it ideal for quick caching or non-critical applications. However, String.hashCode() isn’t collision-free, as multiple strings can produce the same int.
We can use this when speed matters more than guaranteed uniqueness.
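A well-known pair of colliding inputs shows that such collisions aren’t just theoretical. The sketch below wraps the method in a hypothetical demo class (the class name is ours, not the article’s):

```java
class HashCodeCollisionDemo {

    static int toIntByHashCode(String value) {
        return value.hashCode();
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        System.out.println(toIntByHashCode("Aa")); // prints 2112
        // 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println(toIntByHashCode("BB")); // prints 2112
    }
}
```

Two different two-character Strings already produce the same int, so any code relying on hashCode() has to tolerate collisions.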
3.2. Using Chars With Equation
For more control, we can craft a custom formula applied to each character:
public static int toIntByCharFormula(String value) {
    return value.chars()
      .reduce(17, (a, b) -> a * 13 + (b / (a + 1))); // or any other equation
}
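It’s simple and customizable but prone to collisions, often more so than hashCode(). In the formula above, for example, integer division makes b / (a + 1) evaluate to 0 whenever a + 1 is larger than the character code, so many characters barely influence the result. It’s suitable for educational purposes or when we need a tailored hash function, but we should test thoroughly for collision risks.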
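To illustrate why a custom formula needs thorough testing, the sketch below feeds two short Strings into the same equation. The demo class name is hypothetical; the formula is copied unchanged from the method above:

```java
class CharFormulaCollisionDemo {

    static int toIntByCharFormula(String value) {
        return value.chars()
          .reduce(17, (a, b) -> a * 13 + (b / (a + 1)));
    }

    public static void main(String[] args) {
        // Step 1: 17 * 13 + 97 / 18  = 221 + 5 = 226
        // Step 2: 226 * 13 + 98 / 227 = 2938 + 0 = 2938
        System.out.println(toIntByCharFormula("ab")); // prints 2938
        // Step 2 for 'c': 226 * 13 + 99 / 227 = 2938 + 0 = 2938
        System.out.println(toIntByCharFormula("ac")); // prints 2938
    }
}
```

The second character is divided away entirely, so "ab" and "ac" collide. A different equation may behave much better, which is exactly why each variant should be checked against realistic input.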
3.3. Using CRC32 for Checksums
The third, more robust approach uses the CRC32 checksum from java.util.zip:
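public static int toIntByCR32(String value) {
    CRC32 crc32 = new CRC32();
    // explicit charset (java.nio.charset.StandardCharsets) so the result
    // doesn't depend on the platform default encoding
    crc32.update(value.getBytes(StandardCharsets.UTF_8));
    // getValue() returns the 32-bit checksum as a long; the cast keeps the same 32 bits
    return (int) crc32.getValue();
}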
CRC32 processes the string’s bytes to produce a 32-bit checksum, which is cast to an int and may therefore be negative. It’s designed for error detection, so similar inputs tend to yield very different values, although with only 32 bits of output collisions remain possible.
While slower than hashCode(), it’s reliable for applications like file indexing or data integrity checks where robustness is key.
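As a quick sanity check, the CRC-32 variant implemented by java.util.zip.CRC32 has the commonly published check value 0xCBF43926 for the ASCII input "123456789". The following sketch, with a hypothetical class name and an explicit UTF-8 encoding as our own assumption, shows how the cast to int turns that value negative and how the unsigned value can be recovered:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class Crc32Demo {

    static int toIntByCrc32(String value) {
        CRC32 crc32 = new CRC32();
        // Assumption: UTF-8 is the desired encoding for the input
        crc32.update(value.getBytes(StandardCharsets.UTF_8));
        return (int) crc32.getValue();
    }

    public static void main(String[] args) {
        int checksum = toIntByCrc32("123456789");
        // Same 32 bits, interpreted as a signed int (negative here)
        System.out.println(checksum);
        // Hexadecimal view of those bits
        System.out.println(Integer.toHexString(checksum)); // prints cbf43926
        // Unsigned interpretation, identical to crc32.getValue()
        System.out.println(Integer.toUnsignedLong(checksum));
    }
}
```

If negative identifiers are a problem, we can keep the long returned by getValue() instead of casting, at the cost of no longer having an int.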
3.4. Using MD5 With Byte Shift
For a cryptographic approach, we use MD5 hashing:
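public static int toIntByMD5(String value) {
    try {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        // explicit charset (java.nio.charset.StandardCharsets) so the result
        // doesn't depend on the platform default encoding
        byte[] hash = digest.digest(value.getBytes(StandardCharsets.UTF_8));
        // combine the first four bytes, most significant byte first
        return ((hash[0] & 0xFF) << 24) | ((hash[1] & 0xFF) << 16)
          | ((hash[2] & 0xFF) << 8) | (hash[3] & 0xFF);
    } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException("MD5 not supported", e);
    }
}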
MD5 generates a 128-bit hash, from which we extract the first four bytes to form a 32-bit int using bitwise operations.
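It’s slower, and because we keep only 32 of the 128 bits, its collision risk is that of any well-distributed 32-bit hash rather than that of the full MD5 digest. It suits scenarios where an even distribution of values matters more than speed, but it may be overkill for simple use cases.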
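The manual byte shifting reads the first four bytes in big-endian order, which is also what ByteBuffer does by default. As an alternative sketch (the class and method names are hypothetical, and UTF-8 is our assumed encoding), both variants can be compared side by side:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Demo {

    private static byte[] md5(String value) {
        try {
            return MessageDigest.getInstance("MD5")
              .digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not supported", e);
        }
    }

    // Manual variant: combine the first four bytes, most significant first
    static int toIntByShift(String value) {
        byte[] hash = md5(value);
        return ((hash[0] & 0xFF) << 24) | ((hash[1] & 0xFF) << 16)
          | ((hash[2] & 0xFF) << 8) | (hash[3] & 0xFF);
    }

    // ByteBuffer variant: getInt() reads four bytes in big-endian order by default
    static int toIntByByteBuffer(String value) {
        return ByteBuffer.wrap(md5(value)).getInt();
    }

    public static void main(String[] args) {
        // The MD5 digest of the empty string starts with d41d8cd9
        System.out.println(Integer.toHexString(toIntByShift("")));      // prints d41d8cd9
        System.out.println(Integer.toHexString(toIntByByteBuffer(""))); // prints d41d8cd9
    }
}
```

Both methods return the same int; the ByteBuffer version is merely shorter and harder to get wrong.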
3.5. Using Lookup
Our last approach is useful when uniqueness must be guaranteed.
We use a HashMap as a simple in-memory store that maps each String to its generated Integer:
private static final Map<String, Integer> lookupMap = new HashMap<>();
private static final AtomicInteger counter = new AtomicInteger(Integer.MIN_VALUE);
The implementation in this case is very straightforward – either return an already generated int for a specific String, or increment counter for a new Integer and store it:
public static int toIntByLookup(String value) {
    var found = lookupMap.get(value);
    if (found != null) {
        return found;
    }
    var intValue = counter.incrementAndGet();
    lookupMap.put(value, intValue);
    return intValue;
}
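Because every new String receives the next value of the counter, the results are unique within a single run of the application, for up to 2^32 distinct Strings. The trade-offs are memory usage, which grows with the number of Strings, and the fact that the mapping lives only in memory: to serve as database keys or persistent identifiers, it would have to be stored durably. Note also that a plain HashMap isn’t thread-safe, so the AtomicInteger alone doesn’t make the method safe for concurrent use.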
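If the lookup has to be called from several threads, one possible variant replaces the HashMap with a ConcurrentHashMap and lets computeIfAbsent() perform the check and the insert atomically. This is a sketch of that idea with a hypothetical class name, not the article’s implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ConcurrentLookup {

    private static final Map<String, Integer> LOOKUP = new ConcurrentHashMap<>();
    private static final AtomicInteger COUNTER = new AtomicInteger(Integer.MIN_VALUE);

    static int toIntByLookup(String value) {
        // ConcurrentHashMap applies the mapping function at most once per key,
        // so each String gets exactly one counter value even under contention
        return LOOKUP.computeIfAbsent(value, key -> COUNTER.incrementAndGet());
    }

    public static void main(String[] args) {
        int first = toIntByLookup("baeldung");
        int again = toIntByLookup("baeldung");
        int other = toIntByLookup("java");
        System.out.println(first == again); // prints true
        System.out.println(first == other); // prints false
    }
}
```

The memory trade-off stays the same, since every distinct String is still kept in the map.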
4. Conclusion
There are multiple approaches for generating a unique int from a String in Java, each with distinct trade-offs.
hashCode() and custom formulas are fast but risk collisions, which makes them suitable for caching. CRC32 and MD5 are slower but distribute their values well, although no 32-bit hash can rule out collisions entirely. The lookup map ensures uniqueness at the cost of memory, which makes it the right choice when collisions are unacceptable. We should choose based on our needs for speed, reliability, or scalability.
As always, the entire code used in this article can be found over on GitHub.