DEV Community

Dash One


Designing a Better String Utility - Part 1

Simple Things Not Easy

Finding a substring is the most basic text operation in any language. And yet in Java, and even in Kotlin, it is surprisingly awkward and error-prone.

Consider the following example that finds the extension name of a file:

val ext = "myfile".substringAfter(".")

The file actually has no extension name, but the code returns myfile.

That's odd?!

You could explicitly choose a different default by passing the missingDelimiterValue argument. But that's the point: the default behavior is confusing, and it will bite you!

What makes things worse is that if you are used to Apache's StringUtils, you may have expected a different result:

String ext = StringUtils.substringAfter("myfile", ".");

Now it says the extension name is empty. Seems to make more sense?

So, should Kotlin's String API have followed StringUtils?

Not quite. Let's study the following StringUtils example:


🔥 Extracting the Directory Path

String directory = StringUtils.substringBeforeLast(filename, "/");

If filename is "path/to/file", the code will extract "path/to" as the directory.

But what happens if filename is merely "myfile"? Interestingly, StringUtils decides to use a different strategy for the "not found" case: to fall back to the original string, which is "myfile".

But "myfile" is not the directory path!

At least Kotlin is consistent with itself: it always falls back to the original string.

val directory = filename.substringBeforeLast("/") // → "myfile"

When we call an extraction method, we're asking for something specific. If the thing doesn't exist, quietly pretending it does is misleading—and confusing.

Worse, in StringUtils the fallback behavior is inconsistent across methods:

  • substringBefore() → returns the original string
  • substringAfter() → returns an empty string
  • substringBetween() → returns null

Same type of failure. Three different, silent error results.
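One way to avoid silent fallbacks is to make the "not found" case part of the return type. The following is only a minimal sketch in plain JDK Java, not the API this series goes on to design; the class name Substrings and its two methods are hypothetical and exist purely for illustration.

```java
import java.util.Optional;

/** Hypothetical helper: every method reports "not found" the same way. */
final class Substrings {
    private Substrings() {}

    /** Text after the first occurrence of delimiter, or empty if it is absent. */
    static Optional<String> after(String s, String delimiter) {
        int i = s.indexOf(delimiter);
        return i < 0
            ? Optional.empty()
            : Optional.of(s.substring(i + delimiter.length()));
    }

    /** Text before the last occurrence of delimiter, or empty if it is absent. */
    static Optional<String> beforeLast(String s, String delimiter) {
        int i = s.lastIndexOf(delimiter);
        return i < 0 ? Optional.empty() : Optional.of(s.substring(0, i));
    }

    public static void main(String[] args) {
        System.out.println(after("myfile.txt", "."));        // Optional[txt]
        System.out.println(after("myfile", "."));            // Optional.empty
        System.out.println(beforeLast("path/to/file", "/")); // Optional[path/to]
        System.out.println(beforeLast("myfile", "/"));       // Optional.empty
    }
}
```

With this shape the caller has to decide what a missing delimiter means, for example with orElse("") or orElseThrow(), instead of inheriting a default chosen by the library.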


🔥 Extracting Path Under Root

Given the file path "home/usr/path/to/file.txt", we want to extract the "path/to" part after the known "home/usr/" prefix:

Kotlin:

val directory =
    path.removePrefix("home/usr/").substringBeforeLast("/")

Apache Commons:

String directory = StringUtils.substringBeforeLast(
    StringUtils.removeStart(path, "home/usr/"),
    "/");

Again, the happy path seems fine. But what happens if path doesn't start with home/usr/? Or if it doesn't have another / after the prefix? Say, it's just myfile?

As discussed earlier, the "path under root" then becomes "myfile". That is not intuitive, and readers will likely need to consult the documentation to find out.
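For comparison, here is a sketch of the same extraction written against the plain JDK with both failure cases spelled out. The class and method names are hypothetical, and treating "no directory part" as absent rather than as an empty string is an assumption made for this example.

```java
import java.util.Optional;

/** Hypothetical example: extract the directory part of a path under a known root. */
final class PathUnderRoot {
    private PathUnderRoot() {}

    /**
     * Returns the directory between root and the file name, or empty if
     * path is not under root or has no directory part after the root.
     */
    static Optional<String> directoryUnder(String path, String root) {
        if (!path.startsWith(root)) {
            return Optional.empty(); // not under the expected root
        }
        String relative = path.substring(root.length());
        int lastSlash = relative.lastIndexOf('/');
        if (lastSlash < 0) {
            return Optional.empty(); // a bare file name, no directory part
        }
        return Optional.of(relative.substring(0, lastSlash));
    }

    public static void main(String[] args) {
        String root = "home/usr/";
        System.out.println(directoryUnder("home/usr/path/to/file.txt", root)); // Optional[path/to]
        System.out.println(directoryUnder("home/usr/file.txt", root));         // Optional.empty
        System.out.println(directoryUnder("myfile", root));                    // Optional.empty
    }
}
```

It is correct for these inputs, but it took two explicit checks and some index arithmetic to get there, which is exactly the boilerplate a string utility is supposed to remove.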

Let's look at a few other real world use cases.

🔥 Splitting a Key-Value Pair

Given the following HTTP line:

String line = "Authorization: Bearer token";

How do you get the header name and value?

Kotlin:

val parts = line.split(":", limit = 2)
val key = parts[0]          // → "Authorization"
val value = parts.getOrNull(1)?.trim() ?: ""  // → "Bearer token"

Apache Commons:

String[] parts = StringUtils.split(line, ":", 2);
String key = parts.length > 0 ? parts[0] : "";           // → "Authorization"
String value = parts.length > 1 ? parts[1].trim() : "";  // → "Bearer token"

Note that if you don't carefully check the length of the result, you will run into an index-out-of-bounds exception when the input string doesn't include the colon (:) character.
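The length check can be avoided by locating the colon first and slicing around it. This is a sketch using only the JDK; HeaderLine and parse are hypothetical names, and trimming both the name and the value is an assumption of the example.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical example: split "Name: value" around the first colon. */
final class HeaderLine {
    private HeaderLine() {}

    /** Returns the (name, value) pair, or empty if the line has no colon. */
    static Optional<Map.Entry<String, String>> parse(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            return Optional.empty(); // no separator: the caller decides what to do
        }
        String name = line.substring(0, colon).trim();
        String value = line.substring(colon + 1).trim();
        return Optional.of(Map.entry(name, value));
    }

    public static void main(String[] args) {
        parse("Authorization: Bearer token").ifPresent(header -> {
            System.out.println(header.getKey());   // Authorization
            System.out.println(header.getValue()); // Bearer token
        });
        System.out.println(parse("no separator here").isPresent()); // false
    }
}
```

There is no array to index into, so a line without a colon cannot throw; it simply yields an empty Optional.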


🔥 Parsing Multiple Key-Value Pairs

What if I want to parse a dictionary-like input string such as the following?

String input = "{k1=v1, k2=v2, k3=v3}";

Kotlin:

val raw = input.removeSurrounding("{", "}")
val map = raw.split(",").mapNotNull {
    val kv = it.split("=", limit = 2)
    if (kv.size == 2) kv[0].trim() to kv[1].trim() else null
}.toMap()

Apache Commons:

String trimmed = StringUtils.removeStart(input, "{");
trimmed = StringUtils.removeEnd(trimmed, "}");
String[] entries = StringUtils.split(trimmed, ",");
Map<String, String> map = new LinkedHashMap<>();
for (String entry : entries) {
    String[] kv = StringUtils.split(entry, "=");
    if (kv.length == 2) {
        map.put(StringUtils.trim(kv[0]), StringUtils.trim(kv[1]));
    }
}

Neither the Kotlin code nor the Apache StringUtils code is convenient or easy to read.


Why Is It So Hard?

1. Error Handling

One-off util methods tend to be arbitrary in their error handling strategy, as discussed above.

It's tempting to pick a seemingly okay default value in order to keep the method signature "simple", at the cost of potential confusion.

And when default values are chosen arbitrarily, they easily become inconsistent across methods, which makes them even harder to remember.


2. One Method Per Use Case? That Doesn’t Scale

Every real-world string task feels slightly different.

But writing a separate method for each case quickly gets out of hand.

That’s why Apache Commons StringUtils has over 200 public static methods, including:

  • splitByWholeSeparatorPreserveAllTokens(...)
  • splitByCharacterTypeCamelCase(...)

These method names are so long that you may not intuitively grasp what they do. Each is scoped so narrowly to one use case that you have to read the Javadoc carefully to understand it.

They are rarely composable and their edge cases can sometimes be surprising.


3. The Static Util Class Trap

StringUtils mixes up two separate concerns:

  • How to find something in a string
  • What to do with it once found

Instead of composition, you get a mess of methods with long names and ad hoc behavior. It's why util classes are often considered an anti-pattern.

What we really need is a way to model finding and acting separately.
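Purely as an illustration of what that separation could look like, and not necessarily the abstraction Part 2 arrives at, here is a small sketch in plain JDK Java. The Match class and all of its methods are hypothetical.

```java
import java.util.Optional;

/** Hypothetical: where a delimiter was found inside a source string. */
final class Match {
    private final String source;
    private final int start; // index of the first character of the delimiter
    private final int end;   // index just past the delimiter

    private Match(String source, int start, int end) {
        this.source = source;
        this.start = start;
        this.end = end;
    }

    // "How to find": locating the delimiter is one concern.
    static Optional<Match> first(String source, String delimiter) {
        return at(source, source.indexOf(delimiter), delimiter);
    }

    static Optional<Match> last(String source, String delimiter) {
        return at(source, source.lastIndexOf(delimiter), delimiter);
    }

    private static Optional<Match> at(String source, int index, String delimiter) {
        return index < 0
            ? Optional.empty()
            : Optional.of(new Match(source, index, index + delimiter.length()));
    }

    // "What to do once found": acting on the match is a separate concern.
    String before() { return source.substring(0, start); }
    String after() { return source.substring(end); }

    public static void main(String[] args) {
        System.out.println(Match.last("path/to/file", "/").map(Match::before)); // Optional[path/to]
        System.out.println(Match.first("myfile", ".").map(Match::after));       // Optional.empty
    }
}
```

Two finders and two actions already cover four of the substringXxx variants, and every combination handles a missing delimiter in the same way.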

In Part 2, we’ll try to build around a simple abstraction. And we'll see if a different approach can result in better coverage, with a simpler API.
