Part 2: Designing a Better String Utility
In Part 1, we saw that everyday string manipulation is surprisingly brittle. So let’s try designing something better—from scratch.
🔍 Observation
Almost every string operation we need can be broken down into a few orthogonal concerns:
1. Where is the substring?
The number of choices is manageable:
- the first `"."` (think of `String.indexOf()`)
- the last `"/"` (`String.lastIndexOf()`)
- the prefix `"root/"`
- the suffix `".html"`
- a regular expression match
2. How do we navigate from the match?
Once we find it, we might want:
- part before it
- part after it
- substring up to the end of the match
- substring from the match to the end of the string
- the content between two patterns
3. Do we want one occurrence, or all?
Sometimes we just want the first. Other times, we want to find all occurrences.
4. What do we want to do with it?
- Extract
- Remove
- Replace
- Split around the occurrence (resulting in two parts)
- Split around all occurrences (resulting in a stream or list)
- Or just get the raw indices
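To make concern 1 concrete, the sketch below maps each "where" choice onto a plain JDK lookup. `Where` is a throwaway helper invented for this illustration, not part of the design, and it returns `null` on a miss only to stay short (the design in Step 2 uses `Optional` instead). Every strategy boils down to the same answer: a begin index and an end index.

```java
import java.util.regex.Matcher;

// Hypothetical helper: each "where" choice as a JDK lookup returning {begin, end}.
// A null result means "not found" (deliberately crude for this illustration).
final class Where {
  static int[] first(String s, String sub) {        // String.indexOf()
    int i = s.indexOf(sub);
    return i < 0 ? null : new int[] {i, i + sub.length()};
  }

  static int[] last(String s, String sub) {         // String.lastIndexOf()
    int i = s.lastIndexOf(sub);
    return i < 0 ? null : new int[] {i, i + sub.length()};
  }

  static int[] prefix(String s, String p) {         // String.startsWith()
    return s.startsWith(p) ? new int[] {0, p.length()} : null;
  }

  static int[] suffix(String s, String p) {         // String.endsWith()
    return s.endsWith(p) ? new int[] {s.length() - p.length(), s.length()} : null;
  }

  static int[] regex(String s, String regex) {      // java.util.regex
    Matcher m = java.util.regex.Pattern.compile(regex).matcher(s);
    return m.find() ? new int[] {m.start(), m.end()} : null;
  }
}
```

For `"root/docs/index.html"`, these yield `{15, 16}` for the first `"."`, `{9, 10}` for the last `"/"`, `{0, 5}` for the prefix `"root/"`, and `{15, 20}` for the suffix `".html"`: four different strategies, one shape of answer.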
📐 Step 1: The Substring as a Range
No matter how we find it, a substring is just a half-open range: `[begin, end)`. This is exactly how `String.substring(int, int)` works.
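A quick sketch of why the half-open convention is convenient, using nothing but `String.substring`. The `Ranges` class is only a made-up container for the illustration.

```java
// Hypothetical helper illustrating properties of half-open ranges [begin, end).
final class Ranges {
  // The length of [begin, end) is end - begin, with no +1 / -1 adjustments.
  static int length(int begin, int end) {
    return end - begin;
  }

  // Cutting at any k in [0, s.length()] yields [0, k) and [k, length):
  // the two ranges share the boundary k and concatenate back to s.
  static boolean cutRoundTrips(String s, int k) {
    return (s.substring(0, k) + s.substring(k)).equals(s);
  }

  // [i, i) is a valid empty range; substring(i, i) returns "".
  static String empty(String s, int i) {
    return s.substring(i, i);
  }
}
```

For example, in `"file.txt"` the extension occupies `[5, 8)`: `"file.txt".substring(5, 8)` is `"txt"`, and its length is `8 - 5 = 3`.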
So let’s create a simple class to model it:
```java
record Substring(String fullString, int begin, int end) {
  String before() { return fullString().substring(0, begin()); }
  String after() { return fullString().substring(end()); }
  String remove() { return before() + after(); }
  String replaceWith(String replacement) {
    return before() + replacement + after();
  }
  // Records inherit a public toString(), so the override must stay public.
  @Override public String toString() {
    return fullString().substring(begin(), end());
  }
}
```
Already, we can add some trivial but useful `before()`, `after()`, `remove()`, and `replaceWith()` helpers.
Note: although the record looks trivially simple, it is the most important part of this design. All the extensibility and composability center on this one concept: everything else is a strategy to find, navigate, or act on a range.
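Here is the record exercised on one concrete range, as a runnable sketch. The `SubstringDemo` class and the sample string are only for illustration.

```java
record Substring(String fullString, int begin, int end) {
  String before() { return fullString().substring(0, begin()); }
  String after() { return fullString().substring(end()); }
  String remove() { return before() + after(); }
  String replaceWith(String replacement) { return before() + replacement + after(); }
  @Override public String toString() { return fullString().substring(begin(), end()); }
}

// Hypothetical demo: one range, every helper.
final class SubstringDemo {
  static Substring middle() {
    // ".final" occupies [6, 12) in "report.final.pdf".
    return new Substring("report.final.pdf", 6, 12);
  }
}
```

With `s = SubstringDemo.middle()`, `s.before()` is `"report"`, `s.after()` is `".pdf"`, `s.remove()` is `"report.pdf"`, and `s.replaceWith("-v2")` is `"report-v2.pdf"`. For any range, `before() + toString() + after()` reconstructs `fullString()`.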
🧭 Step 2: Finding a Substring
We need an abstraction for “a way to find a `Substring` in a given input.” Let's call it `Pattern`:
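```java
import java.util.Optional;

interface Pattern {
  Optional<Substring> find(String input, int fromIndex);

  static Pattern first(String sub) {
    return (input, fromIndex) -> {
      int index = input.indexOf(sub, fromIndex);
      return index < 0
          ? Optional.empty()
          : Optional.of(new Substring(input, index, index + sub.length()));
    };
  }

  static Pattern last(String sub) {
    // Like first(), but with lastIndexOf(); a match starting before fromIndex doesn't count.
    return (input, fromIndex) -> {
      int index = input.lastIndexOf(sub);
      return index < fromIndex
          ? Optional.empty()
          : Optional.of(new Substring(input, index, index + sub.length()));
    };
  }

  static Pattern prefix(String s) {
    return (input, fromIndex) ->
        input.startsWith(s, fromIndex)
            ? Optional.of(new Substring(input, fromIndex, fromIndex + s.length()))
            : Optional.empty();
  }

  static Pattern suffix(String s) {
    // Mirror image of prefix(): the match must end at input.length().
    return (input, fromIndex) -> {
      int begin = input.length() - s.length();
      return begin >= fromIndex && input.endsWith(s)
          ? Optional.of(new Substring(input, begin, input.length()))
          : Optional.empty();
    };
  }
}
```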
That's it. The static methods such as `first()`, `last()`, and `prefix()` correspond to the first concern from the observation (where to find).
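The following self-contained sketch runs the four locators against one path. The `last()` and `suffix()` bodies rest on our own assumption: a match that starts before `fromIndex` is treated as not found, which keeps them consistent with `first()` and `prefix()`.

```java
import java.util.Optional;

record Substring(String fullString, int begin, int end) {
  String before() { return fullString.substring(0, begin); }
  String after() { return fullString.substring(end); }
  @Override public String toString() { return fullString.substring(begin, end); }
}

// The four locators from Step 2, with last() and suffix() filled in.
interface Pattern {
  Optional<Substring> find(String input, int fromIndex);

  static Pattern first(String sub) {
    return (input, fromIndex) -> {
      int i = input.indexOf(sub, fromIndex);
      return i < 0 ? Optional.empty() : Optional.of(new Substring(input, i, i + sub.length()));
    };
  }

  static Pattern last(String sub) {
    return (input, fromIndex) -> {
      int i = input.lastIndexOf(sub);
      // Assumption: a last occurrence that starts before fromIndex is "not found".
      return i < fromIndex
          ? Optional.empty()
          : Optional.of(new Substring(input, i, i + sub.length()));
    };
  }

  static Pattern prefix(String s) {
    return (input, fromIndex) -> input.startsWith(s, fromIndex)
        ? Optional.of(new Substring(input, fromIndex, fromIndex + s.length()))
        : Optional.empty();
  }

  static Pattern suffix(String s) {
    return (input, fromIndex) -> {
      int begin = input.length() - s.length();
      // Assumption: the suffix must start at or after fromIndex.
      return begin >= fromIndex && input.endsWith(s)
          ? Optional.of(new Substring(input, begin, input.length()))
          : Optional.empty();
    };
  }
}
```

On `"root/docs/index.html"`, `last("/")` leaves `"index.html"` after it, `prefix("root/")` leaves `"docs/index.html"`, `suffix(".html")` leaves `"root/docs/index"` before it, and `prefix("docs/")` finds nothing because a prefix must start at `fromIndex`.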
The find() method returns an Optional, and this design principle applies across the API: all substring extraction methods return Optional. This ensures that absence is explicit and must be handled consciously, avoiding the kind of implicit fallbacks or silent failures seen in methods like substringAfter() and substringBefore().
We also let the `find()` method take a secondary `int fromIndex` parameter to facilitate chaining, which you’ll see shortly.
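A minimal illustration of what `fromIndex` buys us: a second search can resume exactly where the first one ended. Only `first()` and `prefix()` are reproduced here, and `second(...)` is a hypothetical helper written just for this sketch.

```java
import java.util.Optional;

record Substring(String fullString, int begin, int end) {
  @Override public String toString() { return fullString.substring(begin, end); }
}

interface Pattern {
  Optional<Substring> find(String input, int fromIndex);

  static Pattern first(String sub) {
    return (input, fromIndex) -> {
      int i = input.indexOf(sub, fromIndex);
      return i < 0 ? Optional.empty() : Optional.of(new Substring(input, i, i + sub.length()));
    };
  }

  static Pattern prefix(String s) {
    return (input, fromIndex) -> input.startsWith(s, fromIndex)
        ? Optional.of(new Substring(input, fromIndex, fromIndex + s.length()))
        : Optional.empty();
  }

  // Hypothetical helper: locate p, then locate it again starting where the
  // first match ended. fromIndex is what makes this a one-liner.
  static Optional<Substring> second(Pattern p, String input) {
    return p.find(input, 0).flatMap(m -> p.find(input, m.end()));
  }
}
```

In `"a.b.c"`, `first(".")` found from index 0 begins at 1, and found from index 2 begins at 3, which is also what `second(first("."), "a.b.c")` returns. `prefix("b")` matches when searched from index 2 but not from index 0, because a prefix is anchored at `fromIndex`.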
🧭 Step 3: Using the Found Substring
As we've seen, there are many things we might want to do once we've found a substring.
The `Substring` record provides the indices and some basic helper methods like `remove()` and `replaceWith()`. But we can also add higher-level methods to `Pattern` that directly perform common operations like extraction, removal, replacement, and splitting, without requiring the caller to invoke `find()` explicitly.
```java
// These default methods go inside interface Pattern.
// split() additionally needs: import java.util.function.BiFunction;
default Optional<String> from(String input) {
  return find(input, 0).map(Substring::toString);
}

default String removeFrom(String input) {
  return find(input, 0).map(Substring::remove).orElse(input);
}

default String replaceFrom(String input, String replacement) {
  return find(input, 0)
      .map(sub -> sub.replaceWith(replacement))
      .orElse(input);
}

default <T> Optional<T> split(
    String input, BiFunction<String, String, T> fn) {
  return find(input, 0)
      .map(sub -> fn.apply(sub.before(), sub.after()));
}
```
These helpers let us perform removal, replacement, and splitting on any of the `first()`, `last()`, `prefix()`, or `suffix()` patterns in a consistent and predictable way.
The return value of `from()` is `Optional` to force the caller to handle missing matches explicitly, avoiding assumptions like those made by `substringAfter()`.
The `removeFrom()` and `replaceFrom()` methods don’t return `Optional` because, idiomatically, removal is a no-op if the pattern isn’t found (just like `Collection.remove()`).
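Assembling the Step 3 helpers into one runnable sketch makes that contrast concrete: a miss surfaces as `Optional.empty()` from `from()` and `split()`, but leaves the input untouched in `removeFrom()` and `replaceFrom()`. The `suffix()` body here is our own, under the assumption that a suffix starting before `fromIndex` doesn't count.

```java
import java.util.Optional;
import java.util.function.BiFunction;

record Substring(String fullString, int begin, int end) {
  String before() { return fullString.substring(0, begin); }
  String after() { return fullString.substring(end); }
  String remove() { return before() + after(); }
  String replaceWith(String replacement) { return before() + replacement + after(); }
  @Override public String toString() { return fullString.substring(begin, end); }
}

interface Pattern {
  Optional<Substring> find(String input, int fromIndex);

  static Pattern first(String sub) {
    return (input, fromIndex) -> {
      int i = input.indexOf(sub, fromIndex);
      return i < 0 ? Optional.empty() : Optional.of(new Substring(input, i, i + sub.length()));
    };
  }

  // Assumption: the suffix must start at or after fromIndex.
  static Pattern suffix(String s) {
    return (input, fromIndex) -> {
      int begin = input.length() - s.length();
      return begin >= fromIndex && input.endsWith(s)
          ? Optional.of(new Substring(input, begin, input.length()))
          : Optional.empty();
    };
  }

  // Extraction: absence is reported as Optional.empty().
  default Optional<String> from(String input) {
    return find(input, 0).map(Substring::toString);
  }

  // Removal and replacement: absence is a no-op that returns the input.
  default String removeFrom(String input) {
    return find(input, 0).map(Substring::remove).orElse(input);
  }

  default String replaceFrom(String input, String replacement) {
    return find(input, 0).map(sub -> sub.replaceWith(replacement)).orElse(input);
  }

  // Two-way split: absence is reported as Optional.empty().
  default <T> Optional<T> split(String input, BiFunction<String, String, T> fn) {
    return find(input, 0).map(sub -> fn.apply(sub.before(), sub.after()));
  }
}
```

For instance, `suffix(".bak").removeFrom("notes.txt.bak")` gives `"notes.txt"`, while `suffix(".bak").removeFrom("notes.txt")` hands back `"notes.txt"` unchanged. `first("=").split("k1", ...)` and `first("=").from("k1")` both come back empty.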
🧭 Step 4: Composition and Navigation
We’re still missing an important part of the API: how to express operations like `before(first(foo))`, `after(last(bar))`, or `between(a, b)`.
We can build these via composition. This is also where the `int fromIndex` parameter to `find()` becomes useful:
```java
static Pattern before(Pattern p) {
  return (input, fromIndex) -> p.find(input, fromIndex)
      .map(sub -> new Substring(input, fromIndex, sub.begin()));
}

static Pattern after(Pattern p) {
  return (input, fromIndex) -> p.find(input, fromIndex)
      .map(sub -> new Substring(input, sub.end(), input.length()));
}

static Pattern between(Pattern a, Pattern b) {
  return (input, fromIndex) -> {
    return a.find(input, fromIndex)
        .flatMap(left ->
            b.find(input, left.end())
                .map(right ->
                    new Substring(input, left.end(), right.begin())
                ));
  };
}

static Pattern between(String a, String b) {
  return between(first(a), first(b));
}
```
The `before(p)` pattern applies `p` and returns everything to the left of the match; `after(p)` returns everything to the right.
The `between(a, b)` pattern first applies `a`, then applies `b` starting from the end of `a`’s match.
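Putting the compositions to work on a few inputs, as a self-contained sketch. `last()` is written here under our assumption that a match starting before `fromIndex` doesn't count.

```java
import java.util.Optional;

record Substring(String fullString, int begin, int end) {
  @Override public String toString() { return fullString.substring(begin, end); }
}

interface Pattern {
  Optional<Substring> find(String input, int fromIndex);

  default Optional<String> from(String input) {
    return find(input, 0).map(Substring::toString);
  }

  static Pattern first(String sub) {
    return (input, fromIndex) -> {
      int i = input.indexOf(sub, fromIndex);
      return i < 0 ? Optional.empty() : Optional.of(new Substring(input, i, i + sub.length()));
    };
  }

  // Assumption: a last occurrence that starts before fromIndex is "not found".
  static Pattern last(String sub) {
    return (input, fromIndex) -> {
      int i = input.lastIndexOf(sub);
      return i < fromIndex ? Optional.empty() : Optional.of(new Substring(input, i, i + sub.length()));
    };
  }

  static Pattern before(Pattern p) {
    return (input, fromIndex) -> p.find(input, fromIndex)
        .map(sub -> new Substring(input, fromIndex, sub.begin()));
  }

  static Pattern after(Pattern p) {
    return (input, fromIndex) -> p.find(input, fromIndex)
        .map(sub -> new Substring(input, sub.end(), input.length()));
  }

  // b is searched only after a's match, so the result never overlaps a.
  static Pattern between(Pattern a, Pattern b) {
    return (input, fromIndex) -> a.find(input, fromIndex)
        .flatMap(left -> b.find(input, left.end())
            .map(right -> new Substring(input, left.end(), right.begin())));
  }

  static Pattern between(String a, String b) {
    return between(first(a), first(b));
  }
}
```

On `"archive.tar.gz"`, `before(first("."))` yields `"archive"`, `after(first("."))` yields `"tar.gz"`, and `after(last("."))` yields `"gz"`. `between("(", ")")` extracts `"x, y"` from `"f(x, y)"` and returns empty for `"f(x"`, where the closing parenthesis is missing.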
You can use the same strategy to build other navigational patterns like `upToIncluding(...)`.
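Here is one way `upToIncluding(...)` could look, together with `toEnd(...)`, which appears in the summary list later. Both bodies are our own guesses based on two navigation options from the observation, "substring up to the end of the match" and "substring from the match to the end of the string"; the actual library's semantics may differ.

```java
import java.util.Optional;

record Substring(String fullString, int begin, int end) {
  @Override public String toString() { return fullString.substring(begin, end); }
}

interface Pattern {
  Optional<Substring> find(String input, int fromIndex);

  default Optional<String> from(String input) {
    return find(input, 0).map(Substring::toString);
  }

  static Pattern first(String sub) {
    return (input, fromIndex) -> {
      int i = input.indexOf(sub, fromIndex);
      return i < 0 ? Optional.empty() : Optional.of(new Substring(input, i, i + sub.length()));
    };
  }

  // Assumption: a last occurrence that starts before fromIndex is "not found".
  static Pattern last(String sub) {
    return (input, fromIndex) -> {
      int i = input.lastIndexOf(sub);
      return i < fromIndex ? Optional.empty() : Optional.of(new Substring(input, i, i + sub.length()));
    };
  }

  // Assumed semantics: from fromIndex through the end of p's match.
  static Pattern upToIncluding(Pattern p) {
    return (input, fromIndex) -> p.find(input, fromIndex)
        .map(sub -> new Substring(input, fromIndex, sub.end()));
  }

  // Assumed semantics: from the start of p's match to the end of the input.
  static Pattern toEnd(Pattern p) {
    return (input, fromIndex) -> p.find(input, fromIndex)
        .map(sub -> new Substring(input, sub.begin(), input.length()));
  }
}
```

Under these assumptions, `upToIncluding(last("/")).from("a/b/c.txt")` yields `"a/b/"` (the directory with its trailing slash), and `toEnd(first("?")).from("/page?x=1")` yields `"?x=1"`.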
The overload `between(String, String)` is just a convenience method for common cases like `between("(", ")")`.
✅ How Are We Doing?
Let’s revisit the Part 1 use cases.
We can now express many earlier problems using simple, composable code:
Extracting the extension name:
```java
String ext = after(last(".")).from("myfile").orElse("");
```
Unlike `substringAfter()`, the fallback logic is explicit, leaving no room for error.
Getting the directory:
```java
String dir = before(last("/")).from("myfile").orElse(""); // → ""
```
Again, the API forces you to be explicit; you cannot forget the fallback.
Getting the directory path between the root and the last "/":
```java
Pattern path = between(prefix("home/usr/"), last("/"));
String result = path.from("home/usr/module/component/file.txt").orElse("");
// → "module/component"
```
Splitting a key-value pair:
```java
Pattern eq = Pattern.first("=");
Optional<Pair<String, String>> kv = eq.split("k1=v1", Pair::new);
// → Optional[Pair("k1", "v1")]
Optional<Pair<String, String>> failed = eq.split("k1", Pair::new);
// → Optional.empty()
```
🔁 Step 5: Repeating Pattern
Sometimes we want to apply the same pattern repeatedly. That’s where `RepeatingPattern` comes in:
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

record RepeatingPattern(Pattern base) {
  List<Substring> findAll(String input) {
    List<Substring> result = new ArrayList<>();
    int start = 0;
    while (start <= input.length()) {
      Optional<Substring> found = base().find(input, start);
      if (found.isEmpty()) break;
      Substring sub = found.get();
      result.add(sub);
      // Resume after the match; step past an empty match so the loop always advances.
      start = sub.end() > sub.begin() ? sub.end() : sub.end() + 1;
    }
    return result;
  }

  String replaceAllFrom(String input, String replacement) {
    StringBuilder result = new StringBuilder();
    int cursor = 0;
    for (Substring sub : findAll(input)) {
      result.append(input, cursor, sub.begin());
      result.append(replacement);
      cursor = sub.end();
    }
    result.append(input, cursor, input.length());
    return result.toString();
  }

  String removeAllFrom(String input) {
    return replaceAllFrom(input, "");
  }

  // List<String> split(String input) { ... }  // n-way split, omitted for brevity
}
```
With fairly straightforward code, we've added useful "acting" operations for every substring pattern defined so far.
You can also implement the n-way `split()` using the results of `findAll()`. We omit it here for brevity.
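For readers who want it anyway, here is one possible shape for that n-way `split()`, as a sketch: the text between consecutive matches becomes the result list. Keeping empty pieces, including trailing ones, is our own choice; it differs from `String.split(String)`, which drops trailing empty strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

record Substring(String fullString, int begin, int end) {}

interface Pattern {
  Optional<Substring> find(String input, int fromIndex);

  static Pattern first(String sub) {
    return (input, fromIndex) -> {
      int i = input.indexOf(sub, fromIndex);
      return i < 0 ? Optional.empty() : Optional.of(new Substring(input, i, i + sub.length()));
    };
  }
}

record RepeatingPattern(Pattern base) {
  List<Substring> findAll(String input) {
    List<Substring> result = new ArrayList<>();
    int start = 0;
    while (start <= input.length()) {
      Optional<Substring> found = base.find(input, start);
      if (found.isEmpty()) break;
      Substring sub = found.get();
      result.add(sub);
      // Step past empty matches so the loop always advances.
      start = sub.end() > sub.begin() ? sub.end() : sub.end() + 1;
    }
    return result;
  }

  // Pieces between consecutive matches, plus the leading and trailing pieces.
  List<String> split(String input) {
    List<String> parts = new ArrayList<>();
    int cursor = 0;
    for (Substring sub : findAll(input)) {
      parts.add(input.substring(cursor, sub.begin()));
      cursor = sub.end();
    }
    parts.add(input.substring(cursor));
    return parts;
  }
}
```

With this version, splitting `"a,b,,c,"` on `first(",")` yields `[a, b, , c, ]` (five pieces), whereas `"a,b,,c,".split(",")` yields four. When there is no match, the whole input comes back as a single piece.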
For ergonomic chaining, let’s add a `repeatedly()` helper to `Pattern`:
```java
default RepeatingPattern repeatedly() { return new RepeatingPattern(this); }
```
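Wiring it together in a runnable sketch: since `repeatedly()` has a body, it has to be a `default` method of the interface, and then any `Pattern` gains the all-occurrences operations. Only `first()` is reproduced here to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

record Substring(String fullString, int begin, int end) {}

interface Pattern {
  Optional<Substring> find(String input, int fromIndex);

  static Pattern first(String sub) {
    return (input, fromIndex) -> {
      int i = input.indexOf(sub, fromIndex);
      return i < 0 ? Optional.empty() : Optional.of(new Substring(input, i, i + sub.length()));
    };
  }

  // Interface methods with a body must be default (or static/private).
  default RepeatingPattern repeatedly() {
    return new RepeatingPattern(this);
  }
}

record RepeatingPattern(Pattern base) {
  List<Substring> findAll(String input) {
    List<Substring> result = new ArrayList<>();
    int start = 0;
    while (start <= input.length()) {
      Optional<Substring> found = base.find(input, start);
      if (found.isEmpty()) break;
      Substring sub = found.get();
      result.add(sub);
      // Step past empty matches so the loop always advances.
      start = sub.end() > sub.begin() ? sub.end() : sub.end() + 1;
    }
    return result;
  }

  String replaceAllFrom(String input, String replacement) {
    StringBuilder result = new StringBuilder();
    int cursor = 0;
    for (Substring sub : findAll(input)) {
      result.append(input, cursor, sub.begin()).append(replacement);
      cursor = sub.end();
    }
    return result.append(input, cursor, input.length()).toString();
  }

  String removeAllFrom(String input) {
    return replaceAllFrom(input, "");
  }
}
```

For example, `first("ab").repeatedly().findAll("abcabcab")` finds 3 matches, `first(".").repeatedly().replaceAllFrom("a.b.c", "/")` gives `"a/b/c"`, and `first(",").repeatedly().removeAllFrom("1,000,000")` gives `"1000000"`.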
✅ What Have We Accomplished?
We've built a library with a simple API that is still flexible enough to cover diverse everyday `String` use cases.
This is possible because we've decoupled orthogonal concerns:
- `first()`, `last()`, `prefix()`, `suffix()` and friends do one thing only: find the range.
- `before()`, `after()`, `between()`, `upToIncluding()`, `toEnd()` do one thing only: compose simpler patterns into more sophisticated ones.
- `.from(String)`, `.split(String)`, `.replaceFrom(String)`, etc. do one thing only: given any simple or composite pattern, extract, split, or replace the range.
- `.repeatedly()` turns any pattern into a `RepeatingPattern`, which provides operations on all occurrences of the pattern.
Each family of methods is relatively straightforward on its own, but combined, they can perform far more sophisticated string operations.
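To see that orthogonality at work, the sketch below builds a single composite pattern, `between("[", "]")`, and hands it to every kind of action without any pattern-specific code. It is a condensed, self-contained copy of pieces defined earlier in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

record Substring(String fullString, int begin, int end) {
  String before() { return fullString.substring(0, begin); }
  String after() { return fullString.substring(end); }
  String remove() { return before() + after(); }
  String replaceWith(String r) { return before() + r + after(); }
  @Override public String toString() { return fullString.substring(begin, end); }
}

interface Pattern {
  Optional<Substring> find(String input, int fromIndex);

  static Pattern first(String sub) {
    return (input, fromIndex) -> {
      int i = input.indexOf(sub, fromIndex);
      return i < 0 ? Optional.empty() : Optional.of(new Substring(input, i, i + sub.length()));
    };
  }

  static Pattern between(Pattern a, Pattern b) {
    return (input, fromIndex) -> a.find(input, fromIndex)
        .flatMap(left -> b.find(input, left.end())
            .map(right -> new Substring(input, left.end(), right.begin())));
  }

  static Pattern between(String a, String b) {
    return between(first(a), first(b));
  }

  default Optional<String> from(String input) {
    return find(input, 0).map(Substring::toString);
  }

  default String removeFrom(String input) {
    return find(input, 0).map(Substring::remove).orElse(input);
  }

  default String replaceFrom(String input, String replacement) {
    return find(input, 0).map(sub -> sub.replaceWith(replacement)).orElse(input);
  }

  default RepeatingPattern repeatedly() {
    return new RepeatingPattern(this);
  }
}

record RepeatingPattern(Pattern base) {
  List<Substring> findAll(String input) {
    List<Substring> result = new ArrayList<>();
    int start = 0;
    while (start <= input.length()) {
      Optional<Substring> found = base.find(input, start);
      if (found.isEmpty()) break;
      Substring sub = found.get();
      result.add(sub);
      start = sub.end() > sub.begin() ? sub.end() : sub.end() + 1;
    }
    return result;
  }
}
```

With `Pattern brackets = Pattern.between("[", "]")`, `brackets.from("id[42]")` gives `"42"`, `brackets.replaceFrom("id[42]", "7")` gives `"id[7]"`, `brackets.removeFrom("id[42]")` gives `"id[]"`, and `brackets.repeatedly().findAll("a[1]b[2]")` finds `"1"` and `"2"`.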
Let's look at how a combination of them solves a more complex problem.
Split multiple key-values from a structured map string
This example shows how to compose `Pattern` and `RepeatingPattern` to achieve a relatively complex task:
```java
String input = "{k1=v1, k2=v2, k3=v3}";
String content = Pattern.between("{", "}").from(input).orElseThrow();
Map<String, String> result = first(",").repeatedly()
    .split(content)
    .stream()
    .map(entry -> Pattern.first("=").split(entry.trim(), Map::entry).orElseThrow())
    .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
// → {k1=v1, k2=v2, k3=v3}
```
Here’s what’s happening:
- We use `between("{", "}")` to extract the content.
- Then we split it into key-value strings using `first(",").repeatedly()`.
- Finally, each key-value string is split again by `=` into a `Map.Entry`.
This API lets you build robust and flexible string manipulation logic without worrying about magic offsets, nulls, or off-by-one errors, with predictable failure behavior throughout.
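To back up the "predictable failure behavior" claim, here is the map-parsing pipeline as a self-contained sketch. `MapParser` is a hypothetical wrapper, and the n-way `split()` keeping empty pieces is our own choice. Malformed input is never silently mis-parsed: a missing brace or an entry without `=` ends in the `NoSuchElementException` thrown by `Optional.orElseThrow()`, and a duplicate key ends in the `IllegalStateException` that `Collectors.toMap` throws.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

record Substring(String fullString, int begin, int end) {
  String before() { return fullString.substring(0, begin); }
  String after() { return fullString.substring(end); }
  @Override public String toString() { return fullString.substring(begin, end); }
}

interface Pattern {
  Optional<Substring> find(String input, int fromIndex);

  static Pattern first(String sub) {
    return (input, fromIndex) -> {
      int i = input.indexOf(sub, fromIndex);
      return i < 0 ? Optional.empty() : Optional.of(new Substring(input, i, i + sub.length()));
    };
  }

  static Pattern between(String a, String b) {
    return (input, fromIndex) -> first(a).find(input, fromIndex)
        .flatMap(left -> first(b).find(input, left.end())
            .map(right -> new Substring(input, left.end(), right.begin())));
  }

  default Optional<String> from(String input) {
    return find(input, 0).map(Substring::toString);
  }

  default <T> Optional<T> split(String input, BiFunction<String, String, T> fn) {
    return find(input, 0).map(sub -> fn.apply(sub.before(), sub.after()));
  }

  default RepeatingPattern repeatedly() {
    return new RepeatingPattern(this);
  }
}

record RepeatingPattern(Pattern base) {
  List<Substring> findAll(String input) {
    List<Substring> result = new ArrayList<>();
    int start = 0;
    while (start <= input.length()) {
      Optional<Substring> found = base.find(input, start);
      if (found.isEmpty()) break;
      Substring sub = found.get();
      result.add(sub);
      start = sub.end() > sub.begin() ? sub.end() : sub.end() + 1;
    }
    return result;
  }

  // n-way split that keeps empty pieces (our own choice).
  List<String> split(String input) {
    List<String> parts = new ArrayList<>();
    int cursor = 0;
    for (Substring sub : findAll(input)) {
      parts.add(input.substring(cursor, sub.begin()));
      cursor = sub.end();
    }
    parts.add(input.substring(cursor));
    return parts;
  }
}

// Hypothetical wrapper around the map-parsing pipeline from the example above.
final class MapParser {
  static Map<String, String> parse(String input) {
    String content = Pattern.between("{", "}").from(input).orElseThrow();
    return Pattern.first(",").repeatedly().split(content).stream()
        .map(entry -> Pattern.first("=").split(entry.trim(), Map::entry).orElseThrow())
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }
}
```

So `parse("{k1=v1, k2=v2}")` returns a two-entry map, while `parse("{k1=v1, k2}")`, `parse("k1=v1")`, and `parse("{k=1, k=2}")` all fail loudly instead of returning a partial result.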
We've covered the basics of the `Substring` API design.
The actual library is used extensively in Google's internal codebase and includes many more features, such as look-around support (`.immediatelyBetween(...)`, `followedBy(...)`, etc.).
Check out the Git repo.