13

I'm confused by the following code:

public class StringReplaceWithEmptyString 
{
    public static void main(String[] args) 
    {
        String s1 = "asdfgh";
        System.out.println(s1);
        s1 = s1.replace("", "1");
        System.out.println(s1); 
    }
}

And the output is:

asdfgh
1a1s1d1f1g1h1

My first thought was that every character in a String has an empty String "" on both sides. But if that were the case, there should be two '1's after 'a' in the second line of output (one for the end of 'a' and one for the start of 's').

I then checked whether a String is represented as a char[], using the questions "In Java, is a String an array of chars?" and "String representation in Java", and the answer was yes.

So I tried to assign an empty character '' to a char variable, but it gives me a compiler error:

Invalid character constant

I get the same compiler error when I try it in a char[]:

char[] c = {'','a','','s'};  // compile-time error

So I'm confused about three things:

  1. How is an empty String represented by a char[]?
  2. Why am I getting that output for the code above?
  3. How is the String s1 represented in a char[] when it is first initialized?

Sorry if any part of my question is wrong.

12
  • 3
    I would have expected this from String#replaceAll but not from String#replace Commented Jan 27, 2017 at 7:33
  • 2
    @TimBiegeleisen just tested and it actually produces that result with just String#replace Commented Jan 27, 2017 at 7:36
  • 2
    "how an empty String is represented" - char[] empty = {}; Commented Jan 27, 2017 at 7:36
  • 7
    IMO asking how many empty Strings are in a String is sort of like dividing by zero. Commented Jan 27, 2017 at 7:45
  • 3
    I'm surprised that it inserts exactly one 1 rather than zero. After all, "ab" == "a" + "" + "b" == "a" + "" + "" + "b" etc; so saying that there is 1 empty string between them seems... arbitrary. Commented Jan 27, 2017 at 8:41

3 Answers

7

Just adding some more explanation to Tim Biegeleisen's answer.

As of Java 8, the code of the replace method in the java.lang.String class is:

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL)
            .matcher(this)
            .replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

Here you can clearly see that the replacement is done by a regex Pattern and Matcher. Pattern.LITERAL makes the target a literal rather than a regex, but the empty target "" still produces a zero-length match at every position: before the first character, between each pair of characters, and after the last character.

So, behind the scenes, your code is executed as follows:

Pattern.compile("".toString(), Pattern.LITERAL).matcher("asdfgh").replaceAll(Matcher.quoteReplacement("1".toString()));

The output then becomes:

1a1s1d1f1g1h1
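
To check that the call chain above really behaves like replace("", "1"), here is a minimal sketch. The class name ReplaceEmptyTargetSketch and the helper replaceViaLiteralPattern are made up for illustration; only Pattern, Matcher and String.replace come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceEmptyTargetSketch {

    // Hypothetical helper that mirrors the Java 8 body of
    // String.replace(CharSequence, CharSequence) quoted above.
    static String replaceViaLiteralPattern(String input, String target, String replacement) {
        return Pattern.compile(target, Pattern.LITERAL)
                .matcher(input)
                .replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String viaReplace = "asdfgh".replace("", "1");
        String viaPattern = replaceViaLiteralPattern("asdfgh", "", "1");
        System.out.println(viaReplace);  // 1a1s1d1f1g1h1
        System.out.println(viaPattern);  // 1a1s1d1f1g1h1
    }
}
```

Both lines print the same result, so the empty target is matched once at each of the seven positions in the six-character string.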

6

Going with Andy Turner's great comment, your call to String#replace() is actually implemented with the same regex machinery that String#replaceAll() uses. As such, there is a regex replacement happening here. The matches occur before the first character, between each pair of characters in the string, and after the last character.

^|a|s|d|f|g|h|$
 ^ this and every pipe matches to empty string ""

The match you are making is a zero-length match. In Java's regex implementation, which String.replaceAll() uses, this behaves as the example above shows: it matches each inter-character position as well as the positions before the first and after the last character.
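
The positions can also be listed directly. The following is only a sketch (ZeroLengthMatchPositions and emptyMatchStarts are names I made up); it walks the matches with Matcher.find() and records where each one starts. After an empty match, find() moves on by one character, which is why each position is reported exactly once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZeroLengthMatchPositions {

    // Hypothetical helper: start index of every match of the empty literal pattern.
    static List<Integer> emptyMatchStarts(String input) {
        Matcher m = Pattern.compile("", Pattern.LITERAL).matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            // A zero-length match has start() == end(); it consumes no characters.
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(emptyMatchStarts("asdfgh")); // [0, 1, 2, 3, 4, 5, 6]
    }
}
```

Seven positions for six characters: index 0 is before 'a', and index 6 is after 'h'.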

Here is a reference which discusses zero length matches in more detail: http://www.regexguru.com/2008/04/watch-out-for-zero-length-matches/

A zero-width or zero-length match is a regular expression match that does not match any characters. It matches only a position in the string. E.g. the regex \b matches between the 1 and , in 1,2.

9 Comments

I think we all can see that, but the question is why does it happen? What would be the purpose of having an empty String in between the characters? Or perhaps this is a fault of String#replace() method?
So @Tim Biegeleisen what will be the char[] representation of String s1, when it is first initialized
But isn't the method in subject (String.replace(CharSequence, CharSequence)) actually supposed to do the match without using regexes, according to its JavaDoc ? The replaceAll method does use regexes and it states so in its JavaDoc, but replace(CharSequence, CharSequence) doesn't mention regexes anywhere, it actually says Replaces each substring of this string that matches the literal target sequence... . That's what leaves me confused.
@SantiBailors Yes, this is what I thought too. Andy Turner, who is a Java architect at Google, seems to think that replaceAll is being used under the hood. In any case, the behavior is of a zero length regex replacement.
^$ is saying the entire string has nothing in it, hence empty string alone matches. Searching for "" unbounded means find boundaries between every character in the string. Makes sense?
2

This is because replace() performs a regex match with the target you pass to it, compiled as a literal pattern.

 public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
     this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
 }

Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. The replacement proceeds from the beginning of the string to the end, for example, replacing "aa" with "b" in the string "aaa" will result in "ba" rather than "ab".

Parameters:

target The sequence of char values to be replaced

replacement The replacement sequence of char values

Returns: The resulting string

Throws: NullPointerException if target or replacement is null.

Since: 1.5

Please read more at the link below ... (Also browse through the source code).

http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/lang/String.java#String.replace%28java.lang.CharSequence%2Cjava.lang.CharSequence%29

A regex such as "" matches at every position where an empty string can occur. In this case, that is the position before the first character and the position after every character, including the end of the string.
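
On the char[] part of the question: these empty matches are positions, not characters stored in the string. A minimal sketch, assuming nothing beyond the JDK (EmptyStringCharsSketch and insertedCount are names made up for illustration). Note that toCharArray() returns a copy of the characters and says nothing about how the JVM stores them internally.

```java
import java.util.Arrays;

public class EmptyStringCharsSketch {

    // Hypothetical helper: how many copies of the replacement get inserted
    // when the target is the empty string.
    static int insertedCount(String s) {
        return s.replace("", "1").length() - s.length();
    }

    public static void main(String[] args) {
        char[] chars = "asdfgh".toCharArray();
        System.out.println(Arrays.toString(chars));    // [a, s, d, f, g, h]
        System.out.println("".toCharArray().length);   // 0
        System.out.println(insertedCount("asdfgh"));   // 7
        System.out.println(insertedCount(""));         // 1
    }
}
```

The array holds only the six real characters, an empty String corresponds to a zero-length array, and the number of insertions is always length + 1.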

3 Comments

Which method does the doc in your quote refer to (Splits this string around matches of the given regular expression...) ?
Sorry, wrong (hasty) copy and paste. Correcting.
Thanks, so now it's the JavaDoc of replace(CharSequence, CharSequence). This highlights the problem (in my opinion): the doc of the method does not mention matching with regex at all, while the doc of replaceAll does.
