82

Using Java, I want to go through the lines of a text and replace all ampersand symbols (&) with the XML entity reference &amp;.

I scan the lines of the text, and then each word in each line, with the Scanner class. Then I use a CharacterIterator to iterate over each character of the word. However, how can I replace the character? First, Strings are immutable objects. Second, I want to replace one character (&) with several characters (&amp;). How should I approach this?

CharacterIterator it = new StringCharacterIterator(token);
for(char ch = it.first(); ch != CharacterIterator.DONE; ch = it.next()) {
       if(ch == '&') {

       }
}
0

11 Answers

142

Try using String.replace() or String.replaceAll() instead.

String my_new_str = my_str.replace("&", "&amp;");

(Both replace all occurrences; replaceAll allows use of regex.)
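The difference between the two methods can be sketched as follows. This is only an illustration: the class name ReplaceDemo, the helper method names, and the sample strings are made up for the example.

```java
class ReplaceDemo {
    // Literal replacement: the first argument is treated as plain text.
    static String literal(String s) {
        return s.replace(".", ",");
    }

    // Regex replacement: "." matches any character, so every character is replaced.
    static String regex(String s) {
        return s.replaceAll(".", ",");
    }

    // Escaping ampersands needs no regex at all.
    static String escapeAmp(String s) {
        return s.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        System.out.println(literal("h.e.l.l.o"));     // h,e,l,l,o
        System.out.println(regex("h.e.l.l.o"));       // ,,,,,,,,,
        System.out.println(escapeAmp("Tom & Jerry")); // Tom &amp; Jerry
    }
}
```

For the ampersand case both methods give the same result, since & has no special meaning in a regular expression; replace is simply the safer habit.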


4 Comments

Be careful with replaceAll, because it treats its first argument as a regular expression. For example, "h.e.l.l.o".replaceAll(".", ",") will give you ",,,,,,,,,"! Java 1.5 added a String.replace(CharSequence, CharSequence) method, which does something similar but doesn't interpret its first argument as a regular expression.
@PeterŠtibraný Or... you could just escape the character you want to replace : replaceAll("[.]", ",")
This is not how you would escape a character. I think Peter's point is that using regex when you don't need to has potential for unintended side effects.
Just a side note: we can also use %26 instead of &amp;. It looks like in some REST calls %26 works where &amp; does not.
93

The simple answer is:

token = token.replace("&", "&amp;");

Despite its name, replace replaces every occurrence, just as replaceAll does; it simply doesn't use a regular expression. That is the right choice here, both for performance and as good practice: don't use regular expressions by accident, because they give special meaning to characters you won't be paying attention to.

If you already know this code is a performance hot spot (if that is where your question is coming from), Sean Bright's answer is probably as good as is worth considering, absent a more specific performance target and actual performance testing. It certainly doesn't deserve the downvotes. Just use StringBuilder instead of StringBuffer unless you need the synchronization.

That being said, there is a somewhat deeper potential problem here. Escaping characters is a known problem which lots of libraries out there address. You may want to consider wrapping the data in a CDATA section in the XML, or you may prefer to use an XML library (including the one that comes with the JDK now) to actually generate the XML properly (so that it will handle the encoding).

Apache also has an escaping library as part of Commons Lang.
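As a rough sketch of the XML-library route, the StAX XMLStreamWriter that ships with the JDK escapes text content for you. The class name XmlWriterDemo, the helper method element, and the element name "title" are assumptions made up for this example.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

class XmlWriterDemo {
    // Builds <elementName>text</elementName>, letting the writer escape the text.
    static String element(String elementName, String text) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartElement(elementName);
        xml.writeCharacters(text); // markup characters such as & and < are escaped here
        xml.writeEndElement();
        xml.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        // The ampersand in the text should come out as &amp; in the element content.
        System.out.println(element("title", "Tom & Jerry"));
    }
}
```

The point of this design is that the code never has to decide which characters need escaping; the writer does it at the moment the text is emitted.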

Comments

16
StringBuilder s = new StringBuilder(token.length());

CharacterIterator it = new StringCharacterIterator(token);
for (char ch = it.first(); ch != CharacterIterator.DONE; ch = it.next()) {
    switch (ch) {
        case '&':
            s.append("&amp;");
            break;
        case '<':
            s.append("&lt;");
            break;
        case '>':
            s.append("&gt;");
            break;
        default:
            s.append(ch);
            break;
    }
}

token = s.toString();

10 Comments

Using a String instead would result in the creation of a temporary String object per iteration. I'm not sure what alternative you would suggest.
+1: Not sure why this received 2 downvotes - It's likely to be far more efficient than replaceAll() - After all why use regular expressions when simply matching on a single character?
Further to my previous comment, I just measured the performance of replaceAll and Sean's solution against a 5000 character String where approximately 10% of characters are '&' - The average replaceAll time is 0.92ms while Sean's solution is 0.29ms. Using a StringBuilder improves the time further to 0.23ms.
@Adamski - I was just going to do that performance test myself. Thanks for doing the leg work for me!
It wasn't premature optimization - it was my answer to the question. It just also happens to be faster than String.replaceAll(), but that wasn't the reason for suggesting it.
10

You may also want to check to make sure you're not replacing an occurrence that has already been replaced. You can use a regular expression with negative lookahead to do this.

For example:

String str = "sdasdasa&amp;adas&dasdasa";  
str = str.replaceAll("&(?!amp;)", "&amp;");

This would result in the string "sdasdasa&amp;adas&amp;dasdasa".

The regex pattern "&(?!amp;)" basically says: Match any occurrence of '&' that is not followed by 'amp;'.
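One limitation of that pattern is that it only recognizes &amp;, so an existing &lt; would still be escaped a second time. A possible extension, offered only as a sketch, widens the lookahead to the five predefined XML entities. The class name AmpersandEscaper and the pattern below are assumptions for illustration, and numeric references such as &#38; are not handled.

```java
class AmpersandEscaper {
    // Matches '&' unless it already starts one of the five predefined XML entities.
    private static final String BARE_AMPERSAND = "&(?!(?:amp|lt|gt|quot|apos);)";

    static String escapeBareAmpersands(String s) {
        return s.replaceAll(BARE_AMPERSAND, "&amp;");
    }

    public static void main(String[] args) {
        // Only the middle ampersand is bare; &lt; and &amp; are left alone.
        System.out.println(escapeBareAmpersands("a&lt;b & c&amp;d")); // a&lt;b &amp; c&amp;d
    }
}
```

Note that any approach like this is a guess about intent: it cannot tell an already-escaped entity from text that literally contains the characters "&amp;".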

Comments

6

Just create a string that contains all of the data in question and then use String.replaceAll() like below.

String result = yourString.replaceAll("&", "&amp;");

2 Comments

If the data is too large, creating a single string consisting of all of the data may be disadvantageous. We can do line-by-line as well.
Using replaceAll in this case is WRONG! If possible, always use replace instead of replaceAll. It is more efficient and less error prone.
3

You can use stream and flatMap to map & to &amp;

    String str = "begin&end";
    String newString = str.chars()
        .flatMap(ch -> (ch == '&') ? "&amp;".chars() : IntStream.of(ch))
        .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
        .toString();

Comments

1

Escaping strings can be tricky - especially if you want to take unicode into account. I suppose XML is one of the simpler formats/languages to escape but still. I would recommend taking a look at the StringEscapeUtils class in Apache Commons Lang, and its handy escapeXml method.

Comments

1

Try this code. You can replace any character with another given character. Here I replace the letter 'a' with the "_" character in the given string "abcdefaa".

Output: _bcdef__

    public class Replace {

        public static void replaceChar(String str, String target) {
            String result = str.replaceAll(target, "_");
            System.out.println(result);
        }

        public static void main(String[] args) {
            replaceChar("abcdefaa", "a");
        }

    }

Comments

0

If you're using Spring you can simply call HtmlUtils.htmlEscape(String input) which will handle the '&' to '&amp;' translation.

1 Comment

That is risky because HTML has many more entities defined than pure XML.
0
// I think this will work. You don't have to replace at the even indices; it's just an example.
// Note: ch is not used here; every even-indexed character becomes '*'.

 public String emphasize(String phrase, char ch)
    {
        char[] phraseArray = phrase.toCharArray();
        for (int i = 0; i < phraseArray.length; i++)
        {
            if (i % 2 == 0) // even index
            {
                // A char array is mutable, so the character can be assigned directly.
                phraseArray[i] = '*';
            }
        }
        // Strings are immutable, so return a new String built from the modified array.
        return new String(phraseArray);
    }

Comments

-2
String taskLatLng = task.getTask_latlng().replaceAll( "\\(","").replaceAll("\\)","").replaceAll("lat/lng:", "").trim();

Comments
