1

I have tried the below regular expression:

final String REG="\\Q[\\E((Bird)|(Animal)): .*\\Q]\\E";
System.out.println(input.replaceAll(REG," "));

to replace every "[Bird:*]" and "[Animal:*]" token with a space.

for example, given input string

[Bird: Peacock] national bird [India], colorful. [Bird: Crow] crow is black [Animal: Cow] cow gives milk

actual output is:

cow gives milk

The match ran from the first [Bird: to the last ] of the given string. The expected result is:

national bird [India], colorful. crow is black cow gives milk

Can anyone help with this?

4 Answers

3

The * quantifier is greedy by default, so, as you noticed, it matches as much text as possible: here, everything from [Bird: to the last ]. You can make it a reluctant quantifier by adding ? after it, so try:

final String REG="\\Q[\\E((Bird)|(Animal)): .*?\\Q]\\E";
//                                            ^ - make `*` reluctant
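As a quick check of the difference, here is a minimal sketch that runs both versions of the pattern over a shortened input. The class name GreedyVsReluctant and the helper strip are made up for this illustration, and it replaces with an empty string so the leftover spaces are easy to see.

```java
class GreedyVsReluctant {
    // Greedy: .* runs to the LAST ] in the input.
    static final String GREEDY = "\\Q[\\E((Bird)|(Animal)): .*\\Q]\\E";
    // Reluctant: .*? stops at the FIRST ] that lets the match succeed.
    static final String RELUCTANT = "\\Q[\\E((Bird)|(Animal)): .*?\\Q]\\E";

    // Hypothetical helper: remove every match of regex from input.
    static String strip(String regex, String input) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String input = "[Bird: Crow] crow is black [Animal: Cow] cow gives milk";
        // One big match from "[Bird:" to the last "]".
        System.out.println("'" + strip(GREEDY, input) + "'");
        // prints ' cow gives milk'
        // Two separate matches, one per token.
        System.out.println("'" + strip(RELUCTANT, input) + "'");
        // prints ' crow is black  cow gives milk'
    }
}
```

The reluctant version still leaves the spaces that surrounded each token, so the result contains a leading space and a doubled space.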

A second technique (preferred, because it needs less backtracking) is to replace ., which accepts any character except line separators, with "zero or more characters that are not ]", written as [^\\]]*. That gives you:

final String REG="\\Q[\\E((Bird)|(Animal)): [^\\]]*\\Q]\\E";
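Besides backtracking, the two forms differ in one visible way: . does not match line separators by default, while [^\\]] does. The sketch below shows this with a made-up tag that spans two lines; the class name NegatedClassDemo, the helper strip, and both sample inputs are assumptions for illustration only.

```java
class NegatedClassDemo {
    static final String RELUCTANT = "\\Q[\\E((Bird)|(Animal)): .*?\\Q]\\E";
    static final String NEGATED = "\\Q[\\E((Bird)|(Animal)): [^\\]]*\\Q]\\E";

    // Hypothetical helper: remove every match of regex from input.
    static String strip(String regex, String input) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String oneLine = "[Bird: Crow] crow [India]";
        String twoLines = "[Bird: Pea\ncock] peacock";

        // On a single line both patterns remove the same token.
        System.out.println(strip(RELUCTANT, oneLine).equals(strip(NEGATED, oneLine))); // prints true

        // '.' cannot cross the newline, so the reluctant pattern finds no match
        // and the input comes back unchanged.
        System.out.println(strip(RELUCTANT, twoLines).equals(twoLines)); // prints true

        // [^\]] accepts the newline, so the whole tag is removed.
        System.out.println("'" + strip(NEGATED, twoLines) + "'"); // prints ' peacock'
    }
}
```

If tags never contain line breaks, the two patterns remove the same text and only the amount of backtracking differs.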

BTW, it is easier to escape the regex metacharacters [ and ] by putting \\ before them. \\Q and \\E are useful when you need to escape a longer piece of text that may contain many metacharacters. So consider rewriting your regex as something a little shorter:

final String REG="\\[(Bird|Animal): [^\\]]*\\]";

or even

final String REG="\\[(Bird|Animal): [^\\]]*]";

because ] outside of a character class is not actually a metacharacter.
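To confirm that these spellings are interchangeable, the following sketch applies each one to the same input and compares the results. The class name BracketEscapes, the constant names, and the helper strip are invented for this example. The fourth variant uses Pattern.quote, which is a standard java.util.regex method that wraps its argument in \Q...\E for you; it is not mentioned above and is included only for comparison.

```java
import java.util.regex.Pattern;

class BracketEscapes {
    // \Q...\E quoting around each bracket.
    static final String QUOTED = "\\Q[\\E(Bird|Animal): [^\\]]*\\Q]\\E";
    // Backslash-escaped brackets.
    static final String ESCAPED = "\\[(Bird|Animal): [^\\]]*\\]";
    // Closing ] left bare: outside a character class it is a literal.
    static final String BARE_CLOSE = "\\[(Bird|Animal): [^\\]]*]";
    // Pattern.quote builds the quoted form programmatically.
    static final String VIA_QUOTE =
            Pattern.quote("[") + "(Bird|Animal): [^\\]]*" + Pattern.quote("]");

    // Hypothetical helper: remove every match of regex from input.
    static String strip(String regex, String input) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String input = "[Bird: Crow] crow is black [Animal: Cow] cow gives milk";
        String expected = strip(ESCAPED, input);
        System.out.println(expected.equals(strip(QUOTED, input)));     // prints true
        System.out.println(expected.equals(strip(BARE_CLOSE, input))); // prints true
        System.out.println(expected.equals(strip(VIA_QUOTE, input)));  // prints true
    }
}
```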


One more thing: consider removing one of the spaces that surround each deleted [...] token. Otherwise, replacing the tokens with an empty string turns "[xx] foo [xx] bar [xx] baz" into " foo  bar  baz", with a leading space and doubled spaces between the words.

To do so, also remove the space that follows each removed [...] token, if there is one. Just add \\s? at the end of your regex, which gives the (let's hope) final version:

final String REG="\\[(Bird|Animal): [^\\]]*]\\s?";
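Applied to the input from the question, that pattern can be tried out as in the sketch below. The class name FinalRegexDemo and the helper clean are made up here, and the sketch assumes the replacement is an empty string rather than the single space used in the question.

```java
class FinalRegexDemo {
    // Tag, then at most one trailing whitespace character.
    static final String REG = "\\[(Bird|Animal): [^\\]]*]\\s?";

    // Hypothetical helper: drop every tag together with the space after it.
    static String clean(String input) {
        return input.replaceAll(REG, "");
    }

    public static void main(String[] args) {
        String input = "[Bird: Peacock] national bird [India], colorful. "
                + "[Bird: Crow] crow is black [Animal: Cow] cow gives milk";
        System.out.println(clean(input));
        // prints: national bird [India], colorful. crow is black cow gives milk
    }
}
```

[India] survives because the pattern requires "Bird: " or "Animal: " right after the opening bracket.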

1

.* is greedy by default: it consumes as many characters as possible. To make * do a non-greedy match (the shortest possible match), add ? immediately after the *.

\\Q[\\E((Bird)|(Animal)): .*?\\Q]\\E


0

Use this:

String s = "[Bird: Peacock] national bird [India], colorful. [Bird: Crow] crow is black [Animal: Cow] cow gives milk";
String regex = "\\[(Bird|Animal): [^\\]]*]";
System.out.println(s.replaceAll(regex, ""));


0

You should replace the greedy quantifier with a reluctant one, so that the match stops at the next closing square bracket.

Also, you don't need to quote the square brackets; you can just escape them.

Finally, you can replace the match with an empty String instead of a space.

For instance:

final String REG = "\\[((Bird)|(Animal)): .*?\\]";
final String input = "[Bird: Peacock] national bird [India], colorful. [Bird: Crow] crow is black [Animal: Cow] cow gives milk";
System.out.println(input.replaceAll(REG, ""));

Output (still not perfect: it starts with a space and contains a couple of doubled spaces):

 national bird [India], colorful.  crow is black  cow gives milk

Full cleanup:

System.out.println(
    input.replaceAll(REG, " ")
         .replaceAll("\\s+", " ")
         .replaceAll("^\\s", "")
);
