1

I'm trying to find a tag, from its opening to its closing, in XML and replace it with an empty string. A sample XML document looks like this:

<lins>
  <lin index="1"> ...<feature>Something</feature>... </lin>
  <lin index="2">...<feature>Something</feature>... </lin>
  <lin index="3">...<feature>Something</feature>....</lin>

  <lin index="1">...<feature>Icom</feature>... </lin>
  <lin index="2">...<feature>Icom</feature>... </lin>
</lins>

I need to remove everything from <lin> to </lin> whenever Icom appears in between.

<lin\s(.+?Icom.+?)+</lin> removes all lin items, since it matches from the first opening <lin> tag to the last closing </lin> tag. I would greatly appreciate a suggestion on how to do this. Also, I cannot use XML parsers in my situation.

5
  • Please rewrite what you need to remove. It is not clear. Commented Dec 21, 2011 at 14:21
  • 1
    Do you not have the option of using an XML parser? Commented Dec 21, 2011 at 14:22
  • I'm trying to find a tag, from its opening to its closing, in XML and replace it with an empty string. A sample XML document is in the question. In the above example I need to find and remove <lin index="1">... <feature>Icom</feature>... </lin> <lin index="2">...<feature>Icom</feature>... </lin>. The rule is: whenever Icom appears, remove everything from <lin> to </lin>. The regex I used removes all lin tags: <lin (.+?Icom.+?)+</lin> Commented Dec 21, 2011 at 14:23
  • In my situation I can't use XML parsers. Commented Dec 21, 2011 at 14:29
  • 4
    And what situation is that? Virtually all platforms known to humans have XML parsers. Commented Dec 21, 2011 at 14:35

3 Answers 3

4
String result = subject.replaceAll("(?s)<lin\\b(?:(?!</lin).)*Icom(?:(?!</lin).)*</lin>", "");

should do this, unless you have <lin> tags nested into each other (or inside comments/strings).

Explanation:

<lin\b              # Match <lin (but not link or linen)
(?:                 # Match...
 (?!</lin)          # as long as we're not at a closing tag
 .                  # any character
)*                  # any number of times.
Icom                # Match Icom
(?:(?!</lin).)*     # (as above:) Match any character except closing tag
</lin>              # Match closing tag
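
For reference, here is a minimal runnable sketch of the same replacement applied to a shortened version of the sample from the question. The class name RemoveIcomLins and the method removeIcomLins are made up for illustration; the pattern is the one above, unchanged, and it still assumes that <lin> tags are not nested.

```java
import java.util.regex.Pattern;

class RemoveIcomLins {
    // Same pattern as in the answer, compiled once. (?s) lets '.' match newlines.
    private static final Pattern ICOM_LIN = Pattern.compile(
            "(?s)<lin\\b(?:(?!</lin).)*Icom(?:(?!</lin).)*</lin>");

    // Hypothetical helper: deletes every non-nested <lin>...</lin> that contains "Icom".
    static String removeIcomLins(String xml) {
        return ICOM_LIN.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String xml = "<lins>"
                + "<lin index=\"1\"><feature>Something</feature></lin>"
                + "<lin index=\"2\"><feature>Icom</feature></lin>"
                + "</lins>";
        // prints <lins><lin index="1"><feature>Something</feature></lin></lins>
        System.out.println(removeIcomLins(xml));
    }
}
```

Note that \b keeps the pattern from starting at the <lins> wrapper, and the two (?!</lin) checks keep a match from running past the first closing tag.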

3 Comments

it will not match if there's an inner <lin></lin> tag
@Ademiban: That's exactly what I wrote. Also, it will match, but it'll match an incorrect part of the string (which is probably even worse, which is why I wrote that <lin> tags must not be nested).
@user1110005 would be great to accept this answer and put a check next to it.
0

You can't do it with a regexp.
For example:

<tag>
    <tag> something </tag>
</tag>

<tag>
</tag>

If you use "<tag>(.*)</tag>" regexp, your group will be this:

    <tag> something </tag>
</tag>

<tag>

and if you use "<tag>(.*?)</tag>" regexp, your group will be this:

    <tag> something

You should use something like a stack to find the closing tag that matches each opening tag.
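
A rough sketch of that stack idea, not a full XML parser. The names NestedTagRemover and removeElementsContaining are hypothetical. It assumes tags are written as <tag>, <tag attr="..."> or </tag>, with no self-closing tags, no comments or CDATA, and no '>' inside attribute values. It also assumes that when elements are nested, the outermost one is what should be removed.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedTagRemover {
    // Removes each outermost <tag>...</tag> element whose text contains marker.
    static String removeElementsContaining(String text, String tag, String marker) {
        // Matches one opening or closing tag of the given name; group 1 is "/" for a closing tag.
        Pattern tagToken = Pattern.compile("<(/?)" + Pattern.quote(tag) + "\\b[^>]*>");
        Matcher m = tagToken.matcher(text);
        Deque<Integer> openStarts = new ArrayDeque<>(); // start offsets of tags not yet closed
        StringBuilder out = new StringBuilder();
        int copiedUpTo = 0; // everything before this offset has already been handled

        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            if (!closing) {
                openStarts.push(m.start());
            } else if (!openStarts.isEmpty()) {
                int start = openStarts.pop();
                if (openStarts.isEmpty()) {
                    // An outermost element spans [start, m.end()).
                    out.append(text, copiedUpTo, start);
                    String element = text.substring(start, m.end());
                    if (!element.contains(marker)) {
                        out.append(element); // keep elements without the marker
                    }
                    copiedUpTo = m.end();
                }
            }
            // A stray closing tag with no matching opener is left in place.
        }
        out.append(text, copiedUpTo, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "<tag><tag> Icom </tag></tag><tag>keep</tag>";
        // prints <tag>keep</tag>
        System.out.println(removeElementsContaining(text, "tag", "Icom"));
    }
}
```

The regex here only finds individual tags; the stack does the pairing, which is the part a single pattern cannot do for arbitrarily deep nesting.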

1 Comment

Your points are valid (about nested strings), however your examples are both wrong.
0

I think you need to add more groups to the regexp.

Add a group for the precondition to start checking for ex (

Then a group for the stuff in between, a group for Icom, etc.

So off the top of my head my RegEx would look like:

(<lin\ index\=)(\w+Icom\w+)(\<\/lin>)

Note the escaping might be slightly off, but in essence you need more groups and some less eager matchers.
