32

I have a problem matching HTML attributes (in various HTML tags) with a regex. To do so, I use the pattern:

myAttr=\"([^']*)\"

HTML snippet:

<img alt="" src="1-p2.jpg" myAttr="http://example.com" class="alignleft" />

It selects the text from myAttr all the way to the end of the tag (/>), but I need to select only myAttr="..." (that is, "http://example.com").


5 Answers

41

You have an apostrophe (') inside your character class but you wanted a quote (").

myAttr=\"([^"]*)\"
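As a quick check of the difference, here is a minimal sketch using Python's `re` module (the choice of Python and the variable names are assumptions for illustration; the question does not say which language is in use). It runs both patterns against the snippet from the question.

```python
import re

# The HTML snippet from the question.
html = '<img alt="" src="1-p2.jpg" myAttr="http://example.com" class="alignleft" />'

# Original pattern: [^']* only stops at apostrophes, so it runs past the
# closing quote of myAttr and backtracks to the LAST double quote in the tag.
wrong = re.search("myAttr=\"([^']*)\"", html)

# Corrected pattern: [^"]* stops at the first double quote.
fixed = re.search("myAttr=\"([^\"]*)\"", html)

print(wrong.group(1))  # http://example.com" class="alignleft
print(fixed.group(1))  # http://example.com
```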

That said, you really shouldn't be parsing HTML with regexes. (Sorry to link to that answer again. There are other answers to that question that are more of the "if you know what you are doing..." variety. But it is good to be aware of.)

Note that even if you limit your regexing to just attributes you have a lot to consider:

  • Be careful not to match inside of comments.
  • Be careful not to match inside of CDATA sections.
  • What if attributes are bracketed with single quotes instead of double quotes?
  • What if attributes have no quotes at all?

This is why pre-built, serious parsers are generally called for.
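To illustrate the parser route, here is a minimal sketch using Python's standard-library `html.parser` (Python itself, the `AttrCollector` class, and the `collect_attr` helper are assumptions made for this example, not something from the question). Note that `HTMLParser` lowercases attribute names, so the lookup is done in lowercase.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects the values of one attribute across all start tags."""

    def __init__(self, attr_name):
        super().__init__()
        # HTMLParser reports attribute names in lowercase.
        self.attr_name = attr_name.lower()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with quotes already removed.
        for name, value in attrs:
            if name == self.attr_name:
                self.values.append(value)


def collect_attr(html, attr_name):
    """Hypothetical helper: return every value of attr_name found in html."""
    parser = AttrCollector(attr_name)
    parser.feed(html)
    parser.close()
    return parser.values


sample = (
    '<!-- <img myAttr="ignored"> -->'
    '<img alt="" src="1-p2.jpg" myAttr="http://example.com" class="alignleft" />'
    "<a myAttr='single'>x</a>"
    "<p myAttr=bare>y</p>"
)

print(collect_attr(sample, "myAttr"))
```

In this sample the commented-out tag is skipped, and the double-quoted, single-quoted, and unquoted values are all picked up without any change to the code.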


2 Comments

If the attribute value contains \", this regex will give the wrong result, for example with <a href="dddd" c="\"abc\""/>
Nice, another good reason not to use regexes here! It is completely the wrong approach.
10

The * is a greedy quantifier. You should follow it with a question mark to make it non-greedy:

myAttr=\"([^']*?)\"
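A small sketch of the effect, again assuming Python's `re` module purely for illustration: with the lazy quantifier the match ends at the first closing quote, even though the character class still allows double quotes.

```python
import re

# The HTML snippet from the question.
html = '<img alt="" src="1-p2.jpg" myAttr="http://example.com" class="alignleft" />'

# Greedy: consumes as much as possible, then backtracks to the last quote.
greedy = re.search("myAttr=\"([^']*)\"", html)

# Lazy: consumes as little as possible, so it stops at the first quote.
lazy = re.search("myAttr=\"([^']*?)\"", html)

print(greedy.group(1))  # http://example.com" class="alignleft
print(lazy.group(1))    # http://example.com
```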


3

If you only want the myAttr parameter value, use this:

"myAttr=\"([^\"]+)\""


2

You can try this:

 myAttr=\"?[\w:\-]+ ?= ?("[^"]+"|'[^']+'|\w+)\"


-5

<[^>]*>

Try this; it helps to remove all tags.

Example Something

1 Comment

Did you read the question?
