I'm trying to replace an existing URL in a given text with a new URL using a regex, but the pattern I'm using doesn't produce any matches:

string regex = "<a href=\"http://domain/page.asp?id=(\\d+)&amp;oid=(\\d+)&amp;type=(\\w+)\">";

Can someone please help me write a correct pattern to find URLs that look like this:

"<A href=\"http://domain/page.asp?id=38957&amp;oid=2497&amp;type=JPG\">"

Below is my test code, which finds no matches for this pattern:

string result = string.Empty;

string sampleText = "<A href=\"http://domain/page.asp?id=38957&amp;oid=2497&amp;type=JPG\"><U>Click here for Terms &amp; Conditions...</U></A>";

string regex = "<a href=\"http://domain/page.asp?id=(\\d+)&amp;oid=(\\d+)&amp;type=(\\w+)\">";
Regex regEx = new Regex(regex, RegexOptions.IgnoreCase);

result = regEx.Replace(sampleText, "<a href=\"/newPage/Index/$1&opid=$2)\">");
  • "can't get it working" is not very descriptive. What isn't working? Errors? Exceptions? Please post what you expect to happen vs what is happening. Commented Sep 6, 2012 at 10:01
  • Very difficult to choose an exact duplicate on the 'related' list. However, don't use Regex, use Html Agility Pack. Commented Sep 6, 2012 at 10:03
  • What @Steve said, in conjunction with the Uri class to parse the URLs once you get them using the HAP. Commented Sep 6, 2012 at 10:05
  • possible duplicate of C# Regex replace url Commented Sep 6, 2012 at 10:07
  • Just to point you in the right direction if you still want to go down the Regex route: look into escape characters. For example, ? makes the previous character optional, so ...page.asp?id matches ...page.asid or ...page.aspid but NOT ...page.asp?id. Commented Sep 6, 2012 at 11:48
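
As a rough illustration of the Uri suggestion in the comments above, the sketch below reads the query parameters out of one href value. It assumes the href has already been pulled out of the markup (for example by an HTML parser such as the Html Agility Pack, which is not shown here). The names HrefParser and QueryParameters are hypothetical and exist only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

public static class HrefParser
{
    // Hypothetical helper: turns one href attribute value into its query parameters.
    public static Dictionary<string, string> QueryParameters(string hrefAttributeValue)
    {
        // In HTML source the attribute value carries entities such as &amp;,
        // so decode it before handing it to Uri.
        string decoded = WebUtility.HtmlDecode(hrefAttributeValue);
        var uri = new Uri(decoded);

        var parameters = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string query = uri.Query.TrimStart('?');
        foreach (string pair in query.Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split each "name=value" pair on the first '=' only.
            string[] parts = pair.Split(new[] { '=' }, 2);
            string value = parts.Length > 1 ? Uri.UnescapeDataString(parts[1]) : string.Empty;
            parameters[Uri.UnescapeDataString(parts[0])] = value;
        }
        return parameters;
    }
}
```

Called with the href from the question, `http://domain/page.asp?id=38957&amp;oid=2497&amp;type=JPG`, this should yield the entries id, oid and type, which can then be used to build the new URL without any pattern matching on the markup.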

1 Answer

Everything looks fine except that . and ? are special characters in regular expressions, so they need to be escaped to be treated as literals. So your expression:

string regex = "<a href=\"http://domain/page.asp?id=(\\d+)&amp;oid=(\\d+)&amp;type=(\\w+)\">";

Needs to be:

string regex = "<a href=\"http://domain/page\\.asp\\?id=(\\d+)&amp;oid=(\\d+)&amp;type=(\\w+)\">";

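Note the backslashes in front of the . and ?. Unescaped, the . matches any character (which is harmless here), but p? makes the p optional and leaves nothing in the pattern to match the literal ? in the URL, so the regex can never match.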
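
A minimal sketch of the corrected pattern in use follows. It lets Regex.Escape add the backslashes to the literal part of the URL instead of typing them by hand. The class name LinkRewriter is hypothetical, and the replacement string is the one from the question with the stray ")" dropped, which is an assumption about what was intended.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkRewriter
{
    // Regex.Escape turns "page.asp?id=" into "page\.asp\?id=",
    // so the . and ? are matched as literal characters.
    private static readonly Regex OldLink = new Regex(
        "<a href=\"" + Regex.Escape("http://domain/page.asp?id=")
            + "(\\d+)&amp;oid=(\\d+)&amp;type=(\\w+)\">",
        RegexOptions.IgnoreCase);

    // Replaces every old-style anchor tag with the new-style one.
    // $1 is the id group and $2 is the oid group.
    // Assumption: the ")" after $2 in the question was a typo and is omitted here.
    public static string Rewrite(string html)
    {
        return OldLink.Replace(html, "<a href=\"/newPage/Index/$1&opid=$2\">");
    }
}
```

With the sample text from the question, `LinkRewriter.Rewrite(sampleText)` should return `<a href="/newPage/Index/38957&opid=2497"><U>Click here for Terms &amp; Conditions...</U></A>`. Only the opening tag is rewritten, and RegexOptions.IgnoreCase is what lets the lowercase `<a` in the pattern match the uppercase `<A` in the input.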
