PHP preg_replace HREF

Question

In short, I'm using preg_replace to find style sheets and essentially proxy them for viewers on my website: I take the external domain and prepend it to the current href. The style sheet link starts like this:

<link rel="stylesheet" type="text/css" href="/assets/css/base.css">

I take the href and prepend the domain, so it becomes:

<link rel="stylesheet" type="text/css" href="http://www.website.com/assets/css/base.css">

My issue is when I encounter a site whose href does not include the scheme (HTTP/HTTPS), i.e. a protocol-relative URL:

<link rel="stylesheet" type="text/css" href="//cdn.website.com/assets/css/base.css">

My current preg_replace then doesn't handle it correctly and turns the stylesheet link into this:

<link rel="stylesheet" type="text/css" href="http://www.website.com//cdn.website.com/assets/css/base.css">

Is it possible to build some sort of if/then into preg_replace so that it leaves the "//" hrefs alone and only rewrites the ones that have no absolute base domain?

Current preg_replace being used:

$html = file_get_contents($website_url);
$domain = 'website.com';
$html = preg_replace("/(href|src)\=\"([^(http)])(\/)?/", "$1=\"$domain$2", $html);
echo $html;

Simple: don't use regexes. Use a DOM parser, and then it's a simple string replace operation once you've got the href attribute's contents.
– Marc B, Commented Jun 13, 2014 at 22:22
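A minimal sketch of the DOM-parser approach from the comment above, applied to the question's <link rel="stylesheet"> case. It assumes PHP's DOM extension is available (usually enabled by default); the function name proxyStylesheetLinks() and the base URL are hypothetical, and only root-relative paths ("/x") are rewritten:

```php
<?php
// Hypothetical helper: rewrite root-relative stylesheet hrefs so they point
// at $base, leaving protocol-relative ("//cdn...") and absolute URLs alone.
function proxyStylesheetLinks(string $html, string $base): string
{
    $dom = new DOMDocument();
    // libxml warns about HTML5 or malformed markup; collect errors quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//link[@rel="stylesheet"][@href]') as $link) {
        $href = $link->getAttribute('href');
        // Starts with a single "/" but not "//": a root-relative path.
        if (strpos($href, '/') === 0 && strpos($href, '//') !== 0) {
            $link->setAttribute('href', rtrim($base, '/') . $href);
        }
    }

    return $dom->saveHTML();
}

$html = '<link rel="stylesheet" type="text/css" href="/assets/css/base.css">'
      . '<link rel="stylesheet" type="text/css" href="//cdn.website.com/assets/css/base.css">';
echo proxyStylesheetLinks($html, 'http://www.website.com');
```

Because the decision is made on the attribute value in PHP, the "if/then" the question asks for is an ordinary if statement rather than regex syntax. Note that saveHTML() returns a full document (doctype, html and head wrappers are added around a bare fragment).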

l'L'l · Accepted Answer · 2014-06-14 00:32:36Z

2

PCRE (the regex engine behind preg_replace) does support if/then/else conditionals, (?(condition)yes|no), although they aren't really necessary for this to work:

(?!(href|src)=)(\")\/(\\w+.+)(\">)

Code:

$html = file_get_contents($website_url);
$domain = 'http://website.com';
$result = preg_replace("/(?!(href|src)=)(\")\/(\\w+.+)(\">)/u", "$2$domain/$3$4", $html);
echo $result;

Output:

<link rel="stylesheet" type="text/css" href="http://website.com/assets/css/base.css">

Example:

http://regex101.com/r/kU7pF1
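For comparison, here is a sketch of the conditional syntax mentioned above (not the answer's own pattern). The condition is a lookahead: (?(?=//)(?!)|/) reads "if the next two characters are //, take a branch that always fails; otherwise match a single /". The $domain value and the sample HTML are examples only:

```php
<?php
// (?(?=//) ... | ... ) is a PCRE conditional whose condition is a lookahead.
// "yes" branch: (?!) can never match, so protocol-relative URLs are skipped.
// "no" branch: require exactly one leading "/" (a root-relative path).
$pattern = '~\b(href|src)="(?(?=//)(?!)|/)~';

$domain = 'http://website.com'; // example base URL
$html = '<link rel="stylesheet" href="/assets/css/base.css">'
      . '<link rel="stylesheet" href="//cdn.website.com/assets/css/base.css">';

// $1 re-inserts the attribute name; the base replaces the leading "/".
echo preg_replace($pattern, '$1="' . $domain . '/', $html);
```

The conditional is equivalent to the shorter /(?!/), and absolute URLs such as src="http://..." are never matched because their first character is not "/". Like any regex over raw HTML, it assumes double-quoted attributes with no spaces around the "=".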

answered Jun 14, 2014 at 0:32

l'L'l

mario · Accepted Answer · 2014-06-13 23:03:20Z

1

[^(http)] is not a negation of the word "http". It's still a character class: it matches any single character except (, h, t, p and ).
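To make the character-class point concrete, here is a small check of what [^(http)] from the question's pattern actually matches. These are plain preg_match calls; the sample strings are only illustrations:

```php
<?php
// [^(http)] matches exactly ONE character that is not "(", "h", "t", "p" or ")".
var_dump(preg_match('/^[^(http)]$/', '/'));     // int(1): "/" is outside the set
var_dump(preg_match('/^[^(http)]$/', 'h'));     // int(0): "h" is in the set
var_dump(preg_match('/^[^(http)]$/', 'http'));  // int(0): the class covers one char only
// A protocol-relative URL starts with "/", so it passes the class too.
var_dump(preg_match('/^[^(http)]/', '//cdn.website.com')); // int(1)
```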

You are looking for a (?!...) negative lookahead:

 ~  (href|src) =\" (?!href:)  \/?  ~x

While I dispute the SO meme that every trivial task calls for firing up a DOM traversal, it should be noted that regexes are usually only appropriate for normalized, well-known HTML input, not when your task is proxying arbitrary websites.

answered Jun 13, 2014 at 23:03

mario

score 0 · Accepted Answer · 2018-07-29 22:48:48Z

0

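// Uses $this, so this is assumed to be a method of a class that also defines mungeHref().
function alterLinks($html) {

  $dom = new DOMDocument();
  // Keep libxml from emitting warnings on HTML5 or malformed markup.
  libxml_use_internal_errors(true);
  $dom->loadHTML($html);
  libxml_clear_errors();

  // Rewrite the href of every <a> element in place.
  $links = $dom->getElementsByTagName('a');
  foreach ($links as $alink) {
    $href = $alink->getAttribute('href');
    $aMungedLink = $this->mungeHref($href);
    $alink->setAttribute("href", $aMungedLink);
  }

  return $dom->saveHTML();
}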
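The answer leaves mungeHref() unwritten. Here is one hypothetical version for the question's proxy case, written as a standalone function that takes the proxied site's base URL as an assumed second argument (inside the class you would pass the base, or read it from a property):

```php
<?php
// Hypothetical mungeHref(): prefix relative URLs with $base.
function mungeHref(string $href, string $base): string
{
    // Leave empty values, fragments, protocol-relative URLs ("//cdn...")
    // and anything with a scheme ("http:", "https:", "mailto:", ...) alone.
    if ($href === ''
        || $href[0] === '#'
        || strpos($href, '//') === 0
        || preg_match('~^[a-z][a-z0-9+.-]*:~i', $href)) {
        return $href;
    }

    $base = rtrim($base, '/');
    // Root-relative "/x" keeps its path. Other relative paths are treated as
    // relative to the site root, a simplification: proper resolution would
    // use the directory of the page the link came from.
    return $href[0] === '/' ? $base . $href : $base . '/' . $href;
}

echo mungeHref('/assets/css/base.css', 'http://www.website.com'), "\n";
echo mungeHref('//cdn.website.com/assets/css/base.css', 'http://www.website.com'), "\n";
```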

edited Jul 29, 2018 at 22:48

answered Jul 27, 2018 at 20:57

user3925051

3 Comments

Richie Thomas Over a year ago

Welcome to StackOverflow. While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. Consider editing your answer to add that context.

user3925051 Over a year ago

Some of the comments in this thread involved regular expressions. I recently had a "change hrefs" problem while writing a plugin for a dynamic CMS, so that I could optionally output static HTML instead. I tried but failed to get preg_replace and regular expressions to work. The code above is clean and simple, and it worked for me. I didn't write the mungeHref($href) function above because my needs were different from yours; that's the easy part anyway.

user3925051 Over a year ago

FWIW, I used almost identical code to rework the "src" attributes of all images in a dynamic HTML page, so it could then be written out as static HTML. But that's a different topic.
