
I have the following regex:

 $string = preg_replace("/([\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])/i","<a target=\"_blank\" href=\"$1\">$1</A>",$string);

Using it to parse this string: http://www.ttt.com.ar/hello_world

produces this new string:

<a target="_blank" href="http://www.ttt.com.ar/hello_world">http://www.ttt.com.ar/hello_world</A>

So far, so good. What I want is for the link text (the second $1) to be a substring of $1, producing output like:

<a target="_blank" href="http://www.ttt.com.ar/hello_world">http://www.ttt.com.ar/...</A>

Pseudocode of what I mean:

 $string = preg_replace("/([\w]+:\/\/[\w-?&;#~=\.\/\@]+[\w\/])/i","<a target=\"_blank\" href=\"$1\">substring($1,0,24)..</A>",$string);

Is this even possible? I'm probably just doing it all wrong :)

Thanks in advance.

  • What is [\w-?&;#~=\.\/\@] (near the beginning) supposed to match? Commented Apr 11, 2014 at 18:56
  • Also might try something like this without callback Commented Apr 11, 2014 at 19:58

3 Answers

Check out preg_replace_callback():

$string = 'http://www.ttt.com.ar/hello_world';

$string = preg_replace_callback(
    // The hyphen is escaped (\-) because PCRE2 in PHP 7.3+ rejects "\w-?" as an invalid range.
    "/([\w]+:\/\/[\w\-?&;#~=\.\/\@]+[\w\/])/i",
    function($matches) {
        $link = $matches[1];
        $substring = substr($link, 0, 24) . '..';
        return "<a target=\"_blank\" href=\"$link\">$substring</a>";
    },
    $string
);

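echo $string;
// <a target="_blank" href="http://www.ttt.com.ar/hello_world">http://www.ttt.com.ar/he..</a>
// (the first 24 characters end at "/he"; use substr($link, 0, 22) . '...' to cut right after the host)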

Note that older PHP versions also let you use the e modifier to evaluate code in the replacement string of preg_replace(). That modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0, so preg_replace_callback() is the way to go.
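If short URLs should stay untouched, the callback can check the length before cutting. A minimal sketch, assuming a hypothetical helper named linkify() and the 24-character limit from the question (neither is part of the original code):

```php
<?php
// Hypothetical helper: the name linkify() and the default limit are assumptions for illustration.
function linkify(string $text, int $limit = 24): string
{
    return preg_replace_callback(
        // Same pattern as above, with the hyphen escaped inside the character class.
        '/\w+:\/\/[\w\-?&;#~=.\/@]+[\w\/]/i',
        function (array $m) use ($limit): string {
            $url = $m[0];
            // Shorten the visible label only when the URL is longer than the limit.
            $label = strlen($url) > $limit ? substr($url, 0, $limit) . '..' : $url;
            return '<a target="_blank" href="' . $url . '">' . $label . '</a>';
        },
        $text
    );
}

echo linkify('http://www.ttt.com.ar/hello_world'), PHP_EOL;
// <a target="_blank" href="http://www.ttt.com.ar/hello_world">http://www.ttt.com.ar/he..</a>
echo linkify('http://a.io/x'), PHP_EOL;
// <a target="_blank" href="http://a.io/x">http://a.io/x</a>
```

The href always keeps the full URL; only the link text is shortened.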


5 Comments

Please select it as the answer if it helped :)
@Bathan No, this is not what you are looking for, even if you think so and the regex looks promising. Check my answer. I'm not saying this to promote myself, but you should not parse XML with regexes.
@hek2mgl I have read your answer and you have a valid point, but this answers what I needed to do regardless of bad/good practices. Thanks for the help. This is not the final code I'm using; it was only to provide an easy example.
@Bathan Fair enough. But you'll find me saying this again and again, at least for production code ;) Especially for users who land on this page from Google and might think this is good practice.
I don't mind, I like chatting :) Just make sure you add some exit condition so you don't end up in an infinite loop :)

You can use a capturing group inside of a lookahead like this:

preg_replace(
    // The hyphen is escaped (\-) because PCRE2 in PHP 7.3+ rejects "\w-?" as an invalid range.
    "/((?=(.{24}))[\w]+:\/\/[\w\-?&;#~=\.\/\@]+[\w\/])/i",
    "<a target=\"_blank\" href=\"$1\">$2..</A>",
     $string);

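This will capture the entire URL in group 1, and the first 24 characters starting at the same position in group 2. Note that the lookahead needs at least 24 characters (without a line break) to be available from the start of the URL, so a shorter URL at the end of the string will not match, and for a URL shorter than 24 characters group 2 will include the text that follows it.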


You are showing bad practice. Regexes should not be used to parse or modify XML content in an application.

Suggestions:

  • Use a DOM parser to read and modify the value
  • Use parse_url() to get the protocol and domain name

Example:

$doc = new DOMDocument();
$doc->loadHTML(
    '<a target="_blank" href="http://www.ttt.com.ar/hello_world">http://www.ttt.com.ar/hello_world</A>'
);

$link = $doc->getElementsByTagName('a')->item(0);
$url = parse_url($link->nodeValue);

$link->nodeValue = $url['scheme'] . '://' . $url['host'] . '/...';

echo $doc->saveHTML();
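To see what parse_url() contributes on its own, here is a small sketch using the URL from the question. The variable names $parts and $short are just illustrative:

```php
<?php
// parse_url() splits a URL into its components and returns them as an array.
$parts = parse_url('http://www.ttt.com.ar/hello_world');

// For this URL: scheme is 'http', host is 'www.ttt.com.ar', path is '/hello_world'.
$short = $parts['scheme'] . '://' . $parts['host'] . '/...';

echo $short, PHP_EOL; // http://www.ttt.com.ar/...
```

Cutting at the host this way does not depend on a fixed character count, so it behaves the same for long and short domain names.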

2 Comments

Probably could be better off as a comment, imo.
@Sam I've added an example
