
I have a string which happens to be HTML, and I wish to delete specific sections of it server-side using PHP (no JavaScript/jQuery solutions please). The string will need certain identifiers in it to tag the sections that might be removed, and I will also have a variable indicating which tagged sections should be removed. These indicator tags should not remain in the final modified string.

For instance, consider $html_1, where I included a capture attribute to tag the sections which might be deleted, or $html_2, where I wrapped [caption] tags around the elements which might be deleted. Note that these are just two possible ways I thought of tagging the sections, and I am okay with any other method which allows the string to be stored in a DB.

For both, I have an <h1> block, an <h2> block, and a <p> block tagged as sections which may or may not be removed. Given $modify, which indicates which sections should or shouldn't be removed, how can I generate the new string equal to $html_new? I am thinking a DOMDocument, str_replace, or regex solution might work, but I'm not sure.

<?php

$html_1 = <<<EOT
<div>
    <div>
        <div>
            <h1 capture="a">bla bla bla</h1>
            <p>bla</p>
            <h2 capture="b">bla bla<span>bla</span></h2>
            <h1>bla bla bla bla</h1>
        </div>
    </div>
    <div>
        <p capture="c">bla bla bla</p>
        <h1>bla bla</h1>
    </div>
</div>
EOT;

$html_2 = <<<EOT
<div>
    <div>
        <div>
            [caption id="a"]<h1>bla bla bla</h1>[/caption]
            <p>bla</p>
            [caption id="b"]<h2>bla bla<span>bla</span></h2>[/caption]
            <h1>bla bla bla bla</h1>
        </div>
    </div>
    <div>
        [caption id="c"]<p>bla bla bla</p>[/caption]
        <h1>bla bla</h1>
    </div>
</div>
EOT;

$modify=array('a'=>true,'b'=>false,'c'=>true);

$html_new = <<<EOT
<div>
    <div>
        <div>
            <p>bla</p>
            <h2>bla bla<span>bla</span></h2>
            <h1>bla bla bla bla</h1>
        </div>
    </div>
    <div>
        <h1>bla bla</h1>
    </div>
</div>
EOT;
?>
  • Have you tried anything yourself? Looks like a pretty simple regex pattern to me. Commented Jun 2, 2015 at 13:44
  • @treegarden I am pretty weak with regex. My difficulty would be differentiating between the a, b, and c tag. I was probably going to go down the DOMdocument solution, but maybe that isn't the right way to go. Commented Jun 2, 2015 at 13:46
  • HTML with regex? See here. DOMdocument is exactly the way to go. Commented Jun 2, 2015 at 13:46
  • @HoboSapiens A little melodramatic, but fun post! I still feel regex works with very defined cases, but am not claiming it should be used for my current need. Thanks! Commented Jun 2, 2015 at 13:53
  • @HoboSapiens meta.stackoverflow.com/questions/261561 Commented Jun 2, 2015 at 13:54

2 Answers


I used $html_2, because I felt it's easier. That should do the trick:

foreach($modify as $letter=>$remove) {
    $pattern = '/\[caption id="' . $letter . '"\](.*)\[\/caption\]/U';
    $replace = ($remove) ? '' : '$1';
    $html_2 = preg_replace($pattern, $replace, $html_2);
}
$html_2 = preg_replace('/^\h*\v+/m', '', $html_2); // Optional: Removing empty lines

If $remove is false for a given letter, the matched part of the string gets replaced with the first capture group (which is everything surrounded by the [caption] tags). If it's true, it gets replaced with an empty string.
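The loop above interpolates each id straight into the pattern and, because `.` does not match newlines by default, only handles tagged blocks that sit on one line. As a rough sketch only, the same idea can be wrapped in a function that escapes the id with preg_quote() and adds the `s` modifier so a block may span several lines. The name strip_captions and the sample string are hypothetical and not part of the answer.

```php
<?php
// Hypothetical helper (not from the answer above): applies the same
// [caption] replacement for every id listed in $modify.
function strip_captions(string $html, array $modify): string
{
    foreach ($modify as $id => $remove) {
        // preg_quote() escapes regex metacharacters in the id.
        // (.*?) is lazy, and the s modifier lets "." match newlines,
        // so a tagged block may span several lines.
        $pattern = '/\[caption id="' . preg_quote((string) $id, '/') . '"\](.*?)\[\/caption\]/s';
        // Drop the whole block when $remove is true, otherwise keep its content.
        $html = preg_replace($pattern, $remove ? '' : '$1', $html);
    }
    return $html;
}

// Illustrative input only.
$sample = '[caption id="a"]<h1>x</h1>[/caption]<p>y</p>[caption id="b"]<h2>z</h2>[/caption]';
$result = strip_captions($sample, ['a' => true, 'b' => false]);
echo $result, PHP_EOL;
```

Ids that do not appear in $modify are left untouched by this sketch, so their [caption] markers would stay in the string; whether that is acceptable depends on how the tags are generated.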


3 Comments

Given the unique delimiter [caption..., I would expect that this won't corrupt the HTML. Agree?
Well, all [caption...] tags will get removed server-side before it's sent to the client and the HTML is rendered, so you don't need to worry about that :)
And no, given the tags' uniqueness you also don't need to worry about the regex messing up anything else.

You could use preg_replace to replace any line containing capture="a" with a blank line, like this:

$stripped = preg_replace('/^.*capture="a".*$/m', '', $html_1);

If you encased this in a function, you could pass an argument to strip out a, b, or c:

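function strip($capture, $block) {
    // preg_quote() escapes regex metacharacters in $capture; the m flag makes ^ and $ match per line
    $stripped = preg_replace('/^.*capture="' . preg_quote($capture, '/') . '".*$/m', '', $block);
    return $stripped;
}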
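Note that a line-based replacement assumes each tagged element sits on its own line, and it leaves the capture attribute on elements that are kept. For comparison, here is a minimal sketch of the DOMDocument route mentioned in the question and comments. The function name apply_capture, the wrapper id capture-root, and the sample string are assumptions made for illustration. The sketch also assumes the input is plain ASCII or that encoding is handled separately, since loadHTML() does not default to UTF-8.

```php
<?php
// Hypothetical helper: removes elements whose capture id is flagged true in
// $modify, and only strips the capture attribute from all other tagged elements.
function apply_capture(string $html, array $modify): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    libxml_use_internal_errors(true);
    // Wrap the fragment in a known container so it can be extracted afterwards.
    $doc->loadHTML('<div id="capture-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Copy the matches into an array before modifying the tree.
    foreach (iterator_to_array($xpath->query('//*[@capture]')) as $node) {
        $id = $node->getAttribute('capture');
        if (!empty($modify[$id])) {
            $node->parentNode->removeChild($node); // flagged: drop the element
        } else {
            $node->removeAttribute('capture');     // kept: drop only the marker
        }
    }

    // Serialize the container's children, leaving out the wrapper itself.
    $root = $xpath->query('//*[@id="capture-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Illustrative input only.
$sample = '<div><h1 capture="a">x</h1><p>y</p><h2 capture="b">z<span>w</span></h2></div>';
$result = apply_capture($sample, ['a' => true, 'b' => false]);
echo $result, PHP_EOL;
```

The serializer may insert or alter whitespace between block elements, so the output should be compared structurally rather than byte for byte.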
