A PHP regex / PHP DOM / PHP XPath question.

Given the following HTML with inline CSS:

<p style='text-indent: 22px; font-weight: bold; line-height: 1em; color: #FFF'>

How do I remove the 'line-height' and 'color' CSS properties, and leave text-indent and font-weight untouched, so the resultant HTML is:

<p style='text-indent: 22px; font-weight: bold;'>

The HTML file could potentially be hundreds of lines long, with tags nested in various ways and other attributes applied to any tag.

Note that the 'style' attribute may be applied to tags other than <p>.

I am aware there are approaches using both PHP DOM and regex - my current thinking was using something along these lines:

$elements = $xPath->query('//*[@style="color"]');
foreach ($elements as $element) {   
  //remove style='color'
}

Many thanks

EDIT

Here's my solution, which uses PHP-CSS-Parser:

https://github.com/sabberworm/PHP-CSS-Parser

The code:

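$dom = new DOMDocument;
// The XML prolog hints to libxml that the input is UTF-8; @ silences markup warnings
@$dom->loadHTML('<?xml encoding="UTF-8">' . $html);
$xPath = new DOMXPath($dom);
// Only <p> and <span> are processed here; use '//*[@style]' to cover every styled tag
$elements = $xPath->query('//p|//span');
foreach($elements as $element){
    // Wrap the inline declarations in a dummy rule so the parser sees a full stylesheet
    $oParser = new CSSParser("p{" . $element->getAttribute('style') . "}");
    $oCss = $oParser->parse();
    foreach($oCss->getAllRuleSets() as $oRuleSet) {
        // A trailing dash removes every property starting with that prefix
        $oRuleSet->removeRule('line-');
        $oRuleSet->removeRule('margin-');
        $oRuleSet->removeRule('font-');
    }
    $css = $oCss->__toString();
    // Strip the dummy wrapper again: the first 3 characters and the closing brace
    $css = substr_replace($css, '', 0, 3);
    $css = substr_replace($css, '', -1, 1);
    $element->setAttribute('style', $css);
}
$src = $dom->saveHTML();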

1 Answer

Definitely use proper HTML and CSS parsers rather than regexes. For the XPath query, use the contains function to find the nodes to alter:

//*[contains(@style, 'color:')]

Then use a CSS parser to remove the properties you don't want.
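As a rough illustration of how the pieces fit together, here is a minimal sketch using only PHP's bundled DOM extension. It is not the parser-based approach recommended above: in place of a real CSS parser it splits the style string naively on ';' and ':', which assumes no property value contains a semicolon (a data: URL or a quoted string would break it). The function names stripCssProperties and cleanInlineStyles are made up for this example. The query is widened to '//*[@style]' and the property names are compared exactly, because contains(@style, 'color:') also matches 'background-color:' and misses 'color :'.

```php
<?php
// Hypothetical helper: drop the named properties from one inline style string.
// Naive split on ';' and ':' (assumption: no value contains a semicolon).
function stripCssProperties(string $style, array $unwanted): string
{
    $kept = [];
    foreach (explode(';', $style) as $declaration) {
        $parts = explode(':', $declaration, 2);
        if (count($parts) !== 2) {
            continue; // empty or malformed fragment
        }
        $name = strtolower(trim($parts[0]));
        $value = trim($parts[1]);
        if ($name === '' || in_array($name, $unwanted, true)) {
            continue; // exact name match, so 'background-color' survives 'color'
        }
        $kept[] = $name . ': ' . $value . ';';
    }
    return implode(' ', $kept);
}

// Hypothetical helper: apply stripCssProperties() to every element with a style attribute.
function cleanInlineStyles(string $html, array $unwanted): string
{
    $dom = new DOMDocument();
    // Collect libxml warnings about imperfect markup instead of emitting them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//*[@style]') as $element) {
        $style = stripCssProperties($element->getAttribute('style'), $unwanted);
        if ($style === '') {
            $element->removeAttribute('style'); // nothing left worth keeping
        } else {
            $element->setAttribute('style', $style);
        }
    }
    return $dom->saveHTML();
}

// Usage with the markup from the question
$html = "<p style='text-indent: 22px; font-weight: bold; line-height: 1em; color: #FFF'>Hello</p>";
echo cleanInlineStyles($html, ['line-height', 'color']);
```

Note that loadHTML() wraps a fragment in a full document (doctype, html and body tags), so the output contains more than the original snippet. For anything beyond simple declarations, swapping the helper for a real CSS parser is the safer choice.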
