I am looking for a way to remove duplicate lines from a variable:

$x = '<IMGURL>one.jpg</IMGURL>';
$x .= '<IMGURL>two.jpg</IMGURL>';
//remove the following line:
$x .= '<IMGURL>one.jpg</IMGURL>';
$x .= '<IMGURL>third.jpg</IMGURL>';

The output should be:

$x = '<IMGURL>one.jpg</IMGURL><IMGURL>two.jpg</IMGURL><IMGURL>third.jpg</IMGURL>';

Maybe some regex does the trick?

Edit:

Some more info

The source XML:

<?xml version="1.0" encoding="utf-8"?>
<SHOP>
  <SHOPITEM>
    <name>BLUE product</name>
    <IMGURL>main_picture.jpg</IMGURL>
    <PRODUCT_VARIANT id="2">
      <name>blue L</name>
      <IMGURL>blue.jpg</IMGURL>
    </PRODUCT_VARIANT>
    <PRODUCT_VARIANT id="3">
      <name>BLUE XL</name>
      <IMGURL>blue.jpg</IMGURL>
    </PRODUCT_VARIANT>
    <PRODUCT_VARIANT id="4">
      <name>BLUE XXL</name>
      <IMGURL>blue.jpg</IMGURL>
    </PRODUCT_VARIANT>
  </SHOPITEM>
</SHOP>

From this I need the two unique JPG file names:

  • main_picture.jpg
  • blue.jpg

The relevant part of the module that processes the source XML:

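foreach ($xml->SHOPITEM as $product) {
    // Main image of the product
    if (isset($product->IMGURL)) {
        $xml_content .= '<IMAGE>' . htmlspecialchars($product->IMGURL) . '</IMAGE>' . "\n";
    }

    // Images of the product variants (the element is named PRODUCT_VARIANT in the sample XML)
    foreach ($product->PRODUCT_VARIANT as $option) {
        if (isset($option->IMGURL)) {
            $xml_content .= '<IMAGE>' . htmlspecialchars($option->IMGURL) . '</IMAGE>' . "\n";
        }
    }
}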

  • The question is: how are those created in the first place? Commented Mar 10, 2016 at 18:45
  • It's XML. Load it into a DOM, find the dupes, and remove those nodes. Commented Mar 10, 2016 at 18:45
  • This was my first idea, with XSLT. But the source XML is too complicated; if needed, I can post a sample here. Commented Mar 10, 2016 at 18:46
  • @Adrian Yes, it's a good idea. Commented Mar 10, 2016 at 18:48
  • Do you want to remove only the <IMGURL> tag, or its <PRODUCT_VARIANT> parent as well? Commented Mar 10, 2016 at 19:03

1 Answer

This sample code reduces your XML to the desired result:

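$dom = new DOMDocument();
$dom->formatOutput = true;
libxml_use_internal_errors( true );      // collect libxml parse errors instead of emitting warnings
$dom->loadXML( $x, LIBXML_NOBLANKS );    // $x holds the source XML string

$xpath = new DOMXPath( $dom );

// All <IMGURL> elements that are children of a <PRODUCT_VARIANT>
$nodes = $xpath->query( '//SHOP/SHOPITEM/PRODUCT_VARIANT/IMGURL' );
$found = array();

foreach( $nodes as $node )
{
    if( in_array( $node->nodeValue, $found, true ) )
    { $node->nodeValue = ''; }           // duplicate: blank it out
    else
    { $found[] = $node->nodeValue; }     // first occurrence: remember it
}

$result = $dom->saveXML();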

3v4l demo

In short: after retrieving the <IMGURL> nodes through XPath, loop over them and keep an array of the values seen so far. If a node's value is already in the array, set that node's value to an empty string; otherwise, add the value to the array.
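
If you would rather drop the duplicate <IMGURL> elements entirely instead of leaving them empty, the same "seen values" idea can be combined with removeChild(). This is only a minimal sketch under some assumptions: the sample XML is shortened from the question, and the names $seen and $duplicates are mine, not part of the original script. Duplicates are collected first and removed in a second loop, so the node list is not modified while it is being iterated.

```php
<?php
// Hypothetical sample input, shortened from the XML in the question.
$x = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<SHOP>
  <SHOPITEM>
    <name>BLUE product</name>
    <IMGURL>main_picture.jpg</IMGURL>
    <PRODUCT_VARIANT id="2"><name>blue L</name><IMGURL>blue.jpg</IMGURL></PRODUCT_VARIANT>
    <PRODUCT_VARIANT id="3"><name>BLUE XL</name><IMGURL>blue.jpg</IMGURL></PRODUCT_VARIANT>
  </SHOPITEM>
</SHOP>
XML;

$dom = new DOMDocument();
$dom->loadXML($x, LIBXML_NOBLANKS);
$xpath = new DOMXPath($dom);

$seen = [];        // image names already encountered (used as a set)
$duplicates = [];  // <IMGURL> nodes whose value was seen before

// Same query as in the answer: only <IMGURL> children of <PRODUCT_VARIANT>.
foreach ($xpath->query('//SHOP/SHOPITEM/PRODUCT_VARIANT/IMGURL') as $node) {
    if (isset($seen[$node->nodeValue])) {
        $duplicates[] = $node;
    } else {
        $seen[$node->nodeValue] = true;
    }
}

// Detach the duplicates from their parent elements.
foreach ($duplicates as $node) {
    $node->parentNode->removeChild($node);
}

$result = $dom->saveXML();
```

With this input, the second blue.jpg element is removed while its <PRODUCT_VARIANT> parent and <name> stay in place.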

The query '//SHOP/SHOPITEM/PRODUCT_VARIANT/IMGURL' matches only <IMGURL> elements whose parent node is <PRODUCT_VARIANT>; if you want to check all <IMGURL> nodes, change the XPath line to:

$nodes = $xpath->query( '*//IMGURL' );
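
If the real goal is the list of <IMAGE> lines from the question's loop, it may be simpler to skip duplicates while building $xml_content rather than editing the source XML. The following is a rough sketch, not a drop-in replacement for the module: it assumes the images should be unique per <SHOPITEM> (move $seen outside the outer loop for uniqueness across the whole feed), and $source, $seen and $name are names I made up for illustration.

```php
<?php
// Hypothetical sample input, shortened from the XML in the question.
$source = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<SHOP>
  <SHOPITEM>
    <name>BLUE product</name>
    <IMGURL>main_picture.jpg</IMGURL>
    <PRODUCT_VARIANT id="2"><name>blue L</name><IMGURL>blue.jpg</IMGURL></PRODUCT_VARIANT>
    <PRODUCT_VARIANT id="3"><name>BLUE XL</name><IMGURL>blue.jpg</IMGURL></PRODUCT_VARIANT>
    <PRODUCT_VARIANT id="4"><name>BLUE XXL</name><IMGURL>blue.jpg</IMGURL></PRODUCT_VARIANT>
  </SHOPITEM>
</SHOP>
XML;

$xml = simplexml_load_string($source);
$xml_content = '';

foreach ($xml->SHOPITEM as $product) {
    $seen = []; // image names already written for this product

    // './/IMGURL' matches the product's own image and the images of its variants.
    foreach ($product->xpath('.//IMGURL') as $img) {
        $name = trim((string) $img);
        if ($name === '' || isset($seen[$name])) {
            continue; // skip empty values and names already written
        }
        $seen[$name] = true;
        $xml_content .= '<IMAGE>' . htmlspecialchars($name) . '</IMAGE>' . "\n";
    }
}

echo $xml_content;
```

For the sample above this writes one <IMAGE> line for main_picture.jpg and one for blue.jpg. Using the image name as an array key makes the duplicate check a simple isset() instead of an in_array() scan.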

1 Comment

Niiiice! Thank you very much.
