3

I have to process about 750 XML files to generate a report. I probably should have gone with XSLT or XPath, but it's probably too late for that. So, my question: for the first couple of records it all works fine, but there seem to be a couple of XML files without the nodes I'm calling upon. I've tried using isset and !== null, but neither works; both just give me the same errors, namely:

Notice: Trying to get property of non-object in /var/www/overzicht/script.php on line 38
Notice: Trying to get property of non-object in /var/www/overzicht/script.php on line 38
Fatal error: Call to a member function children() on a non-object in /var/www/overzicht/script.php on line 38

Using the following is probably wrong, right?

 if($xml_entry->children('http://www.isotc211.org/2005/gmd')->identificationInfo->MD_DataIdentification->citation->CI_Citation->title->children('http://www.isotc211.org/2005/gco'))

A small sample of the XML file I'm trying to parse (the whole XML can be found here):

 <gmd:contact>
    <gmd:CI_ResponsibleParty>
      <gmd:individualName>
        <gco:CharacterString>B. Boers</gco:CharacterString>
      </gmd:individualName>
      <gmd:organisationName>
        <gco:CharacterString>Staatsbosbeheer</gco:CharacterString>
      </gmd:organisationName>
      <gmd:positionName>
        <gco:CharacterString>Contactpersoon</gco:CharacterString>
      </gmd:positionName>
    </gmd:CI_ResponsibleParty>
</gmd:contact>

And my PHP:

<?php
        $xml_url = "http://www.nationaalgeoregister.nl/geonetwork/srv/dut/q?fast=index&from=1&to=10000&geometry=POLYGON((5.5963%2053.3162%2C5.5963%2053.5766%2C6.9612%2053.5766%2C6.9612%2053.3162%2C5.5963%2053.3162))";
        $xml_single_url = "http://www.nationaalgeoregister.nl/geonetwork/srv/dut/xml.metadata.get?uuid=";
        //Load the XML
        $xml = simplexml_load_file($xml_url);
        $xml_array = array();

        //Loop through all the nodes with 'metadata' and put uuid in the array
        foreach($xml->metadata as $metadata) {
                $xml_array[] = $metadata->children('http://www.fao.org/geonetwork')->children()->uuid;
        }       
        echo "<table>"
        ."<tr>"
        ."<td>Title</td>"
        ."<td>Owner</td>"
        ."<td>Purpose</td>"
        ."<td>Tags</td>"
        ."<td>Url</td>"
        ."<td>Url</td>"     
        ."</tr>";

        $i = 0;
        //For every id in the $xml_array 
        foreach($xml_array as $ar)
        {
            //Just a limit for testing purposes
            $i++;
            if($i == 100)
            {
                break;
            }
            //Loads the xml file
            $xml_entry = simplexml_load_file($xml_single_url .$ar);
            echo "<tr>";

            //Title
            echo "<td>"
            .$xml_entry->children('http://www.isotc211.org/2005/gmd')->identificationInfo->MD_DataIdentification->citation->CI_Citation->title->children('http://www.isotc211.org/2005/gco')->CharacterString
            ."</td>";

            //Owner
            echo "<td>" 
            .$xml_entry->children('http://www.isotc211.org/2005/gmd')->contact->CI_ResponsibleParty->organisationName->children('http://www.isotc211.org/2005/gco')->CharacterString
            ."</td>";

            //Purpose
            echo "<td>" 
            .$xml_entry->children('http://www.isotc211.org/2005/gmd')->identificationInfo->MD_DataIdentification->purpose->children('http://www.isotc211.org/2005/gco')->CharacterString
            ."</td>";

            //Tags      
            //Transfer          
            echo "</tr>";
        }       
        echo "</table>";

?>

I tried finding the solution on my own, but can't seem to find it.

4 Answers

2

The problem you have is that you have a long chain of -> operators, and the missing element is somewhere in that chain. As soon as you ask for an element that doesn't exist, you get a NULL, and all subsequent -> operators will fail to some degree or other.

Theoretically, if you have no idea which of the elements in the chain is missing (and maybe you do based on the known/allowed structure of the XML?) you'd have to break the chain down into a whole series of intermediate assignments and isset() checks.

Luckily, PHP lets you get away with calls like null->Property with just a Notice, so it's only the ->children() method call which will cause a fatal error. So you could just check before each call to that:

 if( ! isset($xml_entry) ) { return; }
 $temp = $xml_entry->children('http://www.isotc211.org/2005/gmd')->identificationInfo->MD_DataIdentification->citation->CI_Citation->title;
 if( ! isset($temp) ) { return; }
 echo $temp->children('http://www.isotc211.org/2005/gco')->CharacterString;

However, the error message tells you more than you may have realised:

  1. Notice: Trying to get property of non-object in /var/www/overzicht/script.php on line 38
  2. Notice: Trying to get property of non-object in /var/www/overzicht/script.php on line 38
  3. Fatal error: Call to a member function children() on a non-object in /var/www/overzicht/script.php on line 38

That's two Notices about accessing properties, and one Fatal error about accessing a method. So the line must break down like this...

$xml_entry
    ->children('http://www.isotc211.org/2005/gmd')
    ->identificationInfo
    ->MD_DataIdentification
    // OK to here

    ->citation
    // This part didn't complain, but subsequent ones did; <citation> is the missing element

    ->CI_Citation
    // First Notice
    ->title
    // Second Notice
    ->children('http://www.isotc211.org/2005/gco')
    // Fatal error - processing aborts here

    ->CharacterString

So what you need to check for is the existence of a <citation>:

$citation = $xml_entry->children('http://www.isotc211.org/2005/gmd')->identificationInfo->MD_DataIdentification->citation;
if ( isset($citation) )
{
    echo $citation->CI_Citation->title->children('http://www.isotc211.org/2005/gco')->CharacterString;
}
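
As a rough sketch of the same idea, the whole property chain can also be placed inside isset(), so that nothing is dereferenced unless every element on the way exists. This relies on SimpleXML's isset() support for element properties; the function name readTitle, the $default parameter and the trimmed-down sample document are assumptions made for illustration, not part of the original script.

```php
<?php
// Hypothetical, trimmed-down record; real files contain far more elements.
$sample = <<<XML
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title><gco:CharacterString>Sample title</gco:CharacterString></gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
XML;

// Hypothetical helper: returns the title text, or $default if any element is missing.
function readTitle(SimpleXMLElement $entry, $default = '')
{
    $gmd = $entry->children('http://www.isotc211.org/2005/gmd');

    // isset() on the full chain returns false as soon as one link is missing,
    // without raising notices along the way.
    if (!isset($gmd->identificationInfo->MD_DataIdentification->citation->CI_Citation->title)) {
        return $default;
    }

    $title = $gmd->identificationInfo->MD_DataIdentification->citation->CI_Citation->title;
    $gco   = $title->children('http://www.isotc211.org/2005/gco');

    return isset($gco->CharacterString) ? (string) $gco->CharacterString : $default;
}

$found = readTitle(simplexml_load_string($sample));
```

A record without the element then yields the default value, so the table cell can simply stay empty.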

2

Your parsing code works fine with your sample XML. You can see at codepad.viper-7.com/6oLCEZ and at 3v4l.org/pW7Wu.

If it is the first call to children() that is complaining, then it seems simplexml_load_file has failed. It returns FALSE on failure, so you need to check for that.

if (FALSE === $xml_entry) {
    echo 'could not load file';
}

More info in docs here. Perhaps the URL is wrong, down or not returning valid XML.
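
A minimal sketch of such a check, assuming you would rather skip a bad record than abort the whole run. To keep it self-contained it uses simplexml_load_string in place of simplexml_load_file (both return FALSE on failure); the helper name loadXmlOrNull and the sample strings are made up for illustration.

```php
<?php
// Collect libxml parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: parse an XML string, returning null when parsing fails.
function loadXmlOrNull($xmlString)
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        libxml_clear_errors(); // discard the collected parser errors
        return null;
    }
    return $xml;
}

$ok  = loadXmlOrNull('<root><item>1</item></root>');
$bad = loadXmlOrNull('<root><item>1</root>'); // mismatched closing tag
```

Inside the loop over the 750 files, a null result could then be handled with continue rather than letting the first children() call fail.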

Otherwise it seems elements are missing in the actual XML causing the error. You could check for missing elements using property_exists() like this...

$gmd = $xml_entry->children('http://www.isotc211.org/2005/gmd');

if (property_exists($gmd, 'identificationInfo')) {
    $id_info = $gmd->identificationInfo;
}
if (isset($id_info) && property_exists($id_info, 'MD_DataIdentification')) {
    $md_data_id = $id_info->MD_DataIdentification;
}
if (isset($md_data_id) && property_exists($md_data_id, 'citation')) {
    $citation = $md_data_id->citation;
}
if (isset($citation) && property_exists($citation, 'CI_Citation')) {
    $ci_citation = $citation->CI_Citation;
}
if (isset($ci_citation) && property_exists($ci_citation, 'title')) {
    $title = $ci_citation->title;
}
if (isset($title)) {
    $gco = $title->children('http://www.isotc211.org/2005/gco');
}
//Title
echo "<td>";
if (isset($gco) && property_exists($gco, 'CharacterString')) {
    echo $gco->CharacterString;
}
echo "</td>";

See it at 3v4l.org/0DTjI. And that's not to mention handling the possibility of multiple elements with the same name. So, considering all that, it may not be too late to go down the XPath route after all ;-)

$title = $xml_entry->xpath('/gmd:MD_Metadata/gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString');

echo "<td>";
if (isset($title[0])) {
    echo $title[0];
}
echo "</td>";

3 Comments

Yes, but don't forget that I use it to parse 750 files. Some of them don't have that element I'm guessing.
Great answer, but I would note that using single-line if (without a {} block) is generally considered bad coding style, as it can easily lead to subtle bugs when someone changes the code later.
@IMSoP Thanks for the feedback. Added the braces. +1 for your answer too.
1

The problem with lines like these:

if($xml_entry->children('http://www.isotc211.org/2005/gmd')->identificationInfo->MD_DataIdentification->citation->CI_Citation->title->children('http://www.isotc211.org/2005/gco'))

is that they are too long and too error-prone. Even though SimpleXML allows that kind of "easy" access, as soon as it does not find an element somewhere in the chain it returns NULL, and then you get the warnings and even the fatal errors.

For your use case it is much better to use an XPath query to do the job. As you need to access multiple properties representing the metadata, I suggest first wrapping this into a class of its own, for example SimpleXMLElementXpathObject; the PropertyIterator it uses can be found here.

This type allows you to define the meta-data you look for with a SimpleXMLElement and an array that describes the properties by mapping them to xpath queries:

$metaDef = array(
    'title'   => 'gmd:identificationInfo//gmd:CI_Citation/gmd:title/gco:CharacterString',
    'owner'   => 'gmd:contact/gmd:CI_ResponsibleParty/gmd:organisationName/gco:CharacterString',
    'purpose' => 'gmd:identificationInfo/gmd:MD_DataIdentification/gmd:purpose/gco:CharacterString',
);

As you can see, there is one XPath expression per key. The keys will be turned into properties. This then allows you to do the mappings on the fly, e.g.:

$meta = new SimpleXMLElementXpathObject($xml, $metaDef);
echo $meta->title, "\n";
echo json_encode($meta, JSON_PRETTY_PRINT), "\n";

Output:

Natuur - Ecologische verbindingszones
{
    "title": "Natuur - Ecologische verbindingszones",
    "owner": "provincie Frysl\u00e2n",
    "purpose": "Beleidsnota \"ecologische verbindingszones in Frysl\u00e2n\" vastgesteld door Provinciale Staten op 4 oktober 2006. Opgenomen in het Streekplan 2007"
}

In case the XPath query returns no result, NULL is given. That means the properties are optional: you won't see any warnings or fatal errors. Just to make it clear: this is basically using the xpath method from SimpleXMLElement, so you can also run these queries on your own.
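
To illustrate running the queries yourself, here is a rough sketch that maps an array of XPath expressions to values with SimpleXMLElement::xpath directly. It is not the SimpleXMLElementXpathObject class; the function name extractMeta, the explicit namespace array and the small sample document are assumptions for this example.

```php
<?php
// Hypothetical helper: evaluate each XPath expression relative to $context
// and return the first match as a string, or null if nothing matched.
function extractMeta(SimpleXMLElement $context, array $definitions, array $namespaces)
{
    $meta = array();
    foreach ($definitions as $key => $expression) {
        // Registration is documented as applying to the next XPath query,
        // so the prefixes are registered again before every query.
        foreach ($namespaces as $prefix => $uri) {
            $context->registerXPathNamespace($prefix, $uri);
        }
        $nodes = $context->xpath($expression);
        $meta[$key] = ($nodes && isset($nodes[0])) ? (string) $nodes[0] : null;
    }
    return $meta;
}

// Small sample built from the <gmd:contact> fragment in the question.
$sample = <<<XML
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd" xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:contact>
    <gmd:CI_ResponsibleParty>
      <gmd:organisationName>
        <gco:CharacterString>Staatsbosbeheer</gco:CharacterString>
      </gmd:organisationName>
    </gmd:CI_ResponsibleParty>
  </gmd:contact>
</gmd:MD_Metadata>
XML;

$meta = extractMeta(
    simplexml_load_string($sample),
    array(
        'title' => 'gmd:identificationInfo//gmd:CI_Citation/gmd:title/gco:CharacterString',
        'owner' => 'gmd:contact/gmd:CI_ResponsibleParty/gmd:organisationName/gco:CharacterString',
    ),
    array(
        'gmd' => 'http://www.isotc211.org/2005/gmd',
        'gco' => 'http://www.isotc211.org/2005/gco',
    )
);
```

Here the sample has no title element, so that key ends up as null while the owner is filled in, without any notice being raised.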

A more complete example:

$query = new GeoNetwork_Query();
$query
    ->setGeometry('POLYGON((5.5963 53.3162,5.5963 53.5766,6.9612 53.5766,6.9612 53.3162,5.5963 53.3162))')
    ->setLimit(10);

$metaObj = function (GeoNetwork_Resource $resource) {
    $metaDef = array(
        'title'   => 'gmd:identificationInfo//gmd:CI_Citation/gmd:title/gco:CharacterString',
        'owner'   => 'gmd:contact/gmd:CI_ResponsibleParty/gmd:organisationName/gco:CharacterString',
        'purpose' => 'gmd:identificationInfo/gmd:MD_DataIdentification/gmd:purpose/gco:CharacterString',
    );

    return new SimpleXMLElementXpathObject($resource->getIterator(), $metaDef);
};

$resources = new GeoNetwork_UuidIterator($query);
$objects   = new DecoratingIterator($resources, $metaObj);
$table     = new HtmlTableIterator($objects, ['Title', 'Owner', 'Purpose']);

echo "<table>\n";
foreach ($table as $row) {
    echo $row, "\n";
}
echo "</table>\n";

I have limited the output to 10 so that the query result does not become too long a list. You can also limit the $objects by wrapping them in a LimitIterator. Example output from the code above:

<table>
<tr><td>Title</td><td>Owner</td><td>Purpose</td></tr>
<tr><td>Natuur - Ecologische verbindingszones</td><td>provincie Fryslân</td><td>Beleidsnota "ecologische verbindingszones in Fryslân" vastgesteld door Provinciale Staten op 4 oktober 2006. Opgenomen in het Streekplan 2007</td></tr>
<tr><td>CORINE: Veranderingen in landgebruik in Nederland tussen 1986 en 2000.</td><td>Alterra, Wageningen UR</td><td>Het monitoren van landgebruiksveranderingen op Europese schaal volgens een standaard methode.</td></tr>
<tr><td>Viswaterkaart Sportvisserij</td><td>Sportvisserij Nederland</td><td>Elke sportvisser moet exact weten waar die onder welke (bijz.) voorwaarden mag hengelen.</td></tr>
<tr><td>Veiligheidsafstand vuurwerk</td><td>Interprovinciaal Overleg</td><td>Risicokaart</td></tr>
<tr><td>Weggeg convergenties</td><td>Rijkswaterstaat Data en ICT Dienst (RWS DID)</td><td>Ruimtelijke analyses waarbij ligging van infrastructuur van belang is en bereikbaarheidsberekeningen</td></tr>
<tr><td>Beheerkaart Nat Versie januari 2008</td><td>Rijkswaterstaat Data en ICT Dienst (RWS DID)</td><td>De Beheerkaart Nat wordt door de natte districten van Rijkswaterstaat gebruikt ten behoeve van beheer en onderhoud van zijn beheerobjecten van de watersystemenen. Het NIS gebruikt de gegevens om ondermeer de benodigde budgetten te bepalen voor beheer en onderhoud.</td></tr>
<tr><td>Orthofotomozaieken_project</td><td>Rijkswaterstaat Data en ICT Dienst (RWS DID)</td><td>Gebruik als ondergrond</td></tr>
<tr><td>Knelpunten in LAW-routes</td><td>Stichting Wandelnet</td><td>Inventarisatie van knelpunten in LAW-routes voor provincies</td></tr>
<tr><td>Electronische zeekaarten Ned. Cont. Plat usage Harbour</td><td>Dienst der Hydrografie</td><td>Veilige navigatie</td></tr>
<tr><td>Maatregelzone kernenergie</td><td>Interprovinciaal Overleg</td><td>Risicokaart</td></tr>
</table>

In the code above I used classes from here: https://gist.github.com/hakre/94a36e4587214a6e9bc9
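
For the LimitIterator option mentioned earlier, a minimal sketch could look like this. The ArrayIterator is only a hypothetical stand-in for the $objects iterator, so that the snippet runs on its own.

```php
<?php
// Hypothetical stand-in for the $objects iterator from the example.
$objects = new ArrayIterator(array('first', 'second', 'third', 'fourth'));

// Start at offset 0 and yield at most 2 items.
$limited = new LimitIterator($objects, 0, 2);

$seen = array();
foreach ($limited as $item) {
    $seen[] = $item;
}
```

LimitIterator (part of SPL) expects an Iterator, so an object that only implements IteratorAggregate would need to be wrapped in an IteratorIterator first.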

3 Comments

Interesting approach. Two caveats/queries: 1) since XPath uses namespace aliases, won't you need to call registerXPathNamespace somewhere? 2) if the properties being fetched are adjacent in the XML (unlike in this case) would this generic approach be somewhat inefficient due to all the XPath expressions starting from the root?
1.) by default the document ones are registered automatically. However you are right, there could be a third constructor parameter to allow to pass namespace prefix declarations - however that was not needed for this example. 2.) that depends. the xpath expressions start not from the root but from the context node you pass in. I also have played with another iterator based on an xpath query allowing to create multiple of such object from a single xml document instead of one in this example - it works very nicely but as it's not needed here it's not part of the code-example.
Thank you, sorry for my late reply. I did end up using a bit of this approach (not everything, since I had no time to readjust everything), thanks for explaining it so clearly!
0

This looks like you should be using XPath as per this link.
