
I have a problem parsing HTML with DOM in PHP. I want to retrieve the href value, but I get an error. I want each row's values and its href value together in a two-dimensional array. The last lines of the code give the same error. Any ideas? The output I want is:
1,"http://.....",User
2,"http://.....",Server
and so on, as a 2D array.

<html>
<body>
    <table>
        <tbody>
            <tr>
                <td>1 </td>
                <td><a href="http://www.abcd.net"></a></td>
                <td>User</td>
            </tr>
            <tr>
                <td>2 </td>
                <td><a href="http://www.def.net"></a></td>
                <td>Server</td>
            </tr>
        </tbody>
    </table>
  </body>
   </html> 

Here is my PHP code:

$resArr = array();

$dom = new domDocument;
@$dom -> loadHTML(file_get_contents($link));
$dom -> preserveWhiteSpace = false;

$linkt = $dom -> getElementsByTagName('table');
$linkt1 = $linkt -> item(2);

//tr
foreach ($linkt1 -> childNodes as $key => $tag){
    //td
    foreach ($tag -> childNodes as $key1 => $tag1){

        foreach ($tag1 -> childNodes as $key2 => $tag2){
             echo $tag2->hasattribute('href');
                      //Error Occur here ----Fatal error: Call to 
                      //undefined method DOMText::hasattribute() in on line 38
        }
    }
}

$resArr[$i][0] = $tag -> childNodes -> item(0) -> nodeValue;
$resArr[$i][3] = $tag -> childNodes -> item(3) -> nodeValue;
$resArr[$i][1] = $tag1 -> childNodes -> item(1) -> 
  childNodes -> item(0) -> getAttribute('href'); //the same error as above
  • If you're getting an error, include the error message in your question. Commented Mar 3, 2012 at 4:48
  • Your expected output would be helpful too. We can't read your mind. Commented Mar 3, 2012 at 5:11
  • Do you have control of the HTML? Why not fix it at the source and get better performance? Commented Mar 3, 2012 at 6:27
  • Ed Heal, I want to retrieve data from another site and store it in my database. I don't have control over it. Commented Mar 3, 2012 at 7:40

1 Answer


I don't know exactly what output you want, but I'm pretty sure this is an XPath problem. Something like this?

// Your sample HTML is stored in $html as a string
libxml_use_internal_errors(true);  // collect parser warnings instead of emitting them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();             // discard whatever warnings were collected

$xp = new DOMXPath($dom);

$rows = $xp->query('/html/body/table/tbody/tr');

$resArr = array();
foreach ($rows as $row) {
    $resArr[] = array(
        $xp->evaluate('string(td[1])', $row),
        $xp->evaluate('string(td[2]/a/@href)', $row),
        $xp->evaluate('string(td[3])', $row),
    );
}

var_dump($resArr);

The output from this code:

array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(2) "1 "
    [1]=>
    string(19) "http://www.abcd.net"
    [2]=>
    string(4) "User"
  }
  [1]=>
  array(3) {
    [0]=>
    string(2) "2 "
    [1]=>
    string(18) "http://www.def.net"
    [2]=>
    string(6) "Server"
  }
}
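
On the fatal error in the question: childNodes returns every child node, including the whitespace text nodes (DOMText) between tags, and DOMText has no hasAttribute() or getAttribute() method. A minimal sketch of the same extraction without XPath, assuming the sample HTML from the question is in $html and that only DOMElement children are wanted (trim() is added here to drop the trailing space in the first cell):

```php
<?php
// Assumption: $html holds the sample markup from the question.
$html = <<<'HTML'
<html><body><table><tbody>
<tr><td>1 </td><td><a href="http://www.abcd.net"></a></td><td>User</td></tr>
<tr><td>2 </td><td><a href="http://www.def.net"></a></td><td>Server</td></tr>
</tbody></table></body></html>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$resArr = array();
foreach ($dom->getElementsByTagName('tr') as $tr) {
    // Keep only the <td> elements; skip DOMText and other node types.
    $cells = array();
    foreach ($tr->childNodes as $node) {
        if ($node instanceof DOMElement && $node->tagName === 'td') {
            $cells[] = $node;
        }
    }
    if (count($cells) < 3) {
        continue; // not a row shaped like the sample
    }

    // The link may be missing, so check before calling getAttribute().
    $a = $cells[1]->getElementsByTagName('a')->item(0);
    $resArr[] = array(
        trim($cells[0]->nodeValue),
        $a !== null ? $a->getAttribute('href') : '',
        trim($cells[2]->nodeValue),
    );
}

print_r($resArr);
```

The instanceof check is the part that avoids the "undefined method DOMText::hasattribute()" error; the rest is ordinary DOM traversal.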

4 Comments

Works fine for me. See updated answer. Are you getting any errors?
Glad it works. Consider accepting the answer if it answers your question.
Hi Francis, XPath is good and easy. But if the HTML is too complicated, it is really hard to work out the path. Is there an alternative method like the one I use in my example PHP code?
There is no way that any DOM method will be simpler than XPath. You also don't need to enumerate full paths. Use // to abbreviate.
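
To illustrate the // abbreviation from the last comment, here is a rough sketch that assumes the sample HTML from the question is in $html. The expression //tr[td/a/@href] is one possible choice, not the only one: it selects any row, at any depth, that has a cell containing a link, so the full /html/body/table/tbody path is not needed. normalize-space() is used here instead of string() to trim the cell text.

```php
<?php
// Assumption: $html holds the sample markup from the question.
$html = <<<'HTML'
<html><body><table><tbody>
<tr><td>1 </td><td><a href="http://www.abcd.net"></a></td><td>User</td></tr>
<tr><td>2 </td><td><a href="http://www.def.net"></a></td><td>Server</td></tr>
</tbody></table></body></html>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);

$resArr = array();
// '//' searches the whole document; the predicate keeps only rows with a linked cell.
foreach ($xp->query('//tr[td/a/@href]') as $row) {
    $resArr[] = array(
        $xp->evaluate('normalize-space(td[1])', $row),
        $xp->evaluate('string(td[2]/a/@href)', $row),
        $xp->evaluate('normalize-space(td[3])', $row),
    );
}

print_r($resArr);
```

On a page with several tables, a more specific predicate would probably be needed to pick out the right one.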
