
Possible Duplicate:
How to parse and process HTML with PHP?

I'm pretty new to PHP. I have the text of a page's body tag in a string variable. I'd like to know whether it contains a <tag1>...</tag1> element, where the tag name tag1 is given, and if so, extract only that element from the string. How can I do that simply in PHP?

Thanks!!

  • There are several ways to approach that, and they all depend on what exactly you want to do. Do you need to parse HTML? Use an HTML parser. Do you want to strip tags, and is the syntax known to be within certain limits? Use strip_tags(). Do you want to blacklist certain tags in arbitrary HTML? Consider whitelisting allowed tags instead, with a library like HTML Purifier. Commented Nov 19, 2012 at 16:36

3 Answers


You would be looking at something like this:

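<?php
$content = "";
$doc = new DOMDocument();
// loadHTMLFile() parses HTML (load() expects well-formed XML).
// If the markup is already in a string, use $doc->loadHTML($yourString) instead.
libxml_use_internal_errors(true); // don't print warnings about unknown tags like <tag1>
$doc->loadHTMLFile("example.html");
libxml_clear_errors();
$items = $doc->getElementsByTagName('tag1');
if ($items->length > 0) // Only if tag1 elements are found
{
    foreach ($items as $tag1)
    {
        // saveHTML($tag1) returns the element including its own tags;
        // $tag1->nodeValue would return only its text content
        $content .= $doc->saveHTML($tag1);
    }
}
else
{
    $content = $doc->saveHTML();
}
echo $content;
?>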

DOMDocument represents an entire HTML or XML document and serves as the root of the document tree. Because the markup is actually parsed into a tree, getElementsByTagName() returns only real elements: a <tag1> that appears inside a comment is not matched.
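To check the point about comments, here is a minimal sketch that wraps the approach above in a function taking the HTML as a string, as in the question. The helper name extractElements() and the sample markup are illustrative assumptions; the sketch relies only on DOMDocument::loadHTML(), getElementsByTagName() and saveHTML($node).

```php
<?php
// Sketch: return the outer HTML of every <$tagName> element in an HTML string.
// extractElements() is a hypothetical helper name, not a PHP built-in.
function extractElements(string $html, string $tagName): array
{
    $doc = new DOMDocument();
    // libxml does not know custom tags such as <tag1> and emits warnings;
    // collect them internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName($tagName) as $node) {
        // saveHTML($node) serialises just this node, including its own tags
        $result[] = $doc->saveHTML($node);
    }
    return $result;
}

// The <tag1> inside the comment is not an element, so only the real one is returned.
$body = '<p>intro</p><!-- <tag1>not real</tag1> --><tag1 class="x">real</tag1>';
print_r(extractElements($body, 'tag1'));
```

The same function works for any tag name, because the name is only passed to getElementsByTagName().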



Another possibility is regex.

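// \b stops <li from also matching <link>; the s modifier lets (.*?) span line breaks
$matches = null;
$returnValue = preg_match_all('#<li\b[^>]*>(.*?)</li>#s', '<ul><li class="small">list entry</li></ul>', $matches);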

$matches[0][x] contains each whole match, such as <li class="small">list entry</li>, and $matches[1][x] contains only the inner HTML, such as list entry.
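The question gives the tag name (tag1) at runtime, while the pattern above is hard-coded to li. A minimal sketch of building the pattern from a supplied name, assuming a hypothetical helper called matchTag(); preg_quote() escapes any regex metacharacters in the name. Like any regex approach, it does not understand comments, attribute values that contain a closing tag, or nested elements with the same name.

```php
<?php
// Sketch: regex extraction for a tag name known only at runtime.
// matchTag() is a hypothetical helper name, not a PHP built-in.
function matchTag(string $html, string $tagName): array
{
    $tag = preg_quote($tagName, '#'); // escape regex metacharacters in the name
    // \b prevents "li" from matching "<link>"; [^>]* skips attributes;
    // s lets (.*?) span line breaks; i makes the tag name case-insensitive
    $pattern = '#<' . $tag . '\b[^>]*>(.*?)</' . $tag . '>#si';
    preg_match_all($pattern, $html, $matches);
    return $matches; // [0] = whole elements, [1] = inner HTML
}

$m = matchTag("<ul><li class=\"small\">list entry</li>\n<li>second</li></ul>", 'li');
print_r($m[1]);
```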



Fast way:

Find the index position of <tag1>, then the index position of </tag1>, and cut out the string between those two indexes. Look up strpos() and substr() on php.net. Also, this might not work if your string is too long.

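$pos1 = strpos($bigString, '<tag1>');
$pos2 = ($pos1 === false) ? false : strpos($bigString, '</tag1>', $pos1);
if ($pos2 !== false) {
    $start = $pos1 + strlen('<tag1>');                             // skip past the opening tag
    $resultingString = substr($bigString, $start, $pos2 - $start); // 3rd argument is a length, not an end index
}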

You might have to adjust $pos1 and $pos2 depending on whether $resultingString should include the <tag1> and </tag1> tags themselves. (And this breaks if you have comments with tag1 inside of them, sigh.)

The right way:

Look up HTML parsers, such as PHP's built-in DOMDocument.

10 Comments

And how do you look up the positions of the tags? Keep in mind that this is valid HTML: <!-- <tag> --><foo bar="</tag>">...
$pos1 = strpos($bigString, '<tag1>'); Doesn't matter, you treat it as a string.
Ooops, you just found "<tag1>" inside <!-- <tag1> -->, i.e. not really a tag... :)
Or you could, you know, just use a proper HTML parser. :P
Fair enough. Just wondering why you also mention the wrong way. ;)
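The attribute-value case raised in the first comment can be reproduced directly. A minimal sketch, assuming two hypothetical helpers: naiveExtract() applies the strpos()/substr() idea, and domExtract() uses DOMDocument. On input where "</tag1>" appears inside an attribute value, the string search stops too early, while the parser treats it as attribute text.

```php
<?php
// naiveExtract() and domExtract() are hypothetical helper names for this comparison.
function naiveExtract(string $html): ?string
{
    $pos1 = strpos($html, '<tag1>');
    if ($pos1 === false) {
        return null;
    }
    $pos2 = strpos($html, '</tag1>', $pos1);
    if ($pos2 === false) {
        return null;
    }
    $start = $pos1 + strlen('<tag1>');
    return substr($html, $start, $pos2 - $start); // text between the first <tag1> and the next "</tag1>"
}

function domExtract(string $html): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about the unknown <tag1>
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $node = $doc->getElementsByTagName('tag1')->item(0); // null when there is no such element
    return $node === null ? null : $node->textContent;
}

$html = '<tag1><span title="</tag1>">inner</span></tag1>';
var_dump(naiveExtract($html)); // cut off at the "</tag1>" inside the title attribute
var_dump(domExtract($html));   // text content of the real element
```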