I've got a bunch of HTML data that I'm writing to a PDF file using PHP. In the PDF, I want all of the HTML to be stripped and cleaned up. So for instance:
<ul>
<li>First list item</li>
<li>Second list item which is quite a bit longer</li>
<li>List item with apostrophe 's 's</li>
</ul>
Should become:
First list item
Second list item which is quite a bit longer
List item with apostrophe 's 's
However, if I simply use strip_tags(), I get something like this:
First list item

Second list item which is quite a bit
longer

List item with apostrophe ’s ’s
Also note the indentation of the output.
Any tips on how to properly clean up the HTML into nice, clean strings without messy whitespace and odd characters?
Thanks :)
strip_tags() alone will not encode your entities. Are you sure you're not missing a call to htmlentities somewhere? htmlentities is responsible for these things
(e.g.), so if you don't want them, you should not use it.
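For the whitespace part, here is a minimal sketch of one way to do the cleanup. The helper name html_to_plain_lines() and the list of tags treated as line-ending are my own assumptions, not anything built into PHP, and it assumes the input is valid UTF-8. It collapses all source whitespace first (so a line wrapped inside an <li> is rejoined), turns the end of each block element into a newline, strips the tags, and decodes entities with html_entity_decode() rather than encoding them:

```php
<?php
// Hypothetical helper: the name and the tag list below are assumptions.
function html_to_plain_lines(string $html): string
{
    // Collapse every run of whitespace (including line wraps) to one space.
    $html = preg_replace('/\s+/u', ' ', $html);

    // Mark the end of block-level elements and <br> with a real newline.
    $html = preg_replace('#</(li|p|div|h[1-6])>|<br\s*/?>#i', "\n", $html);

    // Remove the remaining tags; strip_tags() leaves entities untouched.
    $text = strip_tags($html);

    // Turn entities such as &amp; or &rsquo; into real UTF-8 characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Trim each line and drop the empty ones.
    $lines = array_map('trim', explode("\n", $text));
    $lines = array_filter($lines, fn($line) => $line !== '');

    return implode("\n", $lines);
}

$html = "<ul>\n  <li>First list item</li>\n"
      . "  <li>Second list item which is quite a bit\n  longer</li>\n</ul>";
echo html_to_plain_lines($html), "\n";
```

Whether the decoded characters show up correctly in the PDF still depends on the encoding your PDF library expects, so you may need to convert the string from UTF-8 before writing it.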