scraping using PHP Simple HTML DOM Parser

Question

I want to use PHP Simple HTML DOM Parser to scrape a website. The page source is messy, like this:

      <font face="Arial" color="#ff0000">
      <p>Parameters</p>
      </font><font face="Arial" size="2" color="#ff0000">
      <p>Param1</p>
      </font><font face="Arial" size="2" color="#0000ff">
      <p>Details. (Lob., </font><i><font face="Arial"
      size="2" color="#ff0000">Co v</font><font face="Arial" size="2"
      color="#0000ff">.)</p>

Instead of putting "Details. (Lob., Co v.)" directly inside the <p>, the text is split up by </font>, <i> and <font> tags. When I use this code:

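require_once 'simple_html_dom.php';
$html = file_get_html($url); // $url: the page being scraped

foreach ($html->find('p') as $p) {
    echo $p->plaintext . '<br>';
}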

I only get "Details. (Lob.,": the text stops at the </font> or <i> tag. How can I extract the whole line "Details. (Lob., Co v.)"?

Thank you for your answer

Do you mean "scrape"? Just making sure.

– Don't Panic, Commented Jan 23, 2017 at 21:13
Yes, sorry, I mean scrape.

– balimaco00, Commented Jan 23, 2017 at 21:54

Ishtiyaq Husain · Accepted Answer · 2017-01-23 21:25:19Z

You can use the strip_tags() function to remove the unnecessary tags. After removing them, you can run the DOM parser on the cleaned HTML.

The strip_tags() function strips HTML, XML, and PHP tags from a string.

string strip_tags ( string $str [, string $allowable_tags ] )
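A small sketch of how the second argument behaves, using a shortened fragment modelled on the question's markup (the variable names are only illustrative). Tags listed in the allowed set survive and everything else is removed; on PHP 7.4 and later the allowed tags can also be passed as an array of tag names instead of a string.

```php
<?php
// Shortened fragment modelled on the markup from the question.
$fragment = '<font face="Arial" color="#0000ff"><p>Details. (Lob., </font>'
    . '<i><font face="Arial" color="#ff0000">Co v</font>.)</p>';

// Allowed tags given as a string: keeps <p> and </p>, removes font and i.
$keepP = strip_tags($fragment, '<p>');

// Same thing with an array of tag names (requires PHP 7.4 or later).
$keepPArray = strip_tags($fragment, ['p']);

// Without a second argument every tag is removed and only text remains.
$textOnly = strip_tags($fragment);

echo $keepP, "\n";    // <p>Details. (Lob., Co v.)</p>
echo $textOnly, "\n"; // Details. (Lob., Co v.)
```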

You can read more about the strip_tags() function on php.net.

Example:

$html = '<font face="Arial" color="#ff0000">
    <p>Parameters</p>
    </font><font face="Arial" size="2" color="#ff0000">
    <p>Param1</p>
    </font><font face="Arial" size="2" color="#0000ff">
    <p>Details. (Lob., </font><i><font face="Arial"
    size="2" color="#ff0000">Co v</font><font face="Arial" size="2"
    color="#0000ff">.)</p>';

// Keep only <p> tags; <font>, <i> and all other tags are removed
$html = strip_tags($html, '<p>');
echo $html;

Result:

  <p>Parameters</p>

  <p>Param1</p>

  <p>Details. (Lob., Co v.)</p>
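The second step, parsing the cleaned HTML, could look roughly like the sketch below. It is a minimal illustration under one stated assumption: to keep it free of third-party files it swaps in PHP's built-in DOMDocument (ext-dom) in place of Simple HTML DOM, with textContent standing in for ->plaintext. With Simple HTML DOM the same idea would be str_get_html($clean) followed by the find('p') loop from the question.

```php
<?php
// The messy markup from the question.
$html = '<font face="Arial" color="#ff0000">
<p>Parameters</p>
</font><font face="Arial" size="2" color="#ff0000">
<p>Param1</p>
</font><font face="Arial" size="2" color="#0000ff">
<p>Details. (Lob., </font><i><font face="Arial"
size="2" color="#ff0000">Co v</font><font face="Arial" size="2"
color="#0000ff">.)</p>';

// Step 1: drop every tag except <p>.
$clean = strip_tags($html, '<p>');

// Step 2: parse the cleaned markup. Assumption: PHP's built-in DOMDocument
// is used here instead of Simple HTML DOM.
libxml_use_internal_errors(true); // silence libxml warnings about the bare fragment
$doc = new DOMDocument();
$doc->loadHTML($clean);
libxml_clear_errors();

$texts = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    // textContent plays the role of Simple HTML DOM's ->plaintext
    $texts[] = trim($p->textContent);
}

print_r($texts);
```

Because the <font> and <i> tags are gone before parsing, the parser no longer closes the paragraph early, so the third entry is the complete "Details. (Lob., Co v.)".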
