0

I'm parsing the columnists page of an online newspaper. I have a problem with this site:

http://www.sozcu.com.tr/kategori/yazarlar/

The parsing worked fine at first, but then it stopped working.

Here's my code:

$curl_handle=curl_init();
curl_setopt($curl_handle, CURLOPT_URL,$gazeteAdress);
//curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'mozilla');
$query = curl_exec($curl_handle);
curl_close($curl_handle);
$html = new simple_html_dom();
$html->load($query);

I don't know why my code sometimes fails to parse the site. I first suspected the connection timeout, but that is not the problem, so I tried printing the HTML page fetched with curl instead.

echo $html;

Here is the result (sometimes my code does not parse the HTML page properly): [screenshot of the output]

Why are the HTML tags missing, and why am I seeing the result like this? Can anyone help?

2 Answers

1

The content is returned compressed, so you should have curl send an Accept-Encoding header of 'gzip,deflate'. Setting CURLOPT_ENCODING does this and also makes curl decompress the response for you.

Please add this line
curl_setopt($curl_handle, CURLOPT_ENCODING, "gzip,deflate");
after this
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'mozilla');
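
To make the compression point concrete, here is a minimal sketch of the fetch with the encoding option set, plus a fallback check for a body that is still gzip-compressed. The function names `looksGzipped` and `fetchHtml` are hypothetical helpers invented for this illustration, not part of curl or simple_html_dom, and the fallback assumes PHP's zlib extension is available. Passing an empty string to CURLOPT_ENCODING asks curl to advertise every encoding it supports.

```php
<?php
// Hypothetical helper: true if $body starts with the gzip magic bytes 0x1f 0x8b.
function looksGzipped(string $body): bool
{
    return strncmp($body, "\x1f\x8b", 2) === 0;
}

// Hypothetical helper: fetch $url and return decoded HTML, or null on failure.
function fetchHtml(string $url): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'mozilla');
    // Empty string: send Accept-Encoding with all encodings this curl build
    // supports, and let curl decode the response automatically.
    curl_setopt($ch, CURLOPT_ENCODING, '');
    $body = curl_exec($ch);
    curl_close($ch);

    if ($body === false) {
        return null; // transfer failed
    }
    // Fallback: if the body still looks gzip-compressed, decode it manually
    // (requires the zlib extension).
    if (looksGzipped($body)) {
        $decoded = gzdecode($body);
        return $decoded === false ? null : $decoded;
    }
    return $body;
}
```

With this in place, the question's code could call `$html->load(fetchHtml($gazeteAdress));` after checking that the result is not null. Seeing unreadable bytes instead of tags when echoing the raw response is a typical sign that the body was compressed and never decoded.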

0

Add this at the top of your PHP script:

header('Content-Type: text/html; charset=utf-8');

3 Comments

I already have this: <meta http-equiv="Content-Type" content="text/HTML; charset=utf-8" />
You don't need that, only <?php header('Content-Type: text/html; charset=utf-8');
Mate, sometimes it works and sometimes it doesn't. It is not a Turkish character problem.
