
I have a PHP page that I run every minute through a CRON job.

I have been running it for quite some time but suddenly it started throwing up these errors:

Maximum execution time of 30 seconds exceeded in /home2/sharingi/public_html/scrape/functions.php on line 84

The line number will vary with each error, ranging from line 70 up into the 90s.

Here is the code from lines 0-95

function crawl_page( $base_url, $target_url, $userAgent, $links)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL,$target_url);
    curl_setopt($ch, CURLOPT_FAILONERROR, false);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 100);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10); //follow up to 10 redirections - avoids loops

    $html = curl_exec($ch);

    if (!$html) 
    {
        echo "<br />cURL error number:" .curl_errno($ch);
        echo "<br />cURL error:" . curl_error($ch);
        //exit;
    }

    //
    // load scraped data into the DOM
    //

    $dom = new DOMDocument();
    @$dom->loadHTML($html);

    //
    // get only LINKS from the DOM with XPath
    //

    $xpath = new DOMXPath($dom);
    $hrefs = $xpath->evaluate("/html/body//a");

    //
    // go through all the links and store to db or whatever
    //  

    for ($i = 0; $i < $hrefs->length; $i++) 
    {
        $href = $hrefs->item($i);
        $url = $href->getAttribute('href');

        //if the $url does not contain the web site base address: http://www.thesite.com/ then add it onto the front

        $clean_link = clean_url( $base_url, $url, $target_url);
        $clean_link = str_replace( "http://" , "" , $clean_link);
        $clean_link = str_replace( "//" , "/" , $clean_link);

        $links[] = $clean_link;

        //removes empty array values

        foreach($links as $key => $value) 
        { 
            if($value == "") 
            { 
                unset($links[$key]); 
            } 
        } 
        $links = array_values($links); 

        //removes javascript lines

        foreach ($links as $key => $value)
        {
            if ( strpos( $value , "javascript:") !== FALSE )
            {
                unset($links[$key]);
            }
        }
        $links = array_values($links);

        // removes @ lines (email)

        foreach ($links as $key => $value)
        {
            if ( strpos( $value , "@") !== FALSE || strpos( $value, 'mailto:') !== FALSE)
            {
                unset($links[$key]);
            }
        }
        $links = array_values($links);
    }   

    return $links; 
}

What is causing these errors, and how can I prevent them?

2 Answers

You can raise the maximum execution time with the set_time_limit function. If you want no time limit at all (most likely what you need here), use:

set_time_limit(0);
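
A minimal sketch of where such a call might sit in the cron script. The helper name run_with_time_limit and the 120-second budget are illustrative assumptions, not part of the original answer; a finite limit is shown instead of 0 so that a runaway job is still stopped eventually.

```php
<?php
// Hypothetical helper: raise the execution limit, then run the job.
function run_with_time_limit(int $seconds, callable $job)
{
    // set_time_limit() returns false if the limit could not be changed
    // (for example when the host has disabled the function).
    if (!set_time_limit($seconds)) {
        error_log("Could not change the execution time limit");
    }
    return $job();
}

$result = run_with_time_limit(120, function () {
    // Placeholder for the real work, e.g. a call to crawl_page().
    return "done";
});
echo $result, PHP_EOL;
```

Calling set_time_limit restarts the timeout counter from zero at the point of the call, so it belongs at the top of the script, before the slow work begins.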

Comments

The time limit can also be changed in the php.ini file, which is needed if PHP is running in safe mode.
I can't raise max_execution_time above 30 seconds in my php.ini. Does set_time_limit(0); override this?
If you are not in safe mode: yes.
If this is tied to a cron job that runs every minute and I set the max execution time to infinite, couldn't that cause some problems?
Not if your script finishes properly. For example, if your script runs for 10 minutes, you would have about 10 instances running at any given time. Is that an issue?
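
One way to avoid the pile-up of instances discussed in these comments is to let each run take a lock and exit early if a previous run still holds it. This is only a sketch: the helper name acquire_cron_lock and the lock file path are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns an open lock handle, or null if the lock
// could not be taken (another instance is probably still running).
function acquire_cron_lock(string $lockFile)
{
    // Mode 'c' creates the file if needed without truncating it.
    $handle = fopen($lockFile, 'c');
    if ($handle === false) {
        return null;
    }
    // LOCK_NB makes flock() return immediately instead of waiting.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null;
    }
    return $handle;
}

$lockFile = sys_get_temp_dir() . '/scrape_example.lock'; // assumed path
$lock = acquire_cron_lock($lockFile);
if ($lock === null) {
    echo "Previous run still active, skipping", PHP_EOL;
    exit(0);
}

// ... do the scraping work here ...

flock($lock, LOCK_UN); // release the lock for the next run
fclose($lock);
```

The operating system drops the lock when the process ends, so a crashed run should not block later runs.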

Cause: Some of the functions take more than 30 seconds to complete.
Solution: Increase the maximum execution time (max_execution_time) in the PHP configuration file.
1. If you have access to your global php.ini file (usually at /web/conf; otherwise you can get the location from "Configuration File (php.ini) Path" in phpinfo), change max_execution_time=30 to max_execution_time=300.
2. If you have access only to your local php.ini file (you can get the location from "Loaded Configuration File" in phpinfo), change max_execution_time=30 to max_execution_time=300. Note: on some hosts this file is named php5.ini for PHP 5.x+ and php.ini for 4.x.
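
Besides raising the limit, it may be worth looking at why the function is slow. The reported line numbers fall in the cleanup loops, which in the question's code re-scan the whole $links array once for every anchor on the page. The following sketch applies the same three cleanup rules in a single pass, so it can be called once after all links are collected. The helper name filter_links and the sample data are assumptions made for illustration, and the links are assumed to be strings.

```php
<?php
// Hypothetical helper: apply the question's three cleanup rules in one pass.
function filter_links(array $links): array
{
    $kept = array_filter($links, function ($link) {
        if ($link === null || $link === "") {
            return false; // empty values
        }
        if (strpos($link, "javascript:") !== false) {
            return false; // javascript pseudo-links
        }
        if (strpos($link, "@") !== false || strpos($link, "mailto:") !== false) {
            return false; // email links
        }
        return true;
    });
    return array_values($kept); // reindex from 0
}

// Made-up sample data for illustration.
$links = [
    "www.thesite.com/a",
    "",
    "javascript:void(0)",
    "mailto:me@thesite.com",
    "www.thesite.com/b",
];
print_r(filter_links($links));
```

In crawl_page, this would mean appending each cleaned link inside the for loop and calling the filter once before the return statement, instead of filtering inside the loop.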
