I'm trying to take a rather large list of domains and query the rank of each using the compete.com API, documented here: https://www.compete.com/developer/documentation

The script I wrote takes a database of domains I populated and initiates a cURL request to Compete for the rank of each website. I quickly realized that this was very slow because the requests were being sent one at a time. I did some searching and came across this post: http://www.phpied.com/simultaneuos-http-requests-in-php-with-curl/ which explains how to perform simultaneous HTTP requests in PHP with cURL.

Unfortunately that script will take an array of 25,000 domains and try to process them all at once. I found that batches of 1,000 work quite well.

Any idea how to send 1,000 queries to compete.com, wait for them to complete, and then send the next 1,000 until the array is empty? Here's what I'm working with so far:

<?php

//includes
include('includes/mysql.php');
include('includes/config.php');

//get domains
$result = mysql_query("SELECT * FROM $tableName");
while($row = mysql_fetch_array($result)) {
    $competeRequests[] = "http://apps.compete.com/sites/" . $row['Domain'] . "/trended/rank/?apikey=xxx&start_date=201207&end_date=201208&jsonp=";
}

//first batch
$curlRequest = multiRequest($competeRequests);
$j = 0;
foreach ($curlRequest as $json){
    $j++;
    $json_output = json_decode($json, TRUE);
    // quote the keys: bare words are treated as undefined constants
    $rank = $json_output['data']['trends']['rank'][0]['value'];

    if($rank) {
        //Create mysql query
        $query = "Update $tableName SET Rank = '$rank' WHERE ID  = '$j'";

        //Execute the query
        mysql_query($query);
        echo $query . "<br/>";
    }
}


function multiRequest($data) {
  // array of curl handles
  $curly = array();
  // data to be returned
  $result = array();

  // multi handle
  $mh = curl_multi_init();

  // loop through $data and create curl handles
  // then add them to the multi-handle
  foreach ($data as $id => $d) {

    $curly[$id] = curl_init();

    $url = (is_array($d) && !empty($d['url'])) ? $d['url'] : $d;
    curl_setopt($curly[$id], CURLOPT_URL,            $url);
    curl_setopt($curly[$id], CURLOPT_HEADER,         0);
    curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);

    // post?
    if (is_array($d)) {
      if (!empty($d['post'])) {
        curl_setopt($curly[$id], CURLOPT_POST,       1);
        curl_setopt($curly[$id], CURLOPT_POSTFIELDS, $d['post']);
      }
    }

    curl_multi_add_handle($mh, $curly[$id]);
  }

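  // execute the handles, waiting for socket activity instead of busy-looping
  $running = null;
  do {
    $status = curl_multi_exec($mh, $running);
    if ($running > 0) {
      curl_multi_select($mh, 1.0);
    }
  } while ($running > 0 && $status === CURLM_OK);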

  // get content and remove handles
  foreach($curly as $id => $c) {
    $result[$id] = curl_multi_getcontent($c);
    curl_multi_remove_handle($mh, $c);
  }

  // all done
  curl_multi_close($mh);

  return $result;

}
?>
  • I didn't see a question in there. Commented Sep 12, 2012 at 0:43
  • Looking for a way to run 1000 domain batch rank check for 25k domains Commented Sep 12, 2012 at 0:45

2 Answers

Instead of

//first batch
$curlRequest = multiRequest($competeRequests);

$j = 0;
foreach ($curlRequest as $json){

You can do:

$curlRequest = array();

foreach (array_chunk($competeRequests, 1000) as $requests) {
    $results = multiRequest($requests);

    $curlRequest = array_merge($curlRequest, $results);
}

$j = 0;
foreach ($curlRequest as $json){
    $j++;
    // ...

This splits the large array into chunks of 1,000 and passes each chunk to your multiRequest function, which uses cURL to execute those requests. Each call returns only after its batch has finished, so the batches run one after another.
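
If you would rather handle each batch as soon as it finishes instead of merging everything first, something along these lines might work. This is only a sketch: `processInBatches` and the stub fetcher are hypothetical names made up for illustration, and in the real script the fetcher would be `multiRequest`. Passing `true` as the third argument to `array_chunk` preserves the original keys, so each result keeps the key of its request.

```php
<?php
// Hypothetical helper: run $fetch on batches of $batchSize and hand each
// batch's results to $onBatch as soon as that batch completes.
function processInBatches(array $requests, $batchSize, callable $fetch, callable $onBatch)
{
    $processed = 0;
    // true = preserve keys, so each result keeps the key of its request
    foreach (array_chunk($requests, $batchSize, true) as $batch) {
        $results = $fetch($batch);
        $onBatch($results);
        $processed += count($results);
    }
    return $processed;
}

// Stub standing in for multiRequest(): returns a fake body per URL,
// keyed the same way as the input (an assumption for this demo).
$fakeFetch = function (array $batch) {
    $out = array();
    foreach ($batch as $id => $url) {
        $out[$id] = '{"url":"' . $url . '"}';
    }
    return $out;
};

// Made-up data: keys play the role of database IDs.
$requests = array(10 => 'a.example', 11 => 'b.example', 12 => 'c.example');
$batchSizes = array();
$seenKeys = array();

$total = processInBatches($requests, 2, $fakeFetch,
    function (array $results) use (&$batchSizes, &$seenKeys) {
        $batchSizes[] = count($results);
        $seenKeys = array_merge($seenKeys, array_keys($results));
    }
);
```

If the keys of `$requests` are the row IDs, the callback can use them directly in the UPDATE statement rather than relying on a running counter.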

1 Comment

Is there a way to execute 1,000 then echo results then continue? As of now the script takes several minutes to run and it just shows a loading screen the entire time. Any ideas for progress bar / status updates?
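
One possible approach to the status updates asked about here is to print a line after every batch and flush the output buffers. This is a rough sketch, not a tested solution for this script: `progressLine` is a hypothetical helper, the URL list is dummy data, and the real `multiRequest` call is left as a comment. Whether the browser shows the output right away also depends on server-side buffering or compression, which this code cannot control.

```php
<?php
// Hypothetical helper: format a progress line after each batch.
function progressLine($done, $total)
{
    $percent = $total > 0 ? (int) floor($done * 100 / $total) : 100;
    return "Processed $done of $total ($percent%)";
}

// Dummy stand-in for the real list of request URLs.
$competeRequests = array_fill(0, 2500, 'http://example.com/');
$total = count($competeRequests);
$done = 0;
$lines = array();

foreach (array_chunk($competeRequests, 1000) as $batch) {
    // $results = multiRequest($batch);  // real call in the actual script
    $done += count($batch);
    $lines[] = progressLine($done, $total);
    echo end($lines), "<br/>\n";

    // Push buffered output to the client now instead of at script end.
    if (ob_get_level() > 0) {
        ob_flush();
    }
    flush();
}
```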

https://github.com/webdevelopers-eu/ShadowHostCloak

This does exactly what you want. Just pass an empty argument to new Proxy() to bypass the proxy and make direct requests.

You can queue 1,000 requests in it and call $proxy->execWait(); it will process all of the requests simultaneously and return from that method when everything is done. Then you can repeat with the next batch.
