
I am working on a crawl script to read ratings from a webshop.

The cURL part is now giving me problems, since it does not retrieve the right content.

I select the URLs from the database in batches with an AJAX script. I pass cURL the correct URL of the page with the ratings, but cURL retrieves the page without the variable part of the URL.

This is the URL I am passing to cURL in $actualurl:

http://www.domain.com/epages/xxx.sf/de_DE/?ObjectPath=/Shops/15456062/Products/%22Briefkastenst%C3%A4nder%20Bobiround%22/SubProducts/%22Briefkastenst%C3%A4nder%20Bobiround%20gr%C3%BCn%20RAL6005%22&ViewAction=ViewProductRating

(This is the page I want to read all 6 ratings (Produktbewertungen, "product ratings") from.)

But with the cURL call I get the contents of this page instead, which is the same URL without the ViewAction parameter (I echoed the output):

http://www.domain.com/epages/xxx.sf/de_DE/?ObjectPath=/Shops/15456062/Products/%22Briefkastenst%C3%A4nder%20Bobiround%22/SubProducts/%22Briefkastenst%C3%A4nder%20Bobiround%20gr%C3%BCn%20RAL6005%22

My cURL call looks like this:

            $ch = curl_init();
            curl_setopt($ch, CURLOPT_TIMEOUT, 30);
            curl_setopt($ch, CURLOPT_USERAGENT, $agent);
            curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
            curl_setopt($ch, CURLOPT_URL, $actualurl);              
            //read content of $url
            $result = curl_exec ($ch);
            curl_close ($ch);

Why is cURL ignoring the last part of the URL (&ViewAction=ViewProductRating)?

Thank you so much, I am still new to cURL!

EDIT

I build the URL mentioned above from 4 parts. The parts are the following:

$domainroot: http://www.domain.com/
$objectpath: epages/xxx.sf/de_DE/?ObjectPath
$ratingurl: %3D%2FShops%2F15456062%2FProducts%2F%2522Briefkastenst%25C3%25A4nder%2520Bobiround%2522%2FSubProducts%2F%2522Briefkastenst%25C3%25A4nder%2520Bobiround%2520gr%25C3%25BCn%2520RAL6005%2522%26amp%3B
$viewratings: ViewAction=ViewProductRating

Finally, I chain them together:

$actualurl = $domainroot.$objectpath.$ratingurl.$viewratings;
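Decoding the stored $ratingurl once makes the problem visible: it starts with an encoded "=" (%3D) and ends with an encoded "&amp;" entity (%26amp%3B), so the plain concatenation above contains no literal "&" at all and the server cannot see ViewAction as a separate parameter. The sketch below assumes $ratingurl is stored exactly as printed in the edit; buildRatingUrl() is a hypothetical helper, not code from the thread.

```php
<?php
// Hypothetical helper. Assumption: $ratingurl is stored exactly as printed in
// the question, i.e. percent-encoded one extra time and ending in an encoded
// "&amp;" entity instead of a plain "&".
function buildRatingUrl(string $domainroot, string $objectpath, string $ratingurl, string $viewratings): string
{
    // One decoding pass: %3D -> "=", %2F -> "/", %2522 -> "%22", %26amp%3B -> "&amp;"
    $decoded = rawurldecode($ratingurl);
    // Turn the HTML entity back into a real query-string separator.
    $decoded = str_replace('&amp;', '&', $decoded);
    return $domainroot . $objectpath . $decoded . $viewratings;
}

$domainroot  = 'http://www.domain.com/';
$objectpath  = 'epages/xxx.sf/de_DE/?ObjectPath';
$ratingurl   = '%3D%2FShops%2F15456062%2FProducts%2F%2522Briefkastenst%25C3%25A4nder%2520Bobiround%2522'
             . '%2FSubProducts%2F%2522Briefkastenst%25C3%25A4nder%2520Bobiround%2520gr%25C3%25BCn%2520RAL6005%2522%26amp%3B';
$viewratings = 'ViewAction=ViewProductRating';

// Plain concatenation: the query contains no literal "&", so the server sees
// one mangled parameter and ViewAction never arrives on its own.
$broken = $domainroot . $objectpath . $ratingurl . $viewratings;

// Decoded version: identical to the URL quoted at the top of the question.
$actualurl = buildRatingUrl($domainroot, $objectpath, $ratingurl, $viewratings);
echo $actualurl, "\n";
```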
  • It looks as if the CURLOPT_POST flag is active, although you didn't set it... Commented Apr 10, 2013 at 6:14
  • echo curl_error($ch); You will get "malformed" Commented Apr 10, 2013 at 6:19
  • @shin I tried it, but it outputs nothing. Commented Apr 10, 2013 at 7:28
  • @Kaktus, I tried your code with echo curl_error($ch); and got "malformed url", and curl_errno($ch) returned error number 3 Commented Apr 10, 2013 at 8:24
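Following the suggestion in the comments, here is a minimal sketch of the same cURL call that reports failures instead of silently returning nothing. The function name fetchPage() and the shape of the returned array are illustrative choices, not code from the thread; error number 3 is libcurl's CURLE_URL_MALFORMAT.

```php
<?php
// Illustrative wrapper around the question's cURL call, with error reporting.
// fetchPage() and its return array are hypothetical, not from the thread.
function fetchPage(string $url, string $agent): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_setopt($ch, CURLOPT_USERAGENT, $agent);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_URL, $url);

    $body  = curl_exec($ch);   // false on failure when RETURNTRANSFER is set
    $errno = curl_errno($ch);  // 0 on success; 3 means CURLE_URL_MALFORMAT
    $error = curl_error($ch);  // '' on success, otherwise libcurl's message
    // The handle is released when $ch goes out of scope.

    return [
        'ok'    => $body !== false && $errno === 0,
        'body'  => $body === false ? '' : $body,
        'errno' => $errno,
        'error' => $error,
    ];
}
```

Usage: call $page = fetchPage($actualurl, $agent); and, if $page['ok'] is false, echo $page['errno'] and $page['error'] before doing any parsing.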

2 Answers


The first parameter of your query string must be properly encoded:

$queryString = 'ObjectPath=%2FShops%2F15456062%2FProducts%2F%22Briefkastenst' .
'%C3%A4nder+Bobiround%22%2FSubProducts%2F%22Briefkastenst' .
'%C3%A4nder+Bobiround+gr%C3%BCn+RAL6005%22' .
'&ViewAction=ViewProductRating';
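The same query string can be produced programmatically instead of by hand. This is a sketch under one assumption: the ObjectPath value is first decoded from the partially encoded form shown in the question, so that it is encoded exactly once. http_build_query() urlencode()s each value; passing '&' explicitly as the separator avoids depending on the arg_separator.output ini setting. Whether the shop accepts this fully encoded form is not confirmed in the thread.

```php
<?php
// ObjectPath value as it appears (partially encoded) in the question's URL.
$rawObjectPath = '/Shops/15456062/Products/%22Briefkastenst%C3%A4nder%20Bobiround%22'
    . '/SubProducts/%22Briefkastenst%C3%A4nder%20Bobiround%20gr%C3%BCn%20RAL6005%22';

// Decode first so nothing is double-encoded, then let http_build_query()
// urlencode() each value (spaces become "+", "/" becomes %2F, '"' becomes %22).
$queryString = http_build_query(
    [
        'ObjectPath' => urldecode($rawObjectPath),
        'ViewAction' => 'ViewProductRating',
    ],
    '',
    '&' // explicit separator, independent of arg_separator.output
);

$actualurl = 'http://www.domain.com/epages/xxx.sf/de_DE/?' . $queryString;
echo $actualurl, "\n";
```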

12 Comments

Thank you. I am building the URL from 4 parts (domain + first part + database part + the ViewAction part). I tried applying urlencode to the first part and the database part, but now the URL is not found on the server. Which parts do I have to encode? Strange, because I thought it was properly encoded.
Hm, strange. I do not understand this answer: I tried to encode the part you mentioned, but the server always responds with "site not found".
What do you call the database part and the first part in your example? You need to encode (using urlencode after decoding any already encoded value) every value passed as a parameter in the query string. The encoded example I provided was obtained as follows (since the ObjectPath parameter value had already been partially encoded): $objectPathValue = urlencode(urldecode('/Shops/15456062/Products/%22Briefkastenst%C3%A4nder%20Bobiround%22/SubProducts/%22Briefkastenst%C3%A4nder%20Bobiround%20gr%C3%BCn%20RAL6005%22'));
Thank you @Thierry, I edited my question with the exact parts of the url!
In the rating URL, the leading = symbol (%3D) should not be present. You may need to remove it and append it (unencoded) to the end of the $objectpath string value.

Thank you so much for your help! You made my day!

And thanks to anyone who tried to help!

It was indeed the & character that messed things up. Somehow, when the URLs were entered into the database, the & was converted to &amp;, which had to be changed back:

$ratingurl = str_replace('&amp;','&',$ratingurl);

The URL was originally extracted with preg_match_all and entered directly into the database.
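The str_replace() above repairs the URLs after they are stored. An alternative is to decode HTML entities right after extraction, because inside an HTML attribute "&" is normally written as "&amp;" and a regex captures that entity literally. A minimal sketch; the HTML snippet is hypothetical and the regex assumes double-quoted href attributes.

```php
<?php
// Hypothetical snippet of the shop's HTML: inside an attribute, "&" is
// written as the entity "&amp;", so preg_match_all() captures it as-is.
$html = '<a href="?ObjectPath=/Shops/15456062/Products/%22Briefkastenst%C3%A4nder%20Bobiround%22'
      . '&amp;ViewAction=ViewProductRating">Produktbewertungen</a>';

// Capture every double-quoted href value.
preg_match_all('/href="([^"]+)"/', $html, $matches);

// Decode entities before storing, so the database holds a real "&".
$urls = array_map(
    static function (string $href): string {
        return html_entity_decode($href, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    },
    $matches[1]
);

print_r($urls);
```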
