Possible Duplicates:
Does IMDB provide an API?
How to send a header using a HTTP request through a curl call?

I am using PHP curl to scrape movie details from IMDB. Fetching the data works perfectly, but I am facing a problem with non-English movies like this movie.

When I open this movie in my browser, it shows me the English version of the IMDB page, with the movie name "Boarding School". But when I fetch the data through curl, I get the original-language page, where the movie name is "Leidenschaftliche Blümchen".

How can I fetch the English version of the IMDB page with curl?

  • Have you tried passing a valid user agent with region information? The option is -A in curl. Commented Aug 10, 2011 at 10:14
  • From the IMDB ToS: "Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent as noted below." Commented Aug 10, 2011 at 10:20
  • Or you could skip parsing their unstructured data and download their structured data instead: imdb.com/interfaces Commented Aug 10, 2011 at 10:21
  • Because I have already seen another site that gets the result I want. Commented Aug 10, 2011 at 10:22

1 Answer

When you request a page with a browser, the browser sends specific request headers to the server. A Firefox extension like Firebug can show them (check the Net panel). As an example, these are the headers I just sent to the server with Firefox:

GET /title/tt0076306/ HTTP/1.1
Host: www.imdb.com
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.8,de-de;q=0.5,de;q=0.3
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection: keep-alive
...

The header that most likely makes the difference is:

Accept-Language: en-us,en;q=0.8,de-de;q=0.5,de;q=0.3

See section 14.4, Accept-Language, of RFC 2616.
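To make the q-values in that header concrete, here is a small sketch of how such a value can be read: each comma-separated entry is a language tag, an entry without `;q=` counts as q=1, and higher q means more preferred. The function name `rank_languages` is made up for this illustration, and how IMDB actually evaluates the header on its side is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the language tags of an Accept-Language
# value, most preferred first. Entries without ";q=" default to q=1.
rank_languages() {
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | awk -F';q=' '{ gsub(/ /, ""); q = (NF > 1 ? $2 : 1); print q "\t" $1 }' \
    | LC_ALL=C sort -s -k1,1nr \
    | cut -f2
}

# Prints en-us, en, de-de, de (one tag per line).
rank_languages "en-us,en;q=0.8,de-de;q=0.5,de;q=0.3"
```

With the Firefox header above, English ranks ahead of German, which would explain why the browser gets the English title.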

When you use curl, it sends request headers as well, but they may differ from the ones your browser sends. However, you can tell curl to send the headers you specify.

You just need to make curl use the headers your browser uses and you should get the same result. See How to send a header using a HTTP request through a curl call?.

For example, to get the German version of the page:

curl -H "Accept-Language: de-de;q=0.8,de;q=0.5" http://www.imdb.com/title/tt0076306/

For the English version:

curl -H "Accept-Language: en-us,en;q=0.8,de-de;q=0.5,de;q=0.3" http://www.imdb.com/title/tt0076306/
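If you fetch many pages, the two calls above can be wrapped in a small shell function. This is only a sketch: the name `fetch_with_language` is made up, and sending the Firefox User-Agent from the header dump is an assumption based on the comment under the question, not something I verified to be required.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: fetch a URL while sending the given
# Accept-Language value and a browser-like User-Agent.
fetch_with_language() {
  local url="$1" languages="$2"
  curl --silent --show-error --location \
    -H "Accept-Language: ${languages}" \
    -A "Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0" \
    "$url"
}

# Usage (performs a real HTTP request):
#   fetch_with_language "http://www.imdb.com/title/tt0076306/" "en-us,en;q=0.8" > page.html
```

In PHP curl, the same header can be passed with `curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Language: en-us,en;q=0.8'));` before calling `curl_exec($ch)`.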

7 Comments

Could you please tell me what the proper header for this would be? I have already tried this, but it did not work.
@pravat231: I extended the answer, made a suggestion and linked the specification of the header in question.
Yes, I was trying to do the same thing. The other suspect was JavaScript. Did you actually try sending the same headers and check what the response is?
Yes, I have already tried, but I saw on another website that they get the proper result.
@Gaurav Gupta: Added a curl example that does this for me, for both German and English.
