
I am trying to download a file from an API using Python requests. I cannot emulate a browser even though I send browser-like headers. I tried requests.get(link, headers=headers) and it did not work. I also don't understand why session.get(link).headers lacks 'User-Agent'. I always get "The requested URL was rejected." even though the link works in a browser.

import requests # requests 2.32.3
# File URL returned by the resources API. I am also interested in downloading XLSX files.
link = "https://open.data.gov.sa/odp-public/61ebb13c-d3b3-4cee-8edd-0f0d71923d9a/d79e9bfc-ae6b-4504-84cb-e6010d88aebd/v1/Tabuk University graduates intermediate diploma.csv"
# Obtained from https://httpbin.org/headers
headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "sv-SE,sv;q=0.9",
    "Host": "httpbin.org",
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "navigate",
    "Sec-Fetch-Site": "none",
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.6 Safari/605.1.15",
    "X-Amzn-Trace-Id": "Root=1-68836664-4337d8c5727c410265c779f0"
}


session = requests.Session()
session.headers.update(headers)
print(session.get(link).text)
  • The website itself provides recommended Python code for interacting with the API data, and I don't see any change needed in the headers or any other parameter. The only difference is that you need to hit https://open.data.gov.sa/data/api/datasets?version=-1&dataset=d79e9bfc-ae6b-4504-84cb-e6010d88aebd Commented Jul 25 at 20:39
  • Thanks. I used the API you showed to get this link (the actual file to download); it is the 'downloadUrl' under 'resources'. I am now trying to download the file using GET. Commented Jul 26 at 10:29

1 Answer

You can't truly emulate a modern browser using requests, and you shouldn't try unless your target is completely static or you're doing low-level HTTP probing.

Consider exploring alternatives such as Playwright, httpx, or headless Chrome.
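Before switching tools, it may be worth ruling out two things visible in the question's snippet. The header dictionary was copied from the httpbin.org echo, so it carries `Host: httpbin.org` and an `X-Amzn-Trace-Id` that describe that request rather than one to open.data.gov.sa. Also, `response.headers` holds the headers the server sent back, while the headers that requests actually sent are on `response.request.headers`, which is why 'User-Agent' does not show up in the former. The sketch below is only an assumption about the likely culprit, not a confirmed fix (the site may still reject non-browser clients); `drop_echo_headers`, `fetch`, and `ECHO_ONLY_HEADERS` are hypothetical names introduced for illustration.

```python
# Headers that describe the httpbin.org request itself and should not be
# replayed against another host (an assumption about the likely culprit).
ECHO_ONLY_HEADERS = {"host", "x-amzn-trace-id", "content-length"}


def drop_echo_headers(headers):
    """Return a copy of headers without host-specific entries (case-insensitive)."""
    return {k: v for k, v in headers.items() if k.lower() not in ECHO_ONLY_HEADERS}


def fetch(url, headers, timeout=30):
    """GET url with the cleaned headers and print what was actually sent."""
    import requests  # imported here so drop_echo_headers has no dependency

    with requests.Session() as session:
        session.headers.update(drop_echo_headers(headers))
        response = session.get(url, timeout=timeout)
        # response.headers are the server's headers; the request's own
        # headers live on the prepared request attached to the response.
        print(response.status_code)
        print(dict(response.request.headers))
        return response
```

Calling `fetch(link, headers)` with the question's `link` and `headers` lets requests set the correct `Host` itself. Advertising `br` in `Accept-Encoding` only helps if a Brotli decoder is installed alongside requests, so dropping it is a reasonable extra precaution.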


1 Comment

Your answer could be improved with additional supporting information. Please edit to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center.
