I am using Selenium WebDriver (Firefox) to crawl some data from a website. I found that loading a rendered web page is slower than loading the source view of the same page. In other words, navigating to 'www.google.com' took much longer than navigating to 'view-source:www.google.com'.

So I was wondering whether I can use WebDriver to get all the text from the source view rather than from the normal page.

I tried using driver.page_source on the source view, but it returned a mess that I don't want.

1 Answer

If you only need the page source, use requests instead of a browser. Install it with pip:

pip install requests

And use it like so:

import requests

r = requests.get("http://google.com/")
# r.content, r.text, r.json(), and r.status_code are available on the response

For advanced usage, refer to the requests documentation.

Note: If you need to parse the HTML, use BeautifulSoup and pass it r.content.
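To illustrate what "getting the text out of the source" can look like, here is a minimal sketch that uses only the standard library's html.parser as a stand-in for BeautifulSoup. The names TextExtractor and extract_text are hypothetical helpers made up for this example, and skipping script and style content is an assumption about what counts as unwanted "mess".

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, ignoring the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_text(html):
    """Hypothetical helper: return the text of an HTML string, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The same helper should accept either r.text from requests or driver.page_source from WebDriver, since both are plain HTML strings.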

3 Comments

Yes, but I have to use WebDriver because I need to pass the reCAPTCHA check manually.
This should provide you with options to get the source code. Also, to optimize load speeds you could disable images like here.
@user3182260 In order to pass the captcha check, you'll probably need to render the page, not just download the source. You might try PhantomJS instead of Selenium + browser. Or, it might render faster in another browser.
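For the suggestion about disabling images, one possible sketch with Selenium and Firefox is below. It assumes a recent Selenium release where Firefox Options exposes set_preference, and it relies on the Firefox preference permissions.default.image, where the value 2 blocks image loading. The function names image_blocking_prefs and build_driver are hypothetical, and whether this speeds up a given site noticeably is not guaranteed.

```python
def image_blocking_prefs():
    """Firefox preferences that stop images from loading (2 = block all)."""
    return {"permissions.default.image": 2}


def build_driver():
    """Hypothetical helper: start Firefox with image loading disabled."""
    # Imported here so the preferences above can be inspected without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in image_blocking_prefs().items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

Calling build_driver() and then driver.get(...) would render the page as usual, so a manual captcha step should still be possible, just without images being fetched.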
