I currently have the following:

from selenium import webdriver
d = webdriver.Chrome()
# request the url and get the page contents
title = result.find("span", {"class": "episode"}).find("a").text

However, the 'text' that is returned to me is:

# Note the truncation on the word "envol"
<td class="title"><a href="/title/tt1844708/">La grande envol</a></td>

However, when I download the page source, it shows the following:

<td class="title"><a href="/title/tt1844708/">La grande envolée</a>
    <span class="year_type">(1927)</span><br />
</td>

Why is the text truncated in the WebDriver response? How can I make sure I get the full UTF-8 text?

1 Answer

As far as I understand, you are passing the contents of driver.page_source to BeautifulSoup for further parsing.

I would not do that, since Selenium can locate elements and read their text on its own. For example, you can use a CSS selector:

driver.find_element_by_css_selector('span.episode a').text

Example (using this IMDb page):

>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get('http://www.imdb.com/title/tt1844708/')
>>> print(driver.find_element_by_xpath('//span[@itemprop="name"]').text)
La grande envolée
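
Note that recent Selenium 4 releases no longer ship the find_element_by_* helpers used above. A minimal sketch of the same lookup with the generic find_element(by, value) call might look like the following. The helper name element_text and the main function are illustrative assumptions, not part of Selenium, and the XPath is the one from the example above, which may no longer match the current IMDb markup.

```python
def element_text(driver, by, value):
    """Return the rendered text of the first element matching the locator.

    `driver` is any object exposing Selenium's find_element(by, value).
    """
    return driver.find_element(by, value).text


def main():
    # Imported here so element_text can be exercised without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("http://www.imdb.com/title/tt1844708/")
        # Locator copied from the example above; adjust it if the page has changed.
        print(element_text(driver, By.XPATH, '//span[@itemprop="name"]'))
    finally:
        # Always close the browser, even if the lookup raises.
        driver.quit()
```

Calling main() opens Chrome, so it needs a working driver setup. The CSS variant from the answer would be element_text(driver, By.CSS_SELECTOR, 'span.episode a').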