Get HTML source of WebElement in Selenium WebDriver using Python

Question

I'm using the Python bindings to run Selenium WebDriver:

from selenium import webdriver
wd = webdriver.Firefox()

I know I can grab a webelement like so:

elem = wd.find_element_by_css_selector('#my-id')

And I know I can get the full page source with...

wd.page_source

But is there a way to get the "element source"?

elem.source   # <-- returns the HTML as a string

The Selenium WebDriver documentation for Python is basically non-existent, and I don't see anything in the code that seems to enable that functionality.

What is the best way to access the HTML of an element (and its children)?

You could also just parse all of wd.page_source with BeautifulSoup.
– eLRuLL, Commented Mar 1, 2013 at 13:59

Peter Mortensen · Accepted Answer · 2020-12-09 17:21:00Z

1036

You can read the innerHTML attribute to get the markup of the element's content, or outerHTML to get the markup including the element itself.

Python:

element.get_attribute('innerHTML')

Java:

elem.getAttribute("innerHTML");

C#:

element.GetAttribute("innerHTML");

Ruby:

element.attribute("innerHTML")

JavaScript:

element.getAttribute('innerHTML');

PHP:

$element->getAttribute('innerHTML');

It was tested and worked with the ChromeDriver.
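As a rough Python sketch of the approach above, the two attribute names can be wrapped in one small function. The name element_html and its include_self parameter are hypothetical, and the element argument is assumed to be a Selenium WebElement (anything that exposes get_attribute will do):

```python
def element_html(element, include_self=False):
    """Return an element's markup as a string.

    `element` is assumed to be a Selenium WebElement; the helper name
    and the `include_self` flag are illustrative, not part of Selenium.
    """
    # outerHTML includes the element's own tags; innerHTML only its content.
    name = "outerHTML" if include_self else "innerHTML"
    return element.get_attribute(name)
```

With a live session this might be called as element_html(wd.find_element("css selector", "#my-id"), include_self=True).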

edited Dec 9, 2020 at 17:21

Peter Mortensen

31.5k22 gold badges110 silver badges134 bronze badges

answered Dec 20, 2011 at 12:49

Nerijus

10.5k1 gold badge16 silver badges2 bronze badges


11 Comments

Bibek Shrestha Over a year ago

innerHTML is not a DOM attribute, so the above answer wouldn't work. innerHTML is a JavaScript value. Doing the above would return null. The answer by nilesh is the proper answer.

Ryan Shillington Over a year ago

This works great for me, and is much more elegant than the accepted answer. I'm using Selenium 2.24.1.

CuongHuyTo Over a year ago

Though innerHTML is not a DOM attribute, it is well supported by all major browsers (quirksmode.org/dom/w3c_html.html). It works also well for me.

Kelvin Over a year ago

+1 This appears to work in ruby also. I have a feeling that the getAttribute method (or equivalent in other languages) just calls the js method whose name is the arg. However the documentation doesn't explicitly say this, so nilesh's solution should be a fallback.

acdcjunior Over a year ago

This fails for HtmlUnitDriver. Works for ChromeDriver, FirefoxDriver, InternetExplorerDriver (IE10) and PhantomJSDriver (I haven't tested others).


Peter Mortensen · Accepted Answer · 2020-12-09 17:24:14Z

104

There is not really a straightforward way of getting the HTML source code of a WebElement. You will have to use JavaScript. I am not too sure about the Python bindings, but you can easily do it like this in Java. I am sure there must be something similar to the JavascriptExecutor class in Python.

 WebElement element = driver.findElement(By.id("foo"));
 String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element);
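In the Python bindings the counterpart of executeScript is driver.execute_script. A minimal sketch, assuming a live driver and element are passed in, might try the attribute route first and fall back to JavaScript only when that returns None. The function name element_html_with_fallback is hypothetical:

```python
def element_html_with_fallback(driver, element, prop="innerHTML"):
    """Read innerHTML/outerHTML, falling back to JavaScript if needed.

    `driver` and `element` are assumed to be a Selenium WebDriver and
    WebElement; this helper itself is illustrative only.
    """
    if prop not in ("innerHTML", "outerHTML"):
        raise ValueError("prop must be 'innerHTML' or 'outerHTML'")
    # First try the plain attribute lookup.
    html = element.get_attribute(prop)
    if html is None:
        # Fall back to evaluating the property in the browser.
        html = driver.execute_script("return arguments[0]." + prop + ";", element)
    return html
```

Restricting prop to the two known names keeps arbitrary text out of the script string.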

edited Dec 9, 2020 at 17:24

Peter Mortensen

31.5k22 gold badges110 silver badges134 bronze badges

answered Sep 3, 2011 at 3:29

nilesh

14.3k7 gold badges69 silver badges81 bronze badges

5 Comments

Chris W. Over a year ago

This is essentially what I ended up doing, albeit with the Python equivalent.

Ryan Shillington Over a year ago

I think the answer below, using element.getAttribute("innerHTML") is a lot easier to read. I don't understand why people are voting it down.

Anthon Over a year ago

No need to call javascript at all. In Python just use element.get_attribute('innerHTML')

nilesh Over a year ago

@Anthon innerHTML is not a DOM attribute. When I answered this question in 2011, it did not work for me; it looks like some browsers now support it. If it works for you, then using innerHTML is cleaner. However, there is no guarantee it will work on all browsers.

Illidan Over a year ago

Apparently, this is the only way to get innerHTML while using RemoteWebDriver

Michael Mintz · Accepted Answer · 2023-06-27 22:59:53Z

104

Here's how to get the HTML source code using Selenium Python:

elem = driver.find_element("xpath", "//*")
source_code = elem.get_attribute("outerHTML")

Here's how to save that HTML to a file:

with open('c:/html_source_code.html', 'w', encoding='utf-8') as f:
    f.write(source_code)
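In Python 3 a file opened in text mode expects a str rather than bytes, so the encoding is best given when the file is opened. A small sketch of saving the markup that way follows; save_html is a hypothetical helper name and the path is whatever the caller chooses:

```python
from pathlib import Path


def save_html(source_code, path):
    """Write markup (a str) to `path` as UTF-8 and return the Path."""
    target = Path(path)
    # The encoding is applied by write_text; the str is passed unencoded.
    target.write_text(source_code, encoding="utf-8")
    return target
```

Called as save_html(source_code, "html_source_code.html"), this writes the file relative to the current working directory.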

edited Jun 27, 2023 at 22:59

Michael Mintz

16k9 gold badges51 silver badges109 bronze badges

answered Mar 20, 2013 at 18:08

Mark

1,1491 gold badge7 silver badges4 bronze badges

4 Comments

JohnDotOwl Over a year ago

Can I set a delay and get the latest source? There are dynamic contents loaded using javascript.

TheRookierLearner Over a year ago

Does this work even if the page is not fully loaded? Also, is there any way to set a delay like @FlyingAtom mentioned?

Parampreet Rai Over a year ago

If the webpage contains dynamic content, it depends on how that page behaves, but about 90% of the time you have to set a delay before getting the raw HTML from the page. The simplest way is time.sleep(x), where x is the delay in seconds.

Victor Stafusa Over a year ago

This is an old answer. Nowadays, the method find_element_by_xpath no longer exists, and this gives AttributeError: 'WebDriver' object has no attribute 'find_element_by_xpath'. So, now, instead of driver.find_element_by_xpath("//*"), you should use driver.find_element("xpath", "//*"). Found that in this answer.

Ajinkya · Accepted Answer · 2014-10-24 19:11:35Z

15

In Ruby, using selenium-webdriver (2.32.1), there is a page_source method that contains the entire page source.

edited Oct 24, 2014 at 19:11

Ajinkya

22.7k33 gold badges113 silver badges164 bronze badges

answered Apr 15, 2013 at 20:59

John Alberts

2592 silver badges4 bronze badges

2 Comments

Nick Over a year ago

which is great, but if you need dynamically rendered content, then it won't help

Corey Goldberg Jun 6 at 17:36

@Nick page_source gives you the page source at the time you call it. If you wait for the dynamic content to render, it will include it.

undetected Selenium · Accepted Answer · 2020-12-09 19:12:20Z

The other answers provide a lot of detail about retrieving the markup of a WebElement. However, modern websites increasingly rely on JavaScript, ReactJS, jQuery, Ajax, Vue.js, Ember.js, GWT, etc. to render dynamic elements within the DOM tree. You therefore need to wait for the element and its children to render completely before retrieving the markup.

Python

Ideally, use WebDriverWait with visibility_of_element_located() and then take either of the following approaches:

Using get_attribute("outerHTML"):

element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#my-id")))
print(element.get_attribute("outerHTML"))

Using execute_script():

element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#my-id")))
print(driver.execute_script("return arguments[0].outerHTML;", element))

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
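Visibility alone does not guarantee that the children have finished rendering. One complementary heuristic, sketched here as a hand-rolled polling loop rather than Selenium's own WebDriverWait, is to re-read the markup until two consecutive reads match. The name wait_for_stable_html and its parameters are hypothetical, and read_html is assumed to be a zero-argument callable such as lambda: element.get_attribute("outerHTML"):

```python
import time


def wait_for_stable_html(read_html, timeout=10.0, poll=0.5):
    """Poll `read_html()` until two consecutive reads return the same markup.

    This is a heuristic, not a Selenium API: identical markup across one
    poll interval is taken as a sign that rendering has settled.
    """
    deadline = time.monotonic() + timeout
    previous = read_html()
    while time.monotonic() < deadline:
        time.sleep(poll)
        current = read_html()
        if current is not None and current == previous:
            return current
        previous = current
    raise TimeoutError("markup was still changing after %.1f seconds" % timeout)
```

A page that updates more slowly than the poll interval can still fool this check, so the interval should be chosen with the page's behavior in mind.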

An essential question is what kind of HTML I get: a) just the source as passed through Selenium, or b) the source after Chrome (or, depending on the driver, Safari or Firefox) has rendered it?

Peter Mortensen · Accepted Answer · 2020-12-09 17:28:53Z

7

It looks outdated, but let it be here anyway. The correct way to do it in your case:

elem = wd.find_element_by_css_selector('#my-id')
html = wd.execute_script("return arguments[0].innerHTML;", elem)

or

html = elem.get_attribute('innerHTML')

Both are working for me (selenium-server-standalone-2.35.0).

edited Dec 9, 2020 at 17:28

Peter Mortensen

31.5k22 gold badges110 silver badges134 bronze badges

answered Mar 6, 2014 at 14:52

nefski

7196 silver badges12 bronze badges

Comments

Peter Mortensen · Accepted Answer · 2020-12-09 17:26:20Z

6

Using the attribute method is, in fact, easier and more straightforward.

Using Ruby with the Selenium and PageObject gems, to get the class associated with a certain element, the line would be element.attribute(Class).

The same concept applies if you wanted to get other attributes tied to the element. For example, if I wanted the string of an element, element.attribute(String).

edited Dec 9, 2020 at 17:26

Peter Mortensen

31.5k22 gold badges110 silver badges134 bronze badges

answered Mar 22, 2013 at 15:46

Tiffany G

2034 silver badges9 bronze badges

Comments

WltrRpo · Accepted Answer · 2016-03-29 21:25:03Z

4

Java with Selenium 2.53.0

driver.getPageSource();

answered Mar 29, 2016 at 21:25

WltrRpo

2632 silver badges13 bronze badges

3 Comments

Corey Goldberg Over a year ago

that's not what the question asked for

Stephan Over a year ago

Depending on the webdriver, the getPageSource method may not return the actual page source (ie with possible javascript changements). The returned source may be the raw source sent by the server. The webdriver doc must be checked to ensure this point.

wowandy Over a year ago

Also works for php - $driver->getPageSource()

Peter Mortensen · Accepted Answer · 2020-12-09 17:31:46Z

innerHTML returns the markup inside the selected element, while outerHTML returns that same markup together with the selected element's own tags.

Example:

Now suppose your Element is as below

<tr id="myRow"><td>A</td><td>B</td></tr>

innerHTML element output

<td>A</td><td>B</td>

outerHTML element output

<tr id="myRow"><td>A</td><td>B</td></tr>
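The same distinction can be reproduced without a browser. The sketch below uses Python's standard xml.etree.ElementTree as a stand-in for the DOM; this is an analogy only, since a real browser serializes HTML by its own rules, and inner_markup and outer_markup are hypothetical helper names:

```python
import xml.etree.ElementTree as ET


def outer_markup(elem):
    """Serialize the element itself together with everything inside it."""
    return ET.tostring(elem, encoding="unicode")


def inner_markup(elem):
    """Serialize only what sits between the element's opening and closing tags."""
    children = "".join(ET.tostring(child, encoding="unicode") for child in elem)
    return (elem.text or "") + children


row = ET.fromstring('<tr id="myRow"><td>A</td><td>B</td></tr>')
print(inner_markup(row))
print(outer_markup(row))
```

For this row the first line printed matches the innerHTML output shown above and the second matches the outerHTML output.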

Live Example:

http://www.java2s.com/Tutorials/JavascriptDemo/f/find_out_the_difference_between_innerhtml_and_outerhtml_in_javascript_example.htm

Below is the syntax for the different language bindings. Change innerHTML to outerHTML as required.

Python:

element.get_attribute('innerHTML')

Java:

elem.getAttribute("innerHTML");

If you want whole page HTML, use the below code:

driver.getPageSource();

pr96 · Accepted Answer · 2022-12-15 10:51:06Z

Updated 2022 Selenium Retrieving HTML

To start with, download the Python bindings for Selenium WebDriver.

One can do this from the PyPI page for the Selenium package.
Alternatively, one can install the package with pip (pip install selenium). Recent Python releases ship with pip.

Method 1

Read the innerHTML attribute to get the source of the element’s content. innerHTML is a property of a DOM element whose value is the HTML between the opening tag and ending tag.

For example, the innerHTML property in the code below carries the value “a text” (plus the surrounding line breaks)

<p>
a text
</p>

element.get_attribute('innerHTML')

Method 2

Read the outerHTML property to get the source including the current element. outerHTML is an element property whose value is the element's own opening and closing tags together with all the HTML between them.

For example, the code’s outerHTML property carries a value that contains div and span inside that.

<div>
<span>Hello there!</span>
</div>

ele.get_attribute("outerHTML")

ballade4op52 · Accepted Answer · 2016-04-07 23:09:29Z

2

I hope this could help: http://selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/selenium/WebElement.html

Here is described Java method:

java.lang.String    getText()

But unfortunately it's not available in Python. So you can translate the method names to Python from Java and try another logic using present methods without getting the whole page source...

E.g.

 my_id = elem[0].get_attribute('my-id')

edited Apr 7, 2016 at 23:09

ballade4op52

2,2675 gold badges29 silver badges44 bronze badges

answered Sep 7, 2011 at 14:23

oleksii.burdin

534 bronze badges

3 Comments

Chris W. Over a year ago

Python actually does have a "gettext" equivalent (I think its just the "text" attribute?) but that actually just returns the "plaintext" between HTML tags and won't actually return the full HTML source.

Ryan Shillington Over a year ago

This returns only the plain text (not the html) in Java too.

HelloW Over a year ago

you must reference it like you said elem[0] otherwise it doesn't work

MaartenDev · Accepted Answer · 2019-09-22 16:05:22Z

2

This works seamlessly for me.

element.get_attribute('innerHTML')

edited Sep 22, 2019 at 16:05

MaartenDev

5,8205 gold badges23 silver badges36 bronze badges

answered Sep 22, 2019 at 15:26

Jitendra Pisal

1316 bronze badges

Comments

Peter Mortensen · Accepted Answer · 2020-12-09 17:39:08Z

2

The method to get the rendered HTML I prefer is the following:

driver.get("http://www.google.com")
body_html = driver.find_element("xpath", "/html/body")
print(body_html.text)

However, the above method removes all the tags (yes, the nested tags as well) and returns only text content. If you are interested in getting the HTML markup as well, then use the method below.

print(body_html.get_attribute("innerHTML"))

edited Dec 9, 2020 at 17:39

Peter Mortensen

31.5k22 gold badges110 silver badges134 bronze badges

answered Feb 4, 2018 at 17:32

Rusty

4,5115 gold badges40 silver badges52 bronze badges

2 Comments

Rusty Over a year ago

You can also use driver.find_element("tag name", "body") to reach the body content of the page.

MT1 Over a year ago

This works in Excel VBA with Selenium but needs some adjustment.

Peter Mortensen · Accepted Answer · 2020-12-09 17:27:04Z

0

If you are interested in a solution for Selenium Remote Control in Python, here is how to get innerHTML:

innerHTML = sel.get_eval("window.document.getElementById('prodid').innerHTML")

edited Dec 9, 2020 at 17:27

Peter Mortensen

31.5k22 gold badges110 silver badges134 bronze badges

answered Jul 9, 2013 at 14:18

StanleyD

2,36824 silver badges21 bronze badges

1 Comment

Shane Over a year ago

Thanks for the help, I have used this. I also find innerHTML = {selenium selector code}.text works just the same.

user2849367 · Accepted Answer · 2021-09-11 02:49:56Z

0

Use execute_script to get the HTML.

bs4 (BeautifulSoup) can then be used to access HTML tags quickly.

from bs4 import BeautifulSoup
html = driver.execute_script("return document.documentElement.outerHTML")
bs4_onepage_object=BeautifulSoup(html,"html.parser")
bs4_div_object=bs4_onepage_object.find_all("atag",class_="attribute")

answered Sep 11, 2021 at 2:49

user2849367

581 silver badge8 bronze badges

Comments

wowandy · Accepted Answer · 2021-12-22 07:51:44Z

0

In PHP Selenium WebDriver you can get page source like this:

$html = $driver->getPageSource();

Or get HTML of the element like this:

// innerHTML if you need HTML of the element content
$html = $element->getDomProperty('outerHTML');

answered Dec 22, 2021 at 7:51

wowandy

1,3222 gold badges16 silver badges27 bronze badges

3 Comments

Laurent Over a year ago

Question is about Python not PHP

wowandy Over a year ago

@Laurent I know i can read but google search for php returns this page

wowandy Over a year ago

@Laurent this answer has upvotes, which means it was helpful to someone

christian · Accepted Answer · 2023-01-05 13:48:10Z

0

In current versions of php-webdriver (1.12.0+) you have to use

$element->getDomProperty('innerHTML');

as pointed out in this issue: https://github.com/php-webdriver/php-webdriver/issues/929

edited Jan 5, 2023 at 13:48

answered Oct 25, 2021 at 12:10

christian

1801 silver badge7 bronze badges

2 Comments

Laurent Over a year ago

Why an answer using PHP when the question is specifically about Python?

wowandy Over a year ago

@Laurent I answered you above. The reason is that for similar requests for PHP, Google issues this page

Dima Tisnek · Accepted Answer · 2020-07-09 05:03:20Z

-1

WebElement element = driver.findElement(By.id("foo"));
String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element);

This code really works to get JavaScript from source as well!

edited Jul 9, 2020 at 5:03

Dima Tisnek

11.9k4 gold badges73 silver badges129 bronze badges

answered Aug 31, 2012 at 4:04

Ilya

9

Comments

Peter Mortensen · Accepted Answer · 2020-12-09 17:29:41Z

-1

And in PHPUnit Selenium test it's like this:

$text = $this->byCssSelector('.some-class-name')->attribute('innerHTML');

edited Dec 9, 2020 at 17:29

Peter Mortensen

31.5k22 gold badges110 silver badges134 bronze badges

answered May 30, 2014 at 10:25

Zorgijs

27

1 Comment

Laurent Over a year ago

Question is about Python not PHP
