0

I am trying to parse XML data received from a RESTful interface. In error conditions (when a query returns nothing on the server), I get back the following text. I want to parse this string and find the value of the status attribute, which appears on the fifth line of the example below. How can I check whether status is present and, if it is, get its value?

content = """
<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="/3.0/style/exchange.xsl"?>
<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org" xmlns:xlink="http://www.w3.org/1999/xlink">
    <ops:meta name="elapsed-time" value="3"/>
    <exchange-documents>
        <exchange-document system="ops.epo.org" country="US" doc-number="20060159695" status="not found">
            <bibliographic-data>
                <publication-reference>
                    <document-id document-id-type="epodoc">
                        <doc-number>US20060159695</doc-number>
                    </document-id>
                </publication-reference>
                <parties/>
            </bibliographic-data>
        </exchange-document>
    </exchange-documents>
</ops:world-patent-data>
"""
import xml.etree.ElementTree as ET
root = ET.fromstring(content)
res = root.iterfind(".//{http://www.epo.org/exchange}exchange-documents[@status='not found']/..")

2 Answers

1

Just use BeautifulSoup:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(open('xml.txt', 'r'))

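# find() returns the first matching tag; findAll() returns a list and cannot be indexed by attribute name
print soup.find('exchange-document')["status"]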

#> not found 

If you store every XML response in a single file, it is useful to iterate over all the matching elements:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(open('xml.txt', 'r'))

for tag in soup.findAll('exchange-document'):
    print tag["status"]

#> not found

This will display the status attribute of every exchange-document element.

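Plus, if you only want the useful statuses (anything other than "not found"), filter them:

for tag in soup.findAll('exchange-document'):
    status = tag.get("status")  # None when the attribute is absent
    if status is not None and status != "not found":
        print status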
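
Since the question already uses xml.etree.ElementTree, the same check can probably be done with the standard library alone. This is a minimal sketch, not part of the BeautifulSoup approach above: the helper name document_statuses and the trimmed sample string are assumptions made for illustration. The important details are that exchange-document lives in the default namespace http://www.epo.org/exchange, and that the XML declaration must be the very first thing in the string.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the response from the question (illustrative only).
# strip() removes the leading newline, which would otherwise make the
# parser reject the XML declaration.
content = """
<?xml version="1.0" encoding="UTF-8"?>
<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org">
    <exchange-documents>
        <exchange-document system="ops.epo.org" country="US" doc-number="20060159695" status="not found"/>
    </exchange-documents>
</ops:world-patent-data>
""".strip()

# Prefix "ex" is an arbitrary local alias for the document's default namespace.
NS = {"ex": "http://www.epo.org/exchange"}


def document_statuses(xml_text):
    """Return (doc-number, status) pairs; status is None if the attribute is absent."""
    root = ET.fromstring(xml_text)
    return [
        (doc.get("doc-number"), doc.get("status"))
        for doc in root.iterfind(".//ex:exchange-document", NS)
    ]


for number, status in document_statuses(content):
    if status is None:
        print(number, "has no status attribute")
    else:
        print(number, "status:", status)
```

Element.get returns None for a missing attribute, so "is it present" and "what is its value" are answered by the same call.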

0

Try this:

from xml.dom.minidom import parse
xmldoc = parse(filename)
elementList = xmldoc.getElementsByTagName(tagName)

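elementList will contain every element with the tag name you specify (for the question's data, tagName would be 'exchange-document'). You can then iterate over it and read each element's status with getAttribute('status').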
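
As a rough sketch of how that could look for the question's data: the snippet below parses a string with parseString instead of a file, and uses hasAttribute to tell a missing status apart from an empty one. The function name statuses, the shortened sample XML, and the second document without a status (doc-number 00000000) are made up for illustration.

```python
from xml.dom.minidom import parseString

# Shortened, partly invented sample: the second document has no status.
content = """<ops:world-patent-data xmlns="http://www.epo.org/exchange" xmlns:ops="http://ops.epo.org">
    <exchange-documents>
        <exchange-document country="US" doc-number="20060159695" status="not found"/>
        <exchange-document country="US" doc-number="00000000"/>
    </exchange-documents>
</ops:world-patent-data>"""


def statuses(xml_text, tag_name="exchange-document"):
    """Return the status of each matching element, or None where it is absent."""
    doc = parseString(xml_text)
    result = []
    for element in doc.getElementsByTagName(tag_name):
        # getAttribute returns "" for a missing attribute, so check presence first.
        if element.hasAttribute("status"):
            result.append(element.getAttribute("status"))
        else:
            result.append(None)
    return result


print(statuses(content))
```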
