1

I have an XML file that looks like this:

<root>
  <Group>    
    <ChapterNo>1</ChapterNo>    
    <ChapterName>A</ChapterName>    
    <Line>1</Line>    
    <Content>zfsdfsdf</Content>    
    <Synonyms>fdgd</Synonyms>    
    <Translation>assdfsdfsdf</Translation>    
  </Group>    
  <Group>    
    <ChapterNo>1</ChapterNo>    
    <ChapterName>A</ChapterName>    
    <Line>2</Line>    
    <Content>ertreter</Content>    
    <Synonyms>retreter</Synonyms>    
    <Translation>erterte</Translation>    
  </Group>    
  <Group>    
    <ChapterNo>2</ChapterNo>    
    <ChapterName>B</ChapterName>    
    <Line>1</Line>    
    <Content>sadsafs</Content>
    <Synonyms>sdfsdfsd</Synonyms>
    <Translation>sdfsdfsd</Translation>
  </Group>
  <Group>
    <ChapterNo>2</ChapterNo>
    <ChapterName>B</ChapterName>
    <Line>2</Line>
    <Content>retete</Content>
    <Synonyms>retertret</Synonyms>
    <Translation>retertert</Translation>
  </Group>
</root>

I tried it this way:

from xml.etree import ElementTree

root = ElementTree.parse('data.xml').getroot()
ChapterNo = root.find('ChapterNo').text
ChapterName = root.find('ChapterName').text
GitaLine = root.find('Line').text
Content = root.find('Content').text
Synonyms = root.find('Synonyms').text
Translation = root.find('Translation').text

But it raises an error:

ChapterNo = root.find('ChapterNo').text
AttributeError: 'NoneType' object has no attribute 'text'

Now I want to get all the ChapterNo, ChapterName, etc. values separately using ElementTree, and then insert this data into a database. Can anyone help me?

Rgds,

Nimmy

3
  • I tried: root = ElementTree.parse('data.xml').getroot() ChapterNo=root.find('ChapterNo').text ChapterName=root.find('ChapterName').text GitaLine=root.find('Line').text Content=root.find('Content').text Synonyms=root.find('Synonyms').text Translation=root.find('Translation').text But it shows an error: "ChapterNo=root.find('ChapterNo').text AttributeError: 'NoneType' object has no attribute 'text'" Commented Feb 1, 2011 at 10:02
  • Add that to your question; it's hard to read in a comment. Commented Feb 1, 2011 at 10:03
  • root.find('GitaLine'): there is no "GitaLine" element in your example. Commented Feb 1, 2011 at 10:04

3 Answers

2

To parse your simple two-level data structure and assemble a dict for each group, all you need to do is this:

>>> # what you did to get `root`
>>> from pprint import pprint as pp
>>> for group in root:
...     d = {}
...     for elem in group:
...         d[elem.tag] = elem.text
...     pp(d) # or whack it into a database
...
{'ChapterName': 'A',
 'ChapterNo': '1',
 'Content': 'zfsdfsdf',
 'Line': '1',
 'Synonyms': 'fdgd',
 'Translation': 'assdfsdfsdf'}
{'ChapterName': 'A',
 'ChapterNo': '1',
 'Content': 'ertreter',
 'Line': '2',
 'Synonyms': 'retreter',
 'Translation': 'erterte'}
{'ChapterName': 'B',
 'ChapterNo': '2',
 'Content': 'sadsafs',
 'Line': '1',
 'Synonyms': 'sdfsdfsd',
 'Translation': 'sdfsdfsd'}
{'ChapterName': 'B',
 'ChapterNo': '2',
 'Content': 'retete',
 'Line': '2',
 'Synonyms': 'retertret',
 'Translation': 'retertert'}
>>>

Look, Ma, no xpath!
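
The "whack it into a database" step could look something like the sketch below. It uses the standard-library sqlite3 module with an in-memory database. The table name `gita_lines`, its column names, and the shortened two-group XML sample are assumptions made for illustration; they are not part of the question or of the loop above.

```python
import sqlite3
from xml.etree import ElementTree

# Shortened sample in the same shape as the question's file (assumed data).
XML = """
<root>
  <Group>
    <ChapterNo>1</ChapterNo>
    <ChapterName>A</ChapterName>
    <Line>1</Line>
    <Content>zfsdfsdf</Content>
    <Synonyms>fdgd</Synonyms>
    <Translation>assdfsdfsdf</Translation>
  </Group>
  <Group>
    <ChapterNo>2</ChapterNo>
    <ChapterName>B</ChapterName>
    <Line>1</Line>
    <Content>sadsafs</Content>
    <Synonyms>sdfsdfsd</Synonyms>
    <Translation>sdfsdfsd</Translation>
  </Group>
</root>
"""

root = ElementTree.fromstring(XML)

# One dict per <Group>, keyed by child tag name.
rows = [{elem.tag: elem.text for elem in group} for group in root]

# Hypothetical target: an in-memory database; pass a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gita_lines ("
    "chapter_no INTEGER, chapter_name TEXT, line INTEGER, "
    "content TEXT, synonyms TEXT, translation TEXT)"
)

# Named placeholders (:ChapterNo, ...) are filled from each dict's keys.
conn.executemany(
    "INSERT INTO gita_lines VALUES "
    "(:ChapterNo, :ChapterName, :Line, :Content, :Synonyms, :Translation)",
    rows,
)
conn.commit()

stored = conn.execute(
    "SELECT chapter_no, chapter_name, line FROM gita_lines "
    "ORDER BY chapter_no, line"
).fetchall()
```

Because the placeholders are matched by name, this only works if every Group contains all six child elements; a missing tag would make the insert fail rather than silently store a shifted row.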

1

ChapterNo is not a direct child of root, so root.find('ChapterNo') returns None, and accessing .text on it raises the AttributeError. You'll need a path expression (ElementTree supports a subset of XPath) to find the data.

Also, there are multiple occurrences of ChapterNo, ChapterName, etc., so you should use findall and iterate through the results to get the text of each one.

chapter_nos = [e.text for e in root.findall('.//ChapterNo')]

and so on.
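
Spelled out for all six tags, that approach might look like the sketch below. The inline sample and the names `FIELDS`, `columns`, and `records` are assumptions made for illustration.

```python
from xml.etree import ElementTree

# Shortened sample in the same shape as the question's file (assumed data).
XML = """
<root>
  <Group>
    <ChapterNo>1</ChapterNo>
    <ChapterName>A</ChapterName>
    <Line>1</Line>
    <Content>zfsdfsdf</Content>
    <Synonyms>fdgd</Synonyms>
    <Translation>assdfsdfsdf</Translation>
  </Group>
  <Group>
    <ChapterNo>2</ChapterNo>
    <ChapterName>B</ChapterName>
    <Line>1</Line>
    <Content>sadsafs</Content>
    <Synonyms>sdfsdfsd</Synonyms>
    <Translation>sdfsdfsd</Translation>
  </Group>
</root>
"""

root = ElementTree.fromstring(XML)

# find() only searches direct children, and root's children are <Group>
# elements, so this lookup comes back as None.
missing = root.find('ChapterNo')

FIELDS = ['ChapterNo', 'ChapterName', 'Line',
          'Content', 'Synonyms', 'Translation']

# './/Tag' matches the tag at any depth below root, in document order.
columns = {tag: [e.text for e in root.findall('.//' + tag)] for tag in FIELDS}

# Re-assemble one tuple per group. This relies on every Group having
# all six children; otherwise the lists would fall out of step.
records = list(zip(*(columns[tag] for tag in FIELDS)))
```

Collecting each tag separately gives one list per column. If a Group might omit a tag, iterating over the Group elements themselves is the safer way to keep values aligned.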

1 Comment

Note that on a large XML document, /root/Group/ChapterNo will be faster than //ChapterNo.
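
With ElementTree specifically, a path passed to an element's findall is relative to that element, so the direct route from root would be written 'Group/ChapterNo'. The sketch below compares the two forms on an assumed one-line sample. It only checks that they select the same elements and makes no timing claim.

```python
from xml.etree import ElementTree

# Minimal assumed sample: two groups, one ChapterNo each.
XML = (
    "<root>"
    "<Group><ChapterNo>1</ChapterNo></Group>"
    "<Group><ChapterNo>2</ChapterNo></Group>"
    "</root>"
)
root = ElementTree.fromstring(XML)

# Direct path: only <ChapterNo> children of <Group> children of root.
direct = [e.text for e in root.findall('Group/ChapterNo')]

# Descendant search: any <ChapterNo> anywhere below root.
anywhere = [e.text for e in root.findall('.//ChapterNo')]
```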

0

Here's a small example using SQLAlchemy to define an object that will extract the data and store it in an SQLite database.

from sqlalchemy import create_engine, Column, Integer, Unicode, UnicodeText
# declarative_base and sessionmaker are importable from sqlalchemy.orm
# in SQLAlchemy 1.4 and later
from sqlalchemy.orm import declarative_base, sessionmaker

engine = create_engine('sqlite:///chapters.sqlite', echo=True)
Base = declarative_base()

class ChapterLine(Base):
    __tablename__ = 'chapterlines'
    # chapter_no and line together form the composite primary key
    chapter_no = Column(Integer, primary_key=True)
    chapter_name = Column(Unicode(200))
    line = Column(Integer, primary_key=True)
    content = Column(UnicodeText)
    synonyms = Column(UnicodeText)
    translation = Column(UnicodeText)

    @classmethod
    def from_xmlgroup(cls, element):
        """Build a ChapterLine from one <Group> element."""
        obj = cls()
        obj.chapter_no = int(element.find('ChapterNo').text)
        obj.chapter_name = element.find('ChapterName').text
        obj.line = int(element.find('Line').text)
        obj.content = element.find('Content').text
        obj.synonyms = element.find('Synonyms').text
        obj.translation = element.find('Translation').text
        return obj

Base.metadata.create_all(engine)  # creates the table

Here's how to use it:

from xml.etree import ElementTree as etree

Session = sessionmaker(bind=engine)
session = Session()
doc = etree.parse('myfile.xml').getroot()
# <Group> elements are direct children of the root, so a plain tag name works
for group in doc.findall('Group'):
    session.add(ChapterLine.from_xmlgroup(group))

session.commit()

I have tested this code on your XML data and it works fine, inserting everything into the database.
