python xml extraction for loop

Question

I have a bit of script that I think is nearly there. I have worked out a crude way of writing it, but I can't work out how to get it to function as a for loop.

I am extracting data from an xml file that uses the following format:

<Trackpoint>
    <Time>2012-01-17T11:44:35Z</Time>
    <Position>
        <LatitudeDegrees>51.920211518183351</LatitudeDegrees>
        <LongitudeDegrees>26.706042898818851</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
<Trackpoint>
    <Time>2012-01-17T11:45:21Z</Time>
    <Position>
        <LatitudeDegrees>51.920243117958307</LatitudeDegrees>
        <LongitudeDegrees>26.706140967085958</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>

I can use the following to get, say, the LatitudeDegrees:

from xml.dom.minidom import parse
doc = parse('/Users/name/Documents/GPS/gps.tcx')
lat = doc.getElementsByTagName("LatitudeDegrees")
lon = doc.getElementsByTagName("LongitudeDegrees")
time = doc.getElementsByTagName("Time")
trackpoint = doc.getElementsByTagName("Trackpoint")

for x in lat:
    print(x.firstChild.data)

but I would like to get the latitude, longitude and time together, in order.

I am guessing I need to use

for x in trackpoint:

but the only way I can work out how to do that is as follows.

count = 0
n = len(trackpoint)
while count < n:
    print(time[count].firstChild.data)
    print(lat[count].firstChild.data)
    print(lon[count].firstChild.data)
    count += 1

Does anyone have any ideas? I think I am just missing something really simple!

Rob Wouters · Accepted Answer · 2012-01-18 22:57:19Z

4

First find all the Trackpoint elements and loop over them. Then, inside the loop, find the required child elements of each Trackpoint element:

from xml.dom.minidom import parse

doc = parse('in.tcx')

trackpoints = doc.getElementsByTagName("Trackpoint")
result = []
elements = ('Time', 'LatitudeDegrees', 'LongitudeDegrees')
for tp in trackpoints:
    obj = {}
    for el in elements:
        obj[el] = tp.getElementsByTagName(el)[0].firstChild.data
    result.append(obj)


print(result)



5 Comments

beoliver Over a year ago

Is result a list and obj a dictionary?

Dan D. Over a year ago

Yes, it's a list of dictionaries: [{'Time': ..., 'LatitudeDegrees': ..., 'LongitudeDegrees': ...}, ...]

Rob Wouters Over a year ago

@user969617, the end result is a list of dictionaries. You can print the result directly by changing the obj[el] = line. But it's more flexible to keep this format and then create a separate function that outputs it.
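For illustration, here is one way that separation could look. It is a sketch, not Rob's exact code: it assumes the sample XML from the question wrapped in a hypothetical `<root>` element and parsed with `parseString` instead of a file, and the helper names `extract_trackpoints` and `print_trackpoints` are hypothetical.

```python
from xml.dom.minidom import parseString

# Sample data from the question, wrapped in a hypothetical <root> element.
SAMPLE = """<root>
<Trackpoint>
    <Time>2012-01-17T11:44:35Z</Time>
    <Position>
        <LatitudeDegrees>51.920211518183351</LatitudeDegrees>
        <LongitudeDegrees>26.706042898818851</LongitudeDegrees>
    </Position>
</Trackpoint>
<Trackpoint>
    <Time>2012-01-17T11:45:21Z</Time>
    <Position>
        <LatitudeDegrees>51.920243117958307</LatitudeDegrees>
        <LongitudeDegrees>26.706140967085958</LongitudeDegrees>
    </Position>
</Trackpoint>
</root>"""

ELEMENTS = ('Time', 'LatitudeDegrees', 'LongitudeDegrees')


def extract_trackpoints(doc):
    """Build a list of dicts, one per Trackpoint (same shape as the answer's result)."""
    result = []
    for tp in doc.getElementsByTagName("Trackpoint"):
        # Each lookup is scoped to this Trackpoint, so the fields stay together.
        result.append({el: tp.getElementsByTagName(el)[0].firstChild.data
                       for el in ELEMENTS})
    return result


def print_trackpoints(points):
    """Hypothetical output helper: one line per trackpoint, fields in a fixed order."""
    for point in points:
        print(point['Time'], point['LatitudeDegrees'], point['LongitudeDegrees'])


result = extract_trackpoints(parseString(SAMPLE))
print_trackpoints(result)
```

Keeping extraction and output apart means the same `result` list can later be printed, saved or merged without touching the parsing code.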

beoliver Over a year ago

OK. I have been parsing .plist files using plistlib, and they are then read in as dictionaries. If I want to save the output, do I use pickle? Or is this a different question that needs posting?

Rob Wouters Over a year ago

@user969617, you can use pickle for that. It might indeed be better suited in a separate question.
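A minimal sketch of saving and reloading a list of dicts with `pickle`; the sample data and the filename `trackpoints.pickle` are hypothetical.

```python
import pickle

# Hypothetical data in the same shape as `result` from the accepted answer.
result = [
    {'Time': '2012-01-17T11:44:35Z',
     'LatitudeDegrees': '51.920211518183351',
     'LongitudeDegrees': '26.706042898818851'},
]

# Hypothetical output filename; pickle files must be opened in binary mode.
with open('trackpoints.pickle', 'wb') as f:
    pickle.dump(result, f)

# Later (e.g. in another script), load the list of dicts back.
with open('trackpoints.pickle', 'rb') as f:
    loaded = pickle.load(f)

print(loaded == result)
```

Since the values here are plain strings, `json.dump` would work just as well and produces a human-readable file; pickle is the more general choice if the data later holds arbitrary Python objects.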

Anurag Uniyal · Accepted Answer · 2012-01-18 23:17:09Z

I usually find parsing XML with ElementTree more readable and easier; e.g. you can read the latitudes in three lines:

import xml.etree.ElementTree as etree

s="""<root>
<Trackpoint>
    <Time>2012-01-17T11:44:35Z</Time>
    <Position>
        <LatitudeDegrees>51.920211518183351</LatitudeDegrees>
        <LongitudeDegrees>26.706042898818851</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
<Trackpoint>
    <Time>2012-01-17T11:45:21Z</Time>
    <Position>
        <LatitudeDegrees>51.920243117958307</LatitudeDegrees>
        <LongitudeDegrees>26.706140967085958</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
</root>
"""

root = etree.fromstring(s)
for point in root:
    print(point.find('Position/LatitudeDegrees').text)

Suppose you want to convert each point to a dict:

varnames = [
    ('Position/LatitudeDegrees', 'lat'),
    ('Position/LongitudeDegrees', 'lon'),
    ('Time', 'time'),
    ('AltitudeMeters', 'alt')
    ]

points = []
for pointelem in etree.fromstring(s):
    point = {}
    for tag, varname in varnames:
        point[varname] = pointelem.find(tag).text
    points.append(point)

import pprint
pprint.pprint(points)

output:

[{'alt': '-43.6026611328125',
  'lat': '51.920211518183351',
  'lon': '26.706042898818851',
  'time': '2012-01-17T11:44:35Z'},
 {'alt': '-43.6026611328125',
  'lat': '51.920243117958307',
  'lon': '26.706140967085958',
  'time': '2012-01-17T11:45:21Z'}]

If reading from a file, would I use s = "/Users/name/Documents/GPS/gps.tcx"?
@user969617 If you have a file, you can use etree.parse directly: docs.python.org/library/…
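A sketch of the `etree.parse` route: it takes a filename (or file object) rather than the XML text, and `getroot()` returns the same root element that `fromstring()` would. To keep it runnable anywhere, it writes the sample to a temporary file first; with a real file you would pass its path instead. This assumes the elements carry no XML namespace, as in the sample; if your .tcx file declares a default `xmlns`, the paths passed to `find()` would need to include it.

```python
import os
import tempfile
import xml.etree.ElementTree as etree

SAMPLE = """<root>
<Trackpoint>
    <Time>2012-01-17T11:44:35Z</Time>
    <Position>
        <LatitudeDegrees>51.920211518183351</LatitudeDegrees>
        <LongitudeDegrees>26.706042898818851</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
</root>
"""

# Write the sample to a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.tcx', delete=False) as f:
    f.write(SAMPLE)
    path = f.name

# parse() reads from the path and returns an ElementTree; getroot() gives <root>.
root = etree.parse(path).getroot()
lats = [point.find('Position/LatitudeDegrees').text for point in root]
print(lats)

os.remove(path)  # clean up the temporary file
```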

unutbu · Accepted Answer · 2012-01-18 23:01:30Z

0

Perhaps you are looking for zip:

import xml.dom.minidom as minidom
import os

doc = minidom.parse(os.path.expanduser('~/test/gps.tcx'))
latitudes = doc.getElementsByTagName("LatitudeDegrees")
longitudes = doc.getElementsByTagName("LongitudeDegrees")
time = doc.getElementsByTagName("Time")
trackpoint = doc.getElementsByTagName("Trackpoint")

for t,lat,lon in zip(time,latitudes,longitudes):
    print(t.firstChild.data, lat.firstChild.data, lon.firstChild.data)
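`zip` pairs the three node lists purely by position, so this lines up correctly only if every Trackpoint contains all three elements. A small illustration with hypothetical placeholder strings standing in for the node lists, assuming the first trackpoint had no `<Position>`:

```python
# Hypothetical placeholders: one Time per trackpoint, but trackpoint 0
# is assumed to have no <Position>, so the coordinate lists start at 1.
times = ['t0', 't1', 't2']
lats = ['lat1', 'lat2']
lons = ['lon1', 'lon2']

# zip matches items by index and stops at the shortest list.
rows = list(zip(times, lats, lons))
print(rows)  # [('t0', 'lat1', 'lon1'), ('t1', 'lat2', 'lon2')]
```

Here `t0` is paired with the coordinates of trackpoint 1 and `t2` is silently dropped. Looping over the Trackpoint elements, as in the other answers, avoids this because each lookup is scoped to a single trackpoint.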


1 Comment

beoliver Over a year ago

To be honest, I don't know exactly what I need. I want to be able to save the output and then compare and merge it with different data from a .plist. I'll read up on zip, as it looks interesting.
