I have a bit of script that I think is nearly there. I have worked out a crude way of writing it, but I can't work out how to get it to function as a for loop.

I am extracting data from an xml file that uses the following format:

<Trackpoint>
    <Time>2012-01-17T11:44:35Z</Time>
    <Position>
        <LatitudeDegrees>51.920211518183351</LatitudeDegrees>
        <LongitudeDegrees>26.706042898818851</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
<Trackpoint>
    <Time>2012-01-17T11:45:21Z</Time>
    <Position>
        <LatitudeDegrees>51.920243117958307</LatitudeDegrees>
        <LongitudeDegrees>26.706140967085958</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>

I can use the following to get, say, the LatitudeDegrees values:

from xml.dom.minidom import parse
doc = parse('/Users/name/Documents/GPS/gps.tcx')
lat = doc.getElementsByTagName("LatitudeDegrees")
lon = doc.getElementsByTagName("LongitudeDegrees")  # used by the while loop further down
time = doc.getElementsByTagName("Time")
trackpoint = doc.getElementsByTagName("Trackpoint")

for x in lat:
    print(x.firstChild.data)

But I would like to get the latitude, longitude and time together, in order, for each trackpoint.

I am guessing I need to use

for x in trackpoint:

but the only way I can work out how to do that is as follows.

count = 0
n = len(trackpoint)
while count < n:
    print(time[count].firstChild.data)
    print(lat[count].firstChild.data)
    print(lon[count].firstChild.data)
    count += 1

Does anyone have any ideas? I think I am just missing something really simple!

3 Answers

4

First find all the Trackpoint elements and loop over them. Then, inside the loop, find the wanted child elements of each Trackpoint element:

from xml.dom.minidom import parse

doc = parse('in.tcx')

trackpoints = doc.getElementsByTagName("Trackpoint")
result = []
elements = ('Time', 'LatitudeDegrees', 'LongitudeDegrees')
for tp in trackpoints:
    obj = {}
    for el in elements:
        obj[el] = tp.getElementsByTagName(el)[0].firstChild.data
    result.append(obj)


print(result)
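As a hedged sketch of the same loop, the version below wraps it in a function and parses an inline sample with parseString so it runs without a file. The names extract_trackpoints, SAMPLE and FIELDS, the added <root> wrapper, and the choice to store None for a missing tag are all assumptions for illustration, not part of the answer above.

```python
from xml.dom.minidom import parseString

# Inline sample (assumption): a <root> wrapper makes the fragment well-formed XML.
# The second Trackpoint deliberately has no <Position>.
SAMPLE = """<root>
<Trackpoint>
    <Time>2012-01-17T11:44:35Z</Time>
    <Position>
        <LatitudeDegrees>51.920211518183351</LatitudeDegrees>
        <LongitudeDegrees>26.706042898818851</LongitudeDegrees>
    </Position>
</Trackpoint>
<Trackpoint>
    <Time>2012-01-17T11:45:21Z</Time>
</Trackpoint>
</root>"""

FIELDS = ('Time', 'LatitudeDegrees', 'LongitudeDegrees')


def extract_trackpoints(doc, fields=FIELDS):
    """Return one dict per Trackpoint; a missing or empty tag maps to None."""
    result = []
    for tp in doc.getElementsByTagName("Trackpoint"):
        obj = {}
        for name in fields:
            nodes = tp.getElementsByTagName(name)
            # Guard against absent elements and elements with no text child.
            if nodes and nodes[0].firstChild is not None:
                obj[name] = nodes[0].firstChild.data
            else:
                obj[name] = None
        result.append(obj)
    return result


points = extract_trackpoints(parseString(SAMPLE))
for p in points:
    print(p['Time'], p['LatitudeDegrees'], p['LongitudeDegrees'])
```

Because each lookup is scoped to a single Trackpoint, the time, latitude and longitude stay together even when one trackpoint lacks a position.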

5 Comments

is result a list and obj a dictionary?
Yes, it's [{'Time':,'LatitudeDegrees':,'LongitudeDegrees':}]
@user969617, the end result is a list of dictionaries. You can print the result directly by changing the obj[el] = line. But it's more flexible to keep this format and then create a separate function that outputs it.
Ok. I have been parsing .plist files using plistlib, and they are then read as a lib with dictionaries. If I want to save the output, do I use pickle? Or is this a different question that needs posting?
@user969617, you can use pickle for that. It might indeed be better suited in a separate question.
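For reference, a minimal sketch of the pickle round trip mentioned in the comments, assuming the result is a list of dictionaries like the one built above. The file name trackpoints.pickle and the temporary directory are hypothetical choices so the example can run anywhere.

```python
import os
import pickle
import tempfile

# Sample data in the same shape as the answer's result list.
points = [
    {
        'Time': '2012-01-17T11:44:35Z',
        'LatitudeDegrees': '51.920211518183351',
        'LongitudeDegrees': '26.706042898818851',
    },
]

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'trackpoints.pickle')

# Pickle files must be opened in binary mode.
with open(path, 'wb') as f:
    pickle.dump(points, f)

with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored == points)
```

Only unpickle files you created yourself, since loading a pickle from an untrusted source can execute arbitrary code.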
2

I usually find parsing XML with ElementTree more readable and easier, e.g. you can read the latitude in three lines:

import xml.etree.ElementTree as etree

s="""<root>
<Trackpoint>
    <Time>2012-01-17T11:44:35Z</Time>
    <Position>
        <LatitudeDegrees>51.920211518183351</LatitudeDegrees>
        <LongitudeDegrees>26.706042898818851</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
<Trackpoint>
    <Time>2012-01-17T11:45:21Z</Time>
    <Position>
        <LatitudeDegrees>51.920243117958307</LatitudeDegrees>
        <LongitudeDegrees>26.706140967085958</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
</root>
"""

root = etree.fromstring(s)
for point in root:
    print(point.find('Position/LatitudeDegrees').text)

So suppose you want to convert each point to a dict:

varnames = [
    ('Position/LatitudeDegrees', 'lat'),
    ('Position/LongitudeDegrees', 'lon'),
    ('Time', 'time'),
    ('AltitudeMeters', 'alt')
    ]

points = []
for pointelem in etree.fromstring(s):
    point = {}
    for tag, varname in varnames:
        point[varname] = pointelem.find(tag).text
    points.append(point)

import pprint
pprint.pprint(points)

output:

[{'alt': '-43.6026611328125',
  'lat': '51.920211518183351',
  'lon': '26.706042898818851',
  'time': '2012-01-17T11:44:35Z'},
 {'alt': '-43.6026611328125',
  'lat': '51.920243117958307',
  'lon': '26.706140967085958',
  'time': '2012-01-17T11:45:21Z'}]

2 Comments

From a file, would I use s = "/Users/name/Documents/GPS/gps.tcx"?
@user969617 if you have a file, you can use etree.parse directly: docs.python.org/library/…
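A rough sketch of reading from a file with etree.parse, which takes a path (or file object) rather than the XML text itself. The temporary file sample.tcx stands in for a real path and is an assumption, as is the un-namespaced sample; a real .tcx file may declare an XML namespace, in which case the tag names would need to be namespace-qualified.

```python
import os
import tempfile
import xml.etree.ElementTree as etree

# Un-namespaced sample (assumption) wrapped in a <root> element.
SAMPLE = """<root>
<Trackpoint>
    <Time>2012-01-17T11:44:35Z</Time>
    <Position>
        <LatitudeDegrees>51.920211518183351</LatitudeDegrees>
        <LongitudeDegrees>26.706042898818851</LongitudeDegrees>
    </Position>
    <AltitudeMeters>-43.6026611328125</AltitudeMeters>
</Trackpoint>
</root>"""

# Write the sample to a temporary file to stand in for a real .tcx path.
path = os.path.join(tempfile.mkdtemp(), 'sample.tcx')
with open(path, 'w') as f:
    f.write(SAMPLE)

tree = etree.parse(path)  # pass the path, not the XML string
root = tree.getroot()

points = []
# iter() finds Trackpoint elements at any depth below the root.
for tp in root.iter('Trackpoint'):
    points.append({
        'time': tp.findtext('Time'),
        'lat': tp.findtext('Position/LatitudeDegrees'),
        'lon': tp.findtext('Position/LongitudeDegrees'),
    })

print(points)
```

findtext returns None when a path does not match, so a trackpoint without a position does not raise an AttributeError the way find(...).text would.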
0

Perhaps you are looking for zip:

import xml.dom.minidom as minidom
import os

doc = minidom.parse(os.path.expanduser('~/test/gps.tcx'))
latitudes = doc.getElementsByTagName("LatitudeDegrees")
longitudes = doc.getElementsByTagName("LongitudeDegrees")
time = doc.getElementsByTagName("Time")
trackpoint = doc.getElementsByTagName("Trackpoint")

for t,lat,lon in zip(time,latitudes,longitudes):
    print(t.firstChild.data, lat.firstChild.data, lon.firstChild.data)
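One caveat with zip, shown as a hedged sketch: it pairs items by position and stops at the shortest list, so it only matches the per-trackpoint grouping when every Trackpoint contains every tag. The sample below is hypothetical and deliberately omits the Position from the first trackpoint.

```python
from xml.dom.minidom import parseString

# Hypothetical sample: the first Trackpoint has no <Position>.
SAMPLE = """<root>
<Trackpoint>
    <Time>2012-01-17T11:44:35Z</Time>
</Trackpoint>
<Trackpoint>
    <Time>2012-01-17T11:45:21Z</Time>
    <Position>
        <LatitudeDegrees>51.920243117958307</LatitudeDegrees>
        <LongitudeDegrees>26.706140967085958</LongitudeDegrees>
    </Position>
</Trackpoint>
</root>"""

doc = parseString(SAMPLE)
times = doc.getElementsByTagName("Time")
latitudes = doc.getElementsByTagName("LatitudeDegrees")

# Positional pairing: the first time is matched with the only latitude,
# which actually belongs to the second Trackpoint.
pairs = [(t.firstChild.data, lat.firstChild.data)
         for t, lat in zip(times, latitudes)]
print(pairs)

# A simple length check flags the mismatch before relying on zip.
aligned = len(times) == len(latitudes)
print(aligned)
```

If the lengths differ, looping over the Trackpoint elements and looking up the children inside each one is the safer route.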

1 Comment

To be honest, I don't know exactly what I need. I want to be able to save the output and then compare and merge it with different data from a .plist. I'll read up on zip as it looks interesting.
