I need to flatten the data in an XML file so that I can read it as a single table, i.e. a csv. I found some Python 2.7 examples using ElementTree, but so far I could not adapt them to go further down the tree instead of just collecting the highest-level elements. What I want is to repeat the attributes of the higher-level element on each of its child rows and collect the rest from the children.

I know I could and should RTFM, but sadly I need to solve this ASAP.

Maybe the xsd file linked could help?

My data looks like this:

<!-- MoneyMate (tm) XMLPerfs Application version 1.0.1.1 - Copyright © 2000 MoneyMate Limited. All Rights Reserved. MoneyMate ® -->
<!-- Discrete Perfs for 180 periods for Monthly frequency -->
<MONEYMATE_XML_FEED xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://mmia2.moneymate.com/xml/MoneyMateComplete.xsd" version="1.0" calcCurrency="SEK">
<TYPES>
<TYPE typeCountry="SE" typeId="85" typeName="string" calcToDate="2013-07-16">
<COMPANIES>
<COMPANY companyId="25000068" companyName="string"/>
…

<CATEGORIES>
<CATEGORY categoryId="1101" categoryName="Aktie -- Asien">
<FUNDS>
<FUND fundId="6201" fundName="string" fundCurrency="GBP" fundCompanyId="25000068"><PERFORMANCES><MONTHLYPERFS><PERFORMANCEMONTH perfEndMonth="2006-05-31" perfMonth="-0.087670"/><PERFORMANCEMONTH>
…
</PERFORMANCES></FUND></FUNDS>
</CATEGORY>
<CATEGORY categoryId="13" categoryName="Räntefonder">
<FUNDS></FUNDS>
</CATEGORY>
</CATEGORIES>
</TYPE>
</TYPES>
</MONEYMATE_XML_FEED>

So I hope to get a table with data from the FUND elements only, with the fund attributes repeated on one row per PERFORMANCEMONTH:

fundid   fundName   fundCurrency   fundCompanyId   perfEndMonth   perfMonth
…        …          …              …               …              …

etc.

The output should be a csv file; I spaced the columns out above only to keep the formatting readable.

Please also note that perfMonth is the key value. It is hard to see in the data example above only because the line does not wrap in the code box.

1 Answer

I used lxml.

import csv

import lxml.etree

x = u'''<!-- MoneyMate (tm) XMLPerfs Application version 1.0.1.1 - Copyright 2000 MoneyMate Limited. All Rights Reserved. MoneyMate -->
<!-- Discrete Perfs for 180 periods for Monthly frequency -->
<MONEYMATE_XML_FEED xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://mmia2.moneymate.com/xml/MoneyMateComplete.xsd" version="1.0" calcCurrency="SEK">
    <TYPES>
        <TYPE typeCountry="SE" typeId="85" typeName="string" calcToDate="2013-07-16">
            <COMPANIES>
                <COMPANY companyId="25000068" companyName="string"/>
                <CATEGORIES>
                    <CATEGORY categoryId="1101" categoryName="Aktie -- Asien">
                        <FUNDS>
                            <FUND fundId="6201" fundName="string" fundCurrency="GBP" fundCompanyId="25000068">
                                <PERFORMANCES>
                                    <MONTHLYPERFS>
                                        <PERFORMANCEMONTH perfEndMonth="2006-05-31" perfMonth="-0.087670"/>
                                    </MONTHLYPERFS>
                                </PERFORMANCES>
                            </FUND>
                        </FUNDS>
                    </CATEGORY>
                    <CATEGORY categoryId="13" categoryName="Rntefonder">
                        <FUNDS></FUNDS>
                    </CATEGORY>
                </CATEGORIES>
            </COMPANIES>
        </TYPE>
    </TYPES>
</MONEYMATE_XML_FEED>
'''

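# On Python 2 open the file with 'wb'; on Python 3 use open('output.csv', 'w', newline='').
with open('output.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(('fundid', 'fundName', 'fundCurrency', 'fundCompanyId', 'perfEndMonth', 'perfMonth'))
    root = lxml.etree.fromstring(x)
    for fund in root.iter('FUND'):
        fund_values = (fund.get('fundId'), fund.get('fundName'), fund.get('fundCurrency'), fund.get('fundCompanyId'))
        # One row per PERFORMANCEMONTH, repeating the fund attributes on each row.
        # Funds without any PERFORMANCEMONTH produce no rows.
        for perf in fund.iter('PERFORMANCEMONTH'):
            writer.writerow(fund_values + (perf.get('perfEndMonth'), perf.get('perfMonth')))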

NOTE

The XML given in the question has a mismatched tag. You may need to fix that first.
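If it is not obvious where the mismatch is, the parser's error can point to it. The following is only a sketch, assuming Python 3 and the standard-library xml.etree.ElementTree; the helper name `find_parse_error` and the tiny `broken` snippet are made up for illustration and are not taken from the feed.

```python
import xml.etree.ElementTree as ET

def find_parse_error(xml_text):
    """Return None if xml_text parses, else a (message, line, column) tuple."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) pair reported by the parser.
        line, column = err.position
        return (str(err), line, column)
    return None

# Hypothetical snippet with the same kind of problem: <B> is never closed.
broken = '<A>\n<B>\n</A>'
problem = find_parse_error(broken)

# The same snippet with <B> closed parses cleanly.
fixed = find_parse_error('<A>\n<B/>\n</A>')
```

Running the helper on the real file content should report the line where the closing tag does not match, which is usually close to the element that was left open.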

2 Comments

Thanks, @falsetru. Sadly I cannot have lxml where I need to get this done, but perhaps the general idea still applies.
@László, you can also use xml.etree.ElementTree, because I didn't use any lxml-specific functions here.
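For reference, a minimal sketch of the same idea with the standard library only, assuming Python 3 and the element and attribute names from the sample in the question. The function name `funds_to_rows`, the cut-down `SAMPLE` string and its second month value are hypothetical and only there for illustration. It yields one row per PERFORMANCEMONTH and repeats the fund attributes on each row.

```python
import csv
import io
import xml.etree.ElementTree as ET

FUND_ATTRS = ('fundId', 'fundName', 'fundCurrency', 'fundCompanyId')
PERF_ATTRS = ('perfEndMonth', 'perfMonth')

def funds_to_rows(root):
    """Yield one tuple per PERFORMANCEMONTH, prefixed with its FUND's attributes."""
    for fund in root.iter('FUND'):
        fund_values = tuple(fund.get(name) for name in FUND_ATTRS)
        # Funds without any PERFORMANCEMONTH simply yield nothing.
        for perf in fund.iter('PERFORMANCEMONTH'):
            yield fund_values + tuple(perf.get(name) for name in PERF_ATTRS)

# Hypothetical, cut-down sample: one fund with two months, one empty FUNDS.
SAMPLE = '''<MONEYMATE_XML_FEED><TYPES><TYPE><CATEGORIES>
<CATEGORY categoryId="1101"><FUNDS>
<FUND fundId="6201" fundName="string" fundCurrency="GBP" fundCompanyId="25000068">
<PERFORMANCES><MONTHLYPERFS>
<PERFORMANCEMONTH perfEndMonth="2006-05-31" perfMonth="-0.087670"/>
<PERFORMANCEMONTH perfEndMonth="2006-06-30" perfMonth="0.012345"/>
</MONTHLYPERFS></PERFORMANCES></FUND>
</FUNDS></CATEGORY>
<CATEGORY categoryId="13"><FUNDS></FUNDS></CATEGORY>
</CATEGORIES></TYPE></TYPES></MONEYMATE_XML_FEED>'''

root = ET.fromstring(SAMPLE)
rows = list(funds_to_rows(root))

# An in-memory buffer keeps the sketch self-contained; to write a file,
# swap it for open('output.csv', 'w', newline='').
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(FUND_ATTRS + PERF_ATTRS)
writer.writerows(rows)
csv_text = buffer.getvalue()
```

For a real feed, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`. On Python 2.7 the same loop should work, but the csv file would need to be opened in 'wb' mode instead.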
