
I want to make a command-line program that will hopefully work on both Windows and Linux. I want to use Python since it is my programming language of choice. The goal is to make the program take a file name as an argument and output the information from the file in a different format. In this case, XML -> CSV and CSV -> XML.

What is the best way to do this?

I know there are XML and CSV parsers in Python, like the xml.parsers.expat and csv modules. I want the program to be robust, so that it could perhaps output other formats too, like .sql. Would it be beneficial to convert the data into a standard intermediate format first, like JSON? Then the output could be written in other formats if necessary.

Thanks.
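A minimal sketch of the intermediate-format idea, assuming flat, record-style data: every reader returns a list of dicts and every writer accepts one, so supporting another output such as .sql means adding one more writer. The `<records>`/`<record>` element names, the `READERS`/`WRITERS` registries and the `--to` option are hypothetical choices for illustration, and CSV column names are assumed to be valid XML tag names.

```python
import argparse
import csv
import io
import os
import sys
import xml.etree.ElementTree as ET


def read_csv(text):
    """Parse CSV text into a list of dicts (the intermediate form)."""
    return list(csv.DictReader(io.StringIO(text)))


def write_csv(rows):
    """Serialize a list of dicts to CSV text; columns come from the first row."""
    if not rows:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def read_xml(text):
    """Parse <records><record><col>value</col>...</record></records> into dicts."""
    root = ET.fromstring(text)
    return [{child.tag: child.text or "" for child in record} for record in root]


def write_xml(rows):
    """Serialize a list of dicts as <records><record>...</record></records>."""
    root = ET.Element("records")
    for row in rows:
        record = ET.SubElement(root, "record")
        for key, value in row.items():
            ET.SubElement(record, key).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical registries: add e.g. "sql": write_sql here to support a new output.
READERS = {"csv": read_csv, "xml": read_xml}
WRITERS = {"csv": write_csv, "xml": write_xml}


def convert(text, src, dst):
    """Any reader can be paired with any writer via the list-of-dicts form."""
    return WRITERS[dst](READERS[src](text))


def main(argv=None):
    parser = argparse.ArgumentParser(description="Convert between CSV and XML.")
    parser.add_argument("input", help="input file; its format is taken from the extension")
    parser.add_argument("--to", required=True, choices=sorted(WRITERS))
    args = parser.parse_args(argv)
    src = os.path.splitext(args.input)[1].lstrip(".").lower()
    if src not in READERS:
        parser.error(f"unsupported input format: {src!r}")
    with open(args.input, newline="", encoding="utf-8") as f:
        sys.stdout.write(convert(f.read(), src, args.to))
```

To run it from the command line, add the usual `if __name__ == "__main__": main()` guard at the bottom and call it as, for example, `python convert.py data.csv --to xml` (the script name is hypothetical). Only the standard library is used, so it should behave the same on Windows and Linux. A plain list of dicts is used instead of JSON text, since parsing JSON would give you the same kind of in-memory structure anyway.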

EDIT:

<level1 id ='' attr1='' attr2=''>
    <level2 id ='' attr1='' attr2=''>
        <type1 id ='' attr1='' attr2=''>
        </type1>
        <type2 id ='' attr1='' attr2=''>
        </type2>
    </level2>
    <level2 id ='' attr1='' attr2=''>
        <type2 id ='' attr1='' attr2=''>
        </type2>
    </level2>
</level1>

This is the XML format. Notice the type1 and type2 elements inside level2. How should I represent this line by line in a CSV?
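One common way to do this (a convention, not a standard) is to write one CSV row per leaf element and repeat every ancestor's attributes on that row, prefixing each column with the level it came from. The sketch below assumes exactly the three-level layout shown above; the sample values and the column names such as `level2_id` and `leaf_tag` are made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Sample data in the layout from the question, with made-up attribute values.
SAMPLE = """<level1 id='A' attr1='a1' attr2='a2'>
    <level2 id='B1' attr1='b1' attr2='b2'>
        <type1 id='C1' attr1='c1' attr2='c2'></type1>
        <type2 id='C2' attr1='c3' attr2='c4'></type2>
    </level2>
    <level2 id='B2' attr1='b3' attr2='b4'>
        <type2 id='C3' attr1='c5' attr2='c6'></type2>
    </level2>
</level1>"""

ATTRS = ("id", "attr1", "attr2")


def flatten(xml_text):
    """Yield one flat dict per leaf (type1/type2) element."""
    level1 = ET.fromstring(xml_text)
    for level2 in level1:  # iterating an Element visits child elements only
        for leaf in level2:
            row = {}
            for prefix, elem in (("level1", level1), ("level2", level2), ("leaf", leaf)):
                for name in ATTRS:
                    row[f"{prefix}_{name}"] = elem.get(name, "")
            row["leaf_tag"] = leaf.tag  # remember whether it was type1 or type2
            yield row


def to_csv(rows):
    """Write the flat rows as CSV text, taking the header from the first row."""
    rows = list(rows)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(to_csv(flatten(SAMPLE)))
```

For the sample this prints a header plus three rows, one per `type1`/`type2` element; the `level1` values and the first `level2` values are repeated on every row that shares them. That duplication is the price of squeezing a tree into a grid, and the `leaf_tag` column is what lets you tell `type1` from `type2` again when converting back.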

EDIT #2:

I guess this question comes down to a standard way to convert between a tree-like data structure and a grid structure. I ended up making a nested list in Python, similar to JSON but without using the JSON structure. Is there a good general algorithm for making this conversion?
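For the general tree-to-grid question, one widely used technique is path flattening: every leaf value in a nested structure becomes a column named after the path that leads to it, such as `level1.level2.0.id`, and the inverse splits those names to rebuild the nesting. A sketch for nested dicts and lists, assuming keys never contain the `.` separator and that dict keys made only of digits are list indices (both are assumptions of this sketch):

```python
def flatten(obj, prefix=""):
    """Map a nested dict/list structure to {dotted.path: leaf_value}."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)  # list positions become path parts "0", "1", ...
    else:
        return {prefix: obj}  # a leaf value
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


def unflatten(flat):
    """Rebuild nested dicts from dotted paths, then turn index-keyed dicts into lists."""
    root = {}
    for path, value in flat.items():
        parts = path.split(".")
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return _restore_lists(root)


def _restore_lists(node):
    """Convert dicts whose keys are exactly "0".."n-1" back into lists."""
    if not isinstance(node, dict):
        return node
    node = {key: _restore_lists(value) for key, value in node.items()}
    if node and all(key.isdigit() for key in node) and \
            sorted(int(key) for key in node) == list(range(len(node))):
        return [node[str(i)] for i in range(len(node))]
    return node
```

Each flattened record can then be written as one CSV row with `csv.DictWriter`, using the union of all keys as the header; records with a different shape simply leave some columns empty. Empty dicts and lists have no leaves, so they do not survive the round trip.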

  • XML is a very flexible format, so how to do it really depends on the format of your XML, and what information you want from it. Commented Nov 19, 2011 at 18:55
  • You should probably explain in more detail what kind of files you expect to process. For example, XML is used to represent trees, while CSV represents tables... Are your files going to contain trees with branches of the same length, or...? Commented Nov 19, 2011 at 18:55
  • Yes, the differences between CSV and XML files make it a little tricky. The XML file is in tree format; see my edit above for the XML and CSV formats. Commented Nov 19, 2011 at 22:11
  • </level1 id ='' attr1='' attr2=''> is not valid XML. Commented Nov 19, 2011 at 23:27
  • Thanks, yes that was my typo. Commented Nov 20, 2011 at 0:58

2 Answers


You would be better off converting XML to JSON and back. Both formats support multiple levels of nesting. In contrast, CSV is suited to a list of rows with no additional nesting.
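There is no single standard XML-to-JSON mapping, so the sketch below uses one simple convention of its own: attributes become plain keys, child elements are collected into lists under their tag name, and non-whitespace text goes under a hypothetical `"#text"` key. It assumes no attribute shares a name with a child tag, and it does not preserve the relative order of siblings with different tags.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element into nested dicts/lists that json can serialize."""
    node = dict(elem.attrib)  # attributes become plain keys
    for child in elem:
        # Repeated child tags are grouped into one list so no sibling is lost.
        node.setdefault(child.tag, []).append(element_to_dict(child))
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_text):
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


def dict_to_element(tag, node):
    """Inverse mapping: rebuild an Element from the dict convention above."""
    elem = ET.Element(tag)
    for key, value in node.items():
        if key == "#text":
            elem.text = value
        elif isinstance(value, list):
            for child in value:
                elem.append(dict_to_element(key, child))
        else:
            elem.set(key, value)
    return elem
```

For the `level1`/`level2` document in the question, the two `level2` elements end up as a two-item list under `"level2"`, which is exactly the kind of nesting a single CSV row cannot hold without a flattening convention.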

Sign up to request clarification or add additional context in comments.

3 Comments

Yes that's what I was thinking. So there isn't any kind of accepted format for saving nested trees from XML as csv?
@jeffery_the_wind Nope, you'll need to decide how you want to flatten the information yourself.
That is what I did, on a per-project basis. I think for a general solution you could use the uniqueness of the entries in the grid format to populate a tree. Groups of unique values in the grid could be branches of the trees.
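The grouping idea from the last comment can be sketched like this: walk the rows and create a parent element only the first time its id is seen, otherwise reuse it. The column names `level1_id`, `level2_id`, `leaf_tag` and `leaf_id` are hypothetical, and ids are assumed to be unique within their level; other attribute columns could be copied onto the elements the same way.

```python
import csv
import io
import xml.etree.ElementTree as ET


def rows_to_tree(csv_text):
    """Rebuild level1 > level2 > leaf nesting from flat rows, grouping by id."""
    roots = {}    # level1_id -> level1 Element
    parents = {}  # (level1_id, level2_id) -> level2 Element
    for row in csv.DictReader(io.StringIO(csv_text)):
        l1 = row["level1_id"]
        if l1 not in roots:  # first time this level1 id appears: new branch
            roots[l1] = ET.Element("level1", id=l1)
        key = (l1, row["level2_id"])
        if key not in parents:  # first time this level2 id appears under l1
            parents[key] = ET.SubElement(roots[l1], "level2", id=row["level2_id"])
        # Every row contributes exactly one leaf element.
        ET.SubElement(parents[key], row["leaf_tag"], id=row["leaf_id"])
    return list(roots.values())
```

Because dicts keep insertion order, branches come out in the order their ids first appear in the CSV, so a grid produced by a one-row-per-leaf flattening turns back into the original tree.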

You should just convert the data into a standard Python dict, and then from that into any format you need.

Of course, to convert XML to CSV, you need specially formatted XML, like:

<root>
    <column1>value</column1>
    <column2>value</column2>
    <column3>value</column3>
</root>
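A sketch of that dict-first approach for the XML above, including the .sql output mentioned in the question. Building the SQL through an in-memory SQLite database and `Connection.iterdump()` lets the `sqlite3` module handle quoting of values; the table name `records` is hypothetical, and the resulting dump is SQLite-flavoured SQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

XML_TEXT = """<root>
    <column1>value</column1>
    <column2>value</column2>
    <column3>value</column3>
</root>"""


def xml_to_dict(xml_text):
    """Each child of <root> becomes one key/value pair."""
    return {child.tag: (child.text or "").strip() for child in ET.fromstring(xml_text)}


def dict_to_sql(record, table="records"):
    """Return a SQL dump that recreates `record` as one row of a new table."""
    con = sqlite3.connect(":memory:")
    # Identifiers are double-quoted; XML tag names cannot contain '"'.
    cols = ", ".join(f'"{name}"' for name in record)
    marks = ", ".join("?" for _ in record)
    con.execute(f'CREATE TABLE "{table}" ({cols})')
    # Values go through placeholders, so sqlite3 handles the escaping.
    con.execute(f'INSERT INTO "{table}" VALUES ({marks})', list(record.values()))
    sql = "\n".join(con.iterdump())
    con.close()
    return sql
```

The same dict can just as well be handed to `csv.DictWriter` or `json.dumps`; the XML parsing is written once and each output format only needs its own small writer.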
