I assume your Ubuntu installation has Python 3 installed.
#!/usr/bin/python3
import io
import json
import xml.etree.ElementTree
d = """<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
"""
s = io.StringIO(d)
# root = xml.etree.ElementTree.parse("filename_here").getroot()
root = xml.etree.ElementTree.parse(s).getroot()
out = {}
i = 1
for child in root:
    name, value = child.tag, child.text
    if name == 'que':
        name = 'question'
    else:
        # every other child (<ca>, <ia>) becomes a numbered answer
        name = 'answer%s' % i
        i += 1
    out[name] = value
print(json.dumps(out))
Save it and make it executable with chmod.
You can easily modify it to read the XML from a file instead of the inline text (see the commented-out parse line).
EDIT
Okay, this is a more complete script:
#!/usr/bin/python3
import json
import sys
import xml.etree.ElementTree
def read_file(filename):
    root = xml.etree.ElementTree.parse(filename).getroot()
    return root

# assume we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
    i = 1
    tmp = {}
    for child in quiz_element:
        name, value = child.tag, child.text
        if name == 'que':
            name = 'question'
        else:
            # every other child (<ca>, <ia>) becomes a numbered answer
            name = 'answer%s' % i
            i += 1
        tmp[name] = value
    out.append(tmp)

def parse_root(root_element, out):
    for child in root_element:
        if child.tag == 'quiz':
            parse_quiz(child, out)

def convert_xml_to_json(filename):
    root = read_file(filename)
    out = []
    parse_root(root, out)
    print(json.dumps(out))

if __name__ == '__main__':
    if len(sys.argv) > 1:
        convert_xml_to_json(sys.argv[1])
    else:
        print("Usage: script <filename_with_xml>")
I made a file with the following content, which I named xmltest:
<questions>
<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
<quiz>
<que>Question number 1</que>
<ca>blabla</ca>
<ia>stuff</ia>
</quiz>
</questions>
So you have a list of <quiz> elements inside some other container element.
Now I make the script executable and launch it with the XML file as its argument:
$ chmod u+x scratch4.py
$ ./scratch4.py xmltest
This gives me the answer:
[{"answer3": "text", "answer2": "text", "question": "The question her", "answer4": "text", "answer1": "text"}, {"answer2": "stuff", "question": "Question number 1", "answer1": "blabla"}]
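The key order in the output above is simply whatever the dictionary happened to produce. If you want a stable, readable result, here is a minimal sketch, assuming the same element names as above. The helper name `quizzes_to_json` is made up for this example and is not part of the script; it passes `sort_keys` and `indent` to `json.dumps`:

```python
import json
import xml.etree.ElementTree as ET

# Small inline sample with the same layout as the xmltest file
XML = """<questions>
<quiz>
<que>Question number 1</que>
<ca>blabla</ca>
<ia>stuff</ia>
</quiz>
</questions>"""


def quizzes_to_json(xml_text):
    """Hypothetical helper: turn every <quiz> element into a dict, return JSON text."""
    root = ET.fromstring(xml_text)
    out = []
    for quiz in root.iter('quiz'):
        entry = {}
        i = 1
        for child in quiz:
            if child.tag == 'que':
                entry['question'] = child.text
            else:
                # <ca> and <ia> both become numbered answers, as in the script
                entry['answer%s' % i] = child.text
                i += 1
        out.append(entry)
    # sort_keys gives a deterministic key order; indent makes it readable
    return json.dumps(out, indent=2, sort_keys=True)


print(quizzes_to_json(XML))
```

Returning the string instead of printing it inside the function also makes it easy to write the result to a file later.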
Are the <quiz> and </quiz> tags always on their own line? Could the text have some encoding (CDATA, {, é, ...), or could it contain double quotes or backslashes? Is the XML file encoded in UTF-8?
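On the escaping questions: as far as I know, ElementTree turns CDATA sections and character references into plain text while parsing, and `json.dumps` escapes double quotes and backslashes when serializing, so the script should not need special handling for them. When parsing from a file, the parser goes by the encoding named in the XML declaration and assumes UTF-8 if there is none. A small sketch to check the first part (the sample string is made up for illustration):

```python
import json
import xml.etree.ElementTree as ET

# Made-up sample containing CDATA, character references, double quotes and a backslash
sample = ('<quiz><que><![CDATA[Is 1 < 2?]]></que>'
          '<ca>say "yes" &#123;&#233;&#125; C:\\dir</ca></quiz>')

root = ET.fromstring(sample)
# The parser has already decoded CDATA and the &#...; references here
texts = [child.text for child in root]

# json.dumps takes care of escaping quotes, backslashes and non-ASCII characters
encoded = json.dumps(texts)
print(encoded)

# Reading the JSON back should give the original text unchanged
assert json.loads(encoded) == texts
```

If this round trip works on your real data, the remaining thing to check is only the file encoding.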