I am very new to coding in Python, and there is an issue I have been trying to solve for some hours:
I have 1600+ XML files (0000.xml, 0001.xml, etc.) that need to be parsed for a text mining project.
But an error occurs when I run the following code:
from os import listdir, path
import xml.etree.ElementTree as ET
mypath = '../project/content'
files = [f for f in listdir(mypath) if f.endswith('.xml')]
for file in files:
    tree = ET.parse("../project/content/"+file)
    root = tree.getroot()
The error message is the following:
Traceback (most recent call last):
  File "/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2910, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-13-cdc3ee6c3989>", line 6, in <module>
    tree = ET.parse("../project/content/"+file)
  File "/anaconda3/lib/python3.6/xml/etree/ElementTree.py", line 1196, in parse
    tree.parse(source, parser)
  File "/anaconda3/lib/python3.6/xml/etree/ElementTree.py", line 597, in parse
    self._root = parser._parse_whole(source)
  File "<string>", line unknown
ParseError: no element found: line 1, column 0
Where did I make a mistake?
Also, I only want to extract the text from one element of each XML file. Is it sufficient to simply add this line to the code? And how can I save each result to a .txt file?
maintext = root.find("mainText").text
Thank you very much!
If you search for the error message (ParseError: no element found: line 1, column 0), you will find many SO Q&As which may point you in the right direction. It is possible that the file it is trying to parse is malformed or maybe even empty. If you just want to skip those files, catch the error and print the filename in the except suite; then you can look at them later.
"...is it sufficient that I simply attach this line to the code?" - Try it in the shell with some test data.
"...how can I save each of the results to txt files?" - Reading and writing files
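To make the suggestion above concrete, here is a minimal sketch of one way to do it, not a definitive solution. It assumes the wanted element is a direct child of the root and is named mainText, as in the question. The function name extract_main_text and the output directory are hypothetical, made up for this example.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def extract_main_text(src_dir, out_dir, tag="mainText"):
    """Write the text of the first <tag> child of each XML root to a .txt file.

    Returns the names of the files that were skipped.
    (Function name and parameters are hypothetical, chosen for this sketch.)
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    skipped = []
    for xml_file in sorted(src.glob("*.xml")):
        try:
            root = ET.parse(xml_file).getroot()
        except ET.ParseError as err:
            # Empty or malformed file: report it and move on to the next one.
            print(f"skipping {xml_file.name}: {err}")
            skipped.append(xml_file.name)
            continue
        element = root.find(tag)
        if element is None or element.text is None:
            # find() returns None when the root has no direct child with this tag.
            print(f"skipping {xml_file.name}: no <{tag}> text")
            skipped.append(xml_file.name)
            continue
        # 0000.xml -> 0000.txt in the output directory.
        (out / f"{xml_file.stem}.txt").write_text(element.text, encoding="utf-8")
    return skipped
```

Calling extract_main_text("../project/content", "../project/text") would then leave one .txt file per parsable XML file and return the list of files to inspect by hand. If mainText is nested deeper in the tree, root.find(".//mainText") searches all descendants instead of only direct children.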