6

I have an XML file that contains the following data:

<?xml version="1.0" encoding="UTF-8" ?>
<ParameterData>
  <CreationInfo date="10/28/2009 03:05:14 PM" user="manoj" />
  <ParameterList count="85">
    <Parameter name="Spec 2 Included" type="boolean" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
    <Parameter name="Spec 2 Label" type="string" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
    <Parameter name="Spec 3 Included" type="boolean" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
    <Parameter name="Spec 3 Label" type="string" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
  </ParameterList>
</ParameterData>

I also have a text file with lines such as:

Spec 2 Included : TRUE
Spec 2 Label: 19-Flat2-HS3   
Spec 3 Included : FALSE
Spec 3 Label: 4-1-Bead1-HS3

Now I want to edit the XML text, i.e. replace each n/a field with the corresponding value from the text file. I want the file to look like this:

<?xml version="1.0" encoding="UTF-8" ?>
<ParameterData>
  <CreationInfo date="10/28/2009 03:05:14 PM" user="manoj" />
  <ParameterList count="85">
    <Parameter name="Spec 2 Included" type="boolean" mode="both">
      <Value>TRUE</Value>
      <Result>TRUE</Result>
    </Parameter>
    <Parameter name="Spec 2 Label" type="string" mode="both">
      <Value>19-Flat2-HS3</Value>
      <Result>19-Flat2-HS3</Result>
    </Parameter>
    <Parameter name="Spec 3 Included" type="boolean" mode="both">
      <Value>FALSE</Value>
      <Result>FALSE</Result>
    </Parameter>
    <Parameter name="Spec 3 Label" type="string" mode="both">
      <Value>4-1-Bead1-HS3</Value>
      <Result>4-1-Bead1-HS3</Result>
    </Parameter>
  </ParameterList>
</ParameterData>

I am new to XML processing in Python and have no idea how to edit the text fields in an XML file. I am trying to use the elementtree.ElementTree module, but I don't know which modules need to be imported to read the lines of the XML file and extract the attributes.

Please help.

Thanks and Regards.

  • In XML jargon, the parts you want to change are called "text". "Attribute" refers to pieces like name="Spec 2 Label" or mode="both". Commented Dec 18, 2009 at 5:59
  • After spending quite a bit of time figuring out how to do this by combining several of the suggestions, I wrote an improper but effective solution here: stackoverflow.com/questions/1591579/…. Perhaps it helps people who are faced with a similar task. Commented Jan 4, 2018 at 2:43

4 Answers

6

You can convert your text data into a Python dictionary with a regular expression:

data="""Spec 2 Included : TRUE
Spec 2 Label: 19-Flat2-HS3
Spec 3 Included : FALSE
Spec 3 Label: 4-1-Bead1-HS3"""

#data=open("data.txt").read()

import re

# raw string so that \d, \s and \S reach the regex engine unchanged
data=dict(re.findall(r'(Spec \d+ (?:Included|Label))\s*:\s*(\S+)',data))

data will be as follows

{'Spec 3 Included': 'FALSE', 'Spec 2 Included': 'TRUE', 'Spec 3 Label': '4-1-Bead1-HS3', 'Spec 2 Label': '19-Flat2-HS3'}

Then you can update the XML with any XML parser you like; I will use minidom here.

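from xml.dom import minidom

# xml_text holds the XML document as a string,
# e.g. xml_text = open("input.xml").read() (the file name is a placeholder)
dom = minidom.parseString(xml_text)
params = dom.getElementsByTagName("Parameter")
for param in params:
    name = param.getAttribute("name")
    if name in data:
        # "*" matches both child elements; use "Result" or "Value" to change only one
        for item in param.getElementsByTagName("*"):
            item.firstChild.replaceWholeText(data[name])

print(dom.toxml())

# write to file; toxml("utf-8") returns encoded bytes, hence binary mode
with open("output.xml", "wb") as out_file:
    out_file.write(dom.toxml("utf-8"))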

Results

<?xml version="1.0" ?><ParameterData>
  <CreationInfo date="10/28/2009 03:05:14 PM" user="manoj"/>
  <ParameterList count="85">
    <Parameter mode="both" name="Spec 2 Included" type="boolean">
      <Value>TRUE</Value>
      <Result>TRUE</Result>
    </Parameter>
    <Parameter mode="both" name="Spec 2 Label" type="string">
      <Value>19-Flat2-HS3</Value>
      <Result>19-Flat2-HS3</Result>
    </Parameter>
    <Parameter mode="both" name="Spec 3 Included" type="boolean">
      <Value>FALSE</Value>
      <Result>FALSE</Result>
    </Parameter>
    <Parameter mode="both" name="Spec 3 Label" type="string">
      <Value>4-1-Bead1-HS3</Value>
      <Result>4-1-Bead1-HS3</Result>
    </Parameter>
  </ParameterList>
</ParameterData>
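
If only the Result text should change and Value should stay as it is, the wildcard in the loop above can be narrowed to "Result". The following is a minimal sketch of that variant, not a definitive implementation: SAMPLE_XML, SAMPLE_DATA and the helper fill_results are hypothetical names made up for this illustration, and the sample is a shortened document in the same shape as the question's file.

```python
from xml.dom import minidom

# Hypothetical two-parameter sample in the same shape as the question's file.
SAMPLE_XML = (
    '<ParameterData><ParameterList count="2">'
    '<Parameter name="Spec 2 Included" type="boolean" mode="both">'
    '<Value>n/a</Value><Result>n/a</Result></Parameter>'
    '<Parameter name="Spec 3 Label" type="string" mode="both">'
    '<Value>n/a</Value><Result>n/a</Result></Parameter>'
    '</ParameterList></ParameterData>'
)
# Hypothetical replacement values, keyed by the "name" attribute.
SAMPLE_DATA = {"Spec 2 Included": "TRUE", "Spec 3 Label": "4-1-Bead1-HS3"}


def fill_results(xml_text, data):
    """Return the XML with only the <Result> text replaced; <Value> is untouched."""
    dom = minidom.parseString(xml_text)
    for param in dom.getElementsByTagName("Parameter"):
        name = param.getAttribute("name")
        if name not in data:
            continue
        for item in param.getElementsByTagName("Result"):
            if item.firstChild is None:
                # Empty element: there is no text node yet, so create one.
                item.appendChild(dom.createTextNode(data[name]))
            else:
                # Overwrite the existing text node in place.
                item.firstChild.data = data[name]
    return dom.toxml()


updated = fill_results(SAMPLE_XML, SAMPLE_DATA)
print(updated)
```

The check for an empty element is an extra guard that the original loop does not have; without it, an element such as <Result/> would raise an AttributeError because firstChild is None.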

7 Comments

Dear Mark, this is so helpful. Thanks a lot. I am stuck at one silly step: how can I read the text file into a string, as you have done at the beginning with data=""" """? I am not able to convert the text file into a dictionary. Please suggest.
Hi, to load from a file use data=open("data.txt").read() instead of data=""" """. I have updated my answer as well.
Dear Mark, thank you for your support and time. I am able to generate the output. How can I use writexml() to write the output into a file? Thanks.
Hi, I added a line to write to a file: open("output.xml","wb").write(dom.toxml("utf-8"))
Dear Mark, now I want to replace the text in the Result field only, leaving the Value field as it was initially in the XML file. For the last node, for example, the output needs to look like <Value>N/A</Value> <Result>4-1-Bead1-HS3</Result>. Does the last statement, item.firstChild.replaceWholeText(data[name]), need any modification? Please help. Thanks.
5

Well, you could start with

import xml.etree.ElementTree as ET
tree = ET.parse("blah.xml")

Find the elements you want to modify.

To replace the contents of an element, just do

element.text = "TRUE"

The import statement above works in Python 2.5 or later. If you have an older version of Python you'll need to install ElementTree as an extension, and then the import statement is different: import elementtree.ElementTree as ET.
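
To make the "find the elements" step concrete, here is a minimal sketch of the whole round trip with ElementTree. It assumes Python 3 (Element.iter and ET.tostring(..., encoding="unicode") are not available in the very old versions mentioned above), and SAMPLE_XML, NEW_VALUES and apply_values are hypothetical names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-parameter sample in the same shape as the question's file.
SAMPLE_XML = """<ParameterData>
  <ParameterList count="1">
    <Parameter name="Spec 2 Label" type="string" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
  </ParameterList>
</ParameterData>"""

# Hypothetical replacement values, keyed by the "name" attribute.
NEW_VALUES = {"Spec 2 Label": "19-Flat2-HS3"}


def apply_values(root, new_values):
    """Set the text of <Value> and <Result> for every known parameter name."""
    changed = 0
    for parameter in root.iter("Parameter"):
        new_text = new_values.get(parameter.get("name"))
        if new_text is None:
            continue  # no replacement listed for this parameter
        for tag in ("Value", "Result"):
            child = parameter.find(tag)
            if child is not None:
                child.text = new_text
        changed += 1
    return changed


root = ET.fromstring(SAMPLE_XML)
changed = apply_values(root, NEW_VALUES)
print(ET.tostring(root, encoding="unicode"))
```

When working with a file instead of a string, ET.parse("blah.xml") returns a tree object; tree.getroot() gives the root element to pass to the helper, and tree.write(...) saves the result.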


1

Unfortunately, ElementTree supports only a subset of XPath. Python 2.6 ships an older version of ElementTree, so finding elements by attribute (as stated here) does not work there. Python's own documentation should be your first stop: xml.etree.ElementTree

import xml.etree.ElementTree as ET

original = ET.parse("original.xml")
parameters = original.findall(".//Parameter")
changes = {}

# read changes (text mode, so each line is a str)
with open("changes.txt", "r") as in_file:
    for change in in_file:
        change = change.strip()                 # remove line endings and padding
        if ":" not in change:
            continue                            # skip blank or malformed lines
        name, value = change.split(":", 1)      # split at the first colon only
        changes[name.strip()] = value.strip()   # remove surrounding whitespace

# find parameter elements and apply changes
for parameter in parameters:
    parameter_name = parameter.get("name")
    if parameter_name in changes:
        value = parameter.find("./Value")
        value.text = changes[parameter_name]
        result = parameter.find("./Result")
        result.text = changes[parameter_name]

original.write("new.xml")
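
For readers on a newer interpreter: the ElementTree version bundled with Python 2.7 and 3.2 onward does accept attribute predicates, so a single parameter can be looked up directly by name. A minimal sketch under that assumption follows; SAMPLE_XML and find_parameter are hypothetical names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-parameter sample in the same shape as the question's file.
SAMPLE_XML = """<ParameterData>
  <ParameterList count="2">
    <Parameter name="Spec 3 Included" type="boolean" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
    <Parameter name="Spec 3 Label" type="string" mode="both">
      <Value>n/a</Value>
      <Result>n/a</Result>
    </Parameter>
  </ParameterList>
</ParameterData>"""


def find_parameter(root, name):
    """Return the first <Parameter> whose name attribute equals `name`, or None."""
    # Simple quoting only: a name that itself contains a single quote
    # would need different handling.
    return root.find(".//Parameter[@name='%s']" % name)


root = ET.fromstring(SAMPLE_XML)

label = find_parameter(root, "Spec 3 Label")
label.find("Result").text = "4-1-Bead1-HS3"

missing = find_parameter(root, "Spec 9 Label")  # no such parameter
```

For many replacements, looping over all Parameter elements once, as in the code above, avoids one tree search per name; the predicate form is mainly convenient for one-off lookups.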

1 Comment

Hi wierob, thank you for your time. I am using Python 2.3 because of a wxPython constraint, so the with open statement may not work, and I made the necessary edits. However, the changes dictionary only shows one element. I am also getting an error saying parameter_name is not defined; get("name") is probably not working.
1

Here is how you could do it using Amara:

from amara import bindery

doc = bindery.parse(XML)

def cleanup_for_dict(key, value):
    return key.strip(), value.strip()

params = dict(( cleanup_for_dict(*line.split(':', 1))
                for line in TEXT.splitlines()))

for param in doc.ParameterData.ParameterList.Parameter:
    if param.name in params:
        param.Value = params[param.name]
        param.Result = params[param.name]

doc.xml_write()

