Bob Lyman

Search replace in XML file with sed or awk

I have a task in which I have to manipulate an XML file from a bash shell script.

Here are the steps:

  1. Query XML file for a value.
  2. Cross-reference that value against a list to find a new value.
  3. Replace the value of a different element with the new value.

Here is a sample of the XML with non-essential info removed:

<fmreq:fileManagementRequestDetail xmlns:fmreq="http://foobar.com/filemanagement">
      <fmreq:property>
         <fmreq:name>form_category_cd</fmreq:name>
         <fmreq:value>Memos</fmreq:value>
      </fmreq:property>
      <fmreq:property>
         <fmreq:name>object_name</fmreq:name>
         <fmreq:value>Correspondence</fmreq:value>
      </fmreq:property>
</fmreq:fileManagementRequestDetail>

I have to get the value from the value element under object_name, cross-reference it, and then replace the contents of the value element under form_category_cd with the new value.

For example, if object_name -> value is Correspondence, then form_category_cd -> value might need to be YYZ.
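The three steps above could be sketched in plain awk roughly as follows. This is only a sketch under several assumptions: the lookup list lives in a hypothetical tab-separated file `map.txt` (old value, tab, new value), each `fmreq:value` element sits on the line directly after its `fmreq:name` element as in the sample, and the temporary directory exists only to make the example self-contained. It treats the XML as text lines, so it is not a real XML parser.

```shell
#!/bin/sh
# Scratch directory so the sketch is self-contained (assumes mktemp is available).
tmp=$(mktemp -d)

# The sample XML from the question.
cat > "$tmp/c3.xml" <<'EOF'
<fmreq:fileManagementRequestDetail xmlns:fmreq="http://foobar.com/filemanagement">
      <fmreq:property>
         <fmreq:name>form_category_cd</fmreq:name>
         <fmreq:value>Memos</fmreq:value>
      </fmreq:property>
      <fmreq:property>
         <fmreq:name>object_name</fmreq:name>
         <fmreq:value>Correspondence</fmreq:value>
      </fmreq:property>
</fmreq:fileManagementRequestDetail>
EOF

# Hypothetical lookup list: object_name value, a tab, the new form_category_cd value.
printf 'Correspondence\tYYZ\n' > "$tmp/map.txt"

# awk reads three inputs in order: the lookup list, then the XML twice.
result=$(awk -F'\t' '
    FNR == 1 { pass++ }                    # a new input file has started
    pass == 1 { map[$1] = $2; next }       # input 1: load the lookup list
    pass == 2 {                            # input 2: find the object_name value
        if (want && /<fmreq:value>/) {
            v = $0
            sub(/^.*<fmreq:value>/, "", v)     # strip everything up to the open tag
            sub(/<\/fmreq:value>.*$/, "", v)   # strip the close tag and the rest
            if (v in map) new = map[v]
            want = 0
        }
        if (/<fmreq:name>object_name<\/fmreq:name>/) want = 1
        next
    }
    # input 3: the same XML again, now rewriting form_category_cd
    fix && /<fmreq:value>/ {
        if (new != "") {
            i = index($0, "<fmreq:value>")     # the open tag is 13 characters long
            j = index($0, "</fmreq:value>")
            $0 = substr($0, 1, i + 12) new substr($0, j)
        }
        fix = 0
    }
    /<fmreq:name>form_category_cd<\/fmreq:name>/ { fix = 1 }
    { print }
' "$tmp/map.txt" "$tmp/c3.xml" "$tmp/c3.xml")

printf '%s\n' "$result"
rm -rf "$tmp"
```

If no mapping is found, the XML is printed unchanged. To update the file, the output would have to be written to a new file and then moved over the original, since awk cannot safely overwrite a file it is still reading.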

Here's the rub: I can only use the tools already available on our server, because our operations group restricts us to the tools at hand. It was a fight to get xmllint updated, and then the update got overruled. The version I'm on does not support --xpath, which, believe me, makes things difficult even on a good day. It also doesn't support namespaces, so xmllint is out.

I've tried sed, but it doesn't seem to like my regex, even though the regex works fine in every tester I've tried.

Regex:

(<fmreq\:name>object_name<\/fmreq\:name>)(?:\n\s*)(<fmreq\:value>)(.*)(<\/fmreq\:value>)

I need to get group #3, but sed won't return it. Instead it returns the entire contents of the XML file.

sed -e 's/\(<fmreq\:name>object_name<\/fmreq\:name>\)\(?:\n\s*\)\(<fmreq\:value>\)\(.*\)\(<\/fmreq\:value>\)/\3/' < c3.xml 
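A likely explanation for this behavior: sed reads its input one line at a time, so a pattern containing `\n` cannot match across two lines unless a command such as `n` or `N` pulls in the next line. In addition, `(?:...)` is Perl-style syntax that sed's basic regular expressions do not understand, and without `-n` every line that fails to match is still printed unchanged. A minimal sketch that works line by line instead, assuming the value element always sits on the line directly after the name element as in the sample (the temporary file is only there to make the example self-contained):

```shell
#!/bin/sh
# Scratch directory so the sketch is self-contained (assumes mktemp is available).
tmp=$(mktemp -d)

# The sample XML from the question.
cat > "$tmp/c3.xml" <<'EOF'
<fmreq:fileManagementRequestDetail xmlns:fmreq="http://foobar.com/filemanagement">
      <fmreq:property>
         <fmreq:name>form_category_cd</fmreq:name>
         <fmreq:value>Memos</fmreq:value>
      </fmreq:property>
      <fmreq:property>
         <fmreq:name>object_name</fmreq:name>
         <fmreq:value>Correspondence</fmreq:value>
      </fmreq:property>
</fmreq:fileManagementRequestDetail>
EOF

# -n suppresses automatic printing. On the object_name line, "n" loads the
# next line, and the substitution prints only the text between the value tags.
value=$(sed -n '/<fmreq:name>object_name<\/fmreq:name>/{
n
s/.*<fmreq:value>\(.*\)<\/fmreq:value>.*/\1/p
}' "$tmp/c3.xml")

printf '%s\n' "$value"
rm -rf "$tmp"
```

This only covers the first step (querying the value); the lookup and the replacement would still need to be done separately.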

I'm not as familiar with awk / gawk, so I'm struggling to figure them out as well, but I'm open to them if a solution can be found.

I'd love to have an awk / gawk solution, if only to make the boss happy, since he's an old awk fan, but I'll take what I can get because I'm stumped.

Again I have to use the tools on hand and can't install anything new.