I have many XML files like the one below, in which I would like to replace a string with a new string. I cannot get the sed command to work on these XML files.

<form version="1.1" theme="dark">
  <label>Forcepoint DLP Dashboard - LongTerm</label>
  <description>Activity for those with Long-Term Exceptions</description>
  <fieldset submitButton="false" autoRun="false">
    <input type="time" token="TimeFrame" searchWhenChanged="true">
      <label>Timeframe</label>
      <default>
        <earliest>-48h@h</earliest>
        <latest>now</latest>
      </default>
    </input>
  </fieldset>
  <row>
    <panel>
      <html>
        <p>Macros In Use:</p>
        <p>`ForcepointApprovedUSB` = Known Approved USB Devices</p>
        <p>`ForcepointKnownCDDVD` = Known CD/DVD Drives</p>
        <p>`ForcepointKnownMultiFunction` = Known Multi-Function Devices</p>
      </html>
    </panel>
  </row>
  <row>
    <panel>
      <title>Exception Info</title>
      <table>
        <search>
          <query>index=restricted_security 
sourcetype=forcepoint 
| rex field=_raw "(.*act=(?&lt;Action&gt;.*?)\s.*)"
| rex field=_raw "(.*duser=(?&lt;Device&gt;.*?)(:\s\d|;|\sfname=).*)"
| rex field=_raw "(.*duser=.*?;\s(?&lt;Serial&gt;.*?)\sfname=)"
| rex field=_raw "(.*fname=(?&lt;Filename&gt;.*?)\smsg=.*)"
| rex field=_raw "(.*fname=.:\\\(?&lt;RawFilename&gt;.*)(?:\s-\s.*)\smsg=.*)"
| rex field=_raw "(.*suser=(?&lt;Name&gt;.*)\scat=.*)"
| rex field=_raw "(.*loginName=.*\\\\(?&lt;Username&gt;.*)\ssourceIp=.*)"
| rex field=_raw "(.*sourceIp=(?&lt;IP&gt;.*)\sseverityType=.*)"
| rex field=_raw "(.*sourceHost=(?&lt;Source&gt;.*)\sproductVersion=.*)"
| rex field=_raw "(.*sourceServiceName=(?&lt;AlertType&gt;.*)\sanalyzedBy=.*)"
| eval Username=lower(Username)
| eval Action=if(isnull(Action),"-",Action)
| eval Serial=if(isnull(Serial),"-",Serial)
| eval EnumDeviceType=case(
    (`ForcepointApprovedUSB`),"ApprovedUSB",
    (`ForcepointKnownCDDVD`),"CDDVD",
    (`ForcepointKnownMultiFunction`),"MultiFunction",
    AlertType="Endpoint Applications" AND Device="Bluetooth","Bluetooth",
    AlertType="Endpoint Removable Media" AND Device="Windows Portable Device (WPD)","WPD",
    AlertType="Endpoint Removable Media" AND 
        Device!="Windows Portable Device (WPD)" AND NOT 
        (`ForcepointApprovedUSB`) AND NOT 
        (`ForcepointKnownCDDVD`) AND NOT 
        (`ForcepointKnownMultiFunction`),"UnApprovedUSB")
| join type=inner Username
[
    search
    index=restricted_security
    sourcetype=dlp_lt
    | rename UserID as Username
    | eval Check = "Yes"
    | fields Username,Check,Justification,Type,ExpireDate
]
| where isnotnull(EnumDeviceType) AND Check="Yes"
| eval Time=strftime(_time, "%B %d, %Y %H:%M %Z")
| dedup Username
| table Time Username Name Justification Type ExpireDate
| sort Name</query>
          <earliest>$TimeFrame.earliest$</earliest>
          <latest>$TimeFrame.latest$</latest>
        </search>
        <option name="drilldown">none</option>
        <option name="refresh.display">progressbar</option>
      </table>
    </panel>
  </row>
  <row>
    <panel>
      <title>Transfers By Those With Long-Term Exceptions</title>
      <table>
        <search>
          <query>index=restricted_security 
sourcetype=forcepoint 
| rex field=_raw "(.*act=(?&lt;Action&gt;.*?)\s.*)"
| rex field=_raw "(.*duser=(?&lt;Device&gt;.*?)(:\s\d|;|\sfname=).*)"
| rex field=_raw "(.*duser=.*?;\s(?&lt;Serial&gt;.*?)\sfname=)"
| rex field=_raw "(.*fname=(?&lt;Filename&gt;.*?)\smsg=.*)"
| rex field=_raw "(.*fname=.:\\\(?&lt;RawFilename&gt;.*)(?:\s-\s.*)\smsg=.*)"
| rex field=_raw "(.*suser=(?&lt;Name&gt;.*)\scat=.*)"
| rex field=_raw "(.*loginName=.*\\\\(?&lt;Username&gt;.*)\ssourceIp=.*)"
| rex field=_raw "(.*sourceIp=(?&lt;IP&gt;.*)\sseverityType=.*)"
| rex field=_raw "(.*sourceHost=(?&lt;Source&gt;.*)\sproductVersion=.*)"
| rex field=_raw "(.*sourceServiceName=(?&lt;AlertType&gt;.*)\sanalyzedBy=.*)"
| eval Username=lower(Username)
| eval Action=if(isnull(Action),"-",Action)
| eval Serial=if(isnull(Serial),"-",Serial)
| eval EnumDeviceType=case(
    (`ForcepointApprovedUSB`),"ApprovedUSB",
    (`ForcepointKnownCDDVD`),"CDDVD",
    (`ForcepointKnownMultiFunction`),"MultiFunction",
    AlertType="Endpoint Applications" AND Device="Bluetooth","Bluetooth",
    AlertType="Endpoint Removable Media" AND Device="Windows Portable Device (WPD)","WPD",
    AlertType="Endpoint Removable Media" AND 
        Device!="Windows Portable Device (WPD)" AND NOT 
        (`ForcepointApprovedUSB`) AND NOT 
        (`ForcepointKnownCDDVD`) AND NOT 
        (`ForcepointKnownMultiFunction`),"UnApprovedUSB")
| join type=inner Username
[
    search
    index=restricted_emn_security
    sourcetype=dlp_lt
    | rename UserID as Username
    | eval Check = "Yes"
    | dedup Username
    | fields Username, Check
]
| where isnotnull(EnumDeviceType) AND Check="Yes"
| eval Time=strftime(_time, "%B %d, %Y %H:%M %Z")
| table Time Username Name Action Source Filename Device Serial EnumDeviceType
| sort -Time</query>
          <earliest>$TimeFrame.earliest$</earliest>
          <latest>$TimeFrame.latest$</latest>
        </search>
        <option name="count">30</option>
        <option name="drilldown">none</option>
      </table>
    </panel>
  </row>
</form>

The pattern I would like to replace is

index=restricted_security sourcetype=forcepoint

with

index=newname
sourcetype=forcepoint

So any pattern where

index=restricted_security
sourcetype=forcepoint

should be replaced with the new value.

The XML files have many combinations like

index=restricted_security
sourcetype=someother value, index=someindex sourcetype=forcepoint

etc., but those don't need to be replaced.

I have tried many patterns like the one below, with many combinations of sed options, but none of them seem to work:

sed 's/index=restricted_security\s\nsourcetype=forcepoint/index=restricted_security sourcetype=forcepoint/g'

Can someone please point out how to get this to replace?

  • Looks like a task for Python, Ruby or Perl. Commented Oct 10, 2024 at 14:56
  • Shouldn't the replacement text be /index=newname ...? If it is /index=restricted_security ... it is the same as the text you want to change. Commented Oct 10, 2024 at 15:06
  • sed (like many *nix utilities) is designed to process input a line at a time. sed DOES support a hold buffer and other tricks, but that is advanced usage, can be very brittle, AND creates a maintenance nightmare. GNU sed does support reading the whole file into the buffer, but then you'll need to get it installed in your production environment (assuming this is a real project), and many organizations won't allow such installations. Processing the whole file also requires superior regex skills. Learn to use Python (below) or, as mentioned above, xmlstarlet and others. Commented Oct 10, 2024 at 15:37
  • Don't attempt to process XML using non-XML-aware tools. Use XPath, XSLT, or XQuery for this kind of job (or a tool such as xmlstarlet, mentioned below, which is based on XPath). Commented Oct 10, 2024 at 18:43
  • At this point it's sort-of obligatory to post a link to wise words on the topic in another StackOverflow answer: stackoverflow.com/questions/1732348/… Commented Oct 11, 2024 at 10:22

3 Answers


Using Python's lxml:

from lxml import etree

file_path = '/tmp/file.xml'

tree = etree.parse(file_path)
root = tree.getroot()

# Select the text of every <query> under <table>/<search>
xpath_expression = '//table/search/query/text()'
text_nodes = root.xpath(xpath_expression)

changed = False
for text_node in text_nodes:
    lines = text_node.splitlines()
    # Only touch queries whose first line names the old index
    if lines and 'index=restricted_security' in lines[0]:
        lines[0] = 'index=NEW_NAME'
        # lxml "smart strings" remember the element they came from
        text_node.getparent().text = '\n'.join(lines)
        changed = True

# Rewrite the file only if at least one query was updated
if changed:
    tree.write(file_path, pretty_print=True, xml_declaration=True, encoding='UTF-8')

The script edits the file in place.
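
If lxml is not installed, roughly the same edit can be sketched with the standard library's xml.etree.ElementTree. This is only an illustration under a few assumptions: the helper names rewrite_query and rewrite_dashboard and the newname placeholder are made up for this example, it visits every <query> element, and it changes a query only when its first two whitespace-separated tokens are exactly index=restricted_security and sourcetype=forcepoint. The subsearch on sourcetype=dlp_lt is therefore left alone.

```python
import xml.etree.ElementTree as ET

OLD_INDEX = "index=restricted_security"
NEW_INDEX = "index=newname"            # placeholder value taken from the question
SOURCETYPE = "sourcetype=forcepoint"


def rewrite_query(text):
    """Swap the index only if the query starts with OLD_INDEX then SOURCETYPE.

    Splitting on whitespace makes the check work whether the two terms are
    separated by a space or by a newline.
    """
    tokens = text.split()
    if tokens[:2] == [OLD_INDEX, SOURCETYPE]:
        # The first token is OLD_INDEX, so the first occurrence is the one to change.
        return text.replace(OLD_INDEX, NEW_INDEX, 1)
    return text


def rewrite_dashboard(xml_text):
    """Return the dashboard XML with every matching <query> rewritten."""
    root = ET.fromstring(xml_text)
    for query in root.iter("query"):
        if query.text:
            query.text = rewrite_query(query.text)
    # encoding="unicode" returns a str; &lt; and &gt; are re-escaped on output.
    return ET.tostring(root, encoding="unicode")
```

To apply it to a file, read the file into a string, pass it to rewrite_dashboard, and write the result back. ElementTree may serialize the document slightly differently from the input (for example, it drops comments by default), so compare the output with the original before overwriting anything.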


Using xmlstarlet from the shell, in two calls to the utility:

#!/bin/sh
 
xmlstarlet sel -t -v '//table/search/query/text()' file.xml > /tmp/temp.txt
grep -q 'index=restricted_security' /tmp/temp.txt || exit 0
xmlstarlet ed -u '//table/search/query' -v "index=NEW_NAME
$(awk 'NR>1' /tmp/temp.txt)" file.xml

You can add the -L switch to xmlstarlet ed if you need to edit in place.

You can even edit /tmp/temp.txt with sed if needed, because after the first xmlstarlet call it contains plain text, not XML:

#!/bin/sh
 
xmlstarlet sel -t -v '//table/search/query/text()' file.xml > /tmp/temp.txt
sed -i 's/index=restricted_security/index=NEW_NAME/' /tmp/temp.txt
xmlstarlet ed -u '//table/search/query' -v "$(cat /tmp/temp.txt)" file.xml

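Using GNU sed for -z (records are separated by NUL instead of newline, so a file without NUL bytes is read as a single record and the regex can match across line breaks), -E, the \s shorthand for whitespace, and the word boundaries \< and \>: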

$ sed -Ez 's/\<(index=)restricted_security(\s+sourcetype=forcepoint)\>/\1newname\2/g' file > o1

$ diff file o1
28c28
<           <query>index=restricted_security
---
>           <query>index=newname
81c81
<           <query>index=restricted_security
---
>           <query>index=newname

Or, if you want the two strings joined onto a single line (it's not clear from your question):

$ sed -Ez 's/\<(index=)restricted_security\s+(sourcetype=forcepoint)\>/\1newname \2/g' file > o1

$ diff file o1
28,29c28
<           <query>index=restricted_security
< sourcetype=forcepoint
---
>           <query>index=newname sourcetype=forcepoint
81,82c80
<           <query>index=restricted_security
< sourcetype=forcepoint
---
>           <query>index=newname sourcetype=forcepoint

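For comparison, the same multi-line match can be sketched in Python with the re module, where \s+ already spans newlines once the whole file has been read into one string. This is a rough equivalent rather than a guaranteed drop-in translation: Python's \b stands in for GNU sed's \< and \>, and replace_index and its new_name parameter are hypothetical names for this example. Like the sed command, it treats the file as plain text and is not XML-aware.

```python
import re

# Group 1 keeps "index=", group 2 keeps the whitespace (including any newline)
# and the sourcetype term, so only the index name itself is replaced.
PATTERN = re.compile(r"\b(index=)restricted_security(\s+sourcetype=forcepoint)\b")


def replace_index(text, new_name="newname"):
    """Replace the index name wherever it is followed by sourcetype=forcepoint."""
    # A function replacement avoids backslash/group-reference surprises in new_name.
    return PATTERN.sub(lambda m: m.group(1) + new_name + m.group(2), text)
```

Reading a file with open(path).read(), passing the result to replace_index, and writing it back would mirror the first sed command above, which keeps the line break between the two terms.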