Revision 3 of 3, edited by glenn jackman: added 1120 characters in body.

If there's only one occurrence of <ex>...</ex> per line:

sed -e :1 -e 's@\(<ex>.*\)&\(.*</ex>\)@\1#\2@;t1'
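A quick way to see what the one-liner does, and where it stops being safe, is to run it on a couple of sample lines (the input below is made up for illustration). Because `.*` is greedy, each pass rewrites the last `&` that still has a `</ex>` somewhere after it, and `t1` loops until no such `&` is left. On a line with two occurrences, the `&` sitting between them is replaced as well:

```shell
# Sample input (hypothetical): one line with a single <ex>...</ex>,
# and one line with two occurrences to show the limitation.
printf '%s\n' \
  'a & b <ex>x & y & z</ex> c & d' \
  'p & <ex>a & b</ex> & <ex>c & d</ex>' |
  sed -e :1 -e 's@\(<ex>.*\)&\(.*</ex>\)@\1#\2@;t1'
# prints:
# a & b <ex>x # y # z</ex> c & d
# p & <ex>a # b</ex> # <ex>c # d</ex>
```

The second output line shows why the next script is needed when a line may contain several occurrences.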

If there may be several occurrences and they don't nest (or they nest and you want to replace the & only in the deepest occurrences):

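sed '
  # Note: comments after a command on the same line, as used below, are
  # accepted by GNU sed; other seds may reject them or, after ":1" and
  # "t1", treat them as part of the label name.
  s|_|_u|g        # replace all underscores with "_u"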
  s|(|_o|g        # replace all open parentheses with "_o"
  s|)|_c|g        # replace all close parentheses with "_c"
  s|<ex>|(|g      # replace all open ex tags with "("
  s|</ex>|)|g     # replace all close ex tags with ")"

  :1              # a label

  s/\(([^()]*\)&\([^()]*)\)/\1#\2/g
                  # find:
                  #   an open parenthesis plus the non-parenthesis
                  #   chars that follow it (together captured as \1),
                  #   an ampersand,
                  #   more non-parenthesis chars plus a close
                  #   parenthesis (together captured as \2),
                  # replace with
                  #   the first captured text,
                  #   an octothorpe,
                  #   the second captured text,
                  # globally in the current record.

  t1              # if there was a successful replacement, goto label "1",
                  # else carry on

  s|(|<ex>|g      # restore open tags
  s|)|</ex>|g     # restore close tags
  s|_o|(|g        # restore open parentheses
  s|_c|)|g        # restore close parentheses
  s|_u|_|g        # restore underscores
'
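Here is a sketch of the same script wrapped in a shell function and run on sample input. The function name `ex_amp_deepest` is hypothetical and the comments are stripped; the commands are otherwise those above. It shows that the `&` between two occurrences is left alone, that literal `(`, `)` and `_` survive the escaping round trip, and that with nesting only the innermost occurrence is touched:

```shell
# ex_amp_deepest is a hypothetical wrapper name; same sed commands as above.
ex_amp_deepest() {
  sed '
    s|_|_u|g; s|(|_o|g; s|)|_c|g
    s|<ex>|(|g; s|</ex>|)|g
    :1
    s/\(([^()]*\)&\([^()]*)\)/\1#\2/g
    t1
    s|(|<ex>|g; s|)|</ex>|g
    s|_o|(|g; s|_c|)|g; s|_u|_|g'
}

# Two non-nested occurrences, plus literal "(", ")" and "_" that must survive
printf '%s\n' 'f(x_o) & <ex>a & (b_c)</ex> & <ex>c & d</ex>' | ex_amp_deepest
# prints: f(x_o) & <ex>a # (b_c)</ex> & <ex>c # d</ex>

# Nested occurrences: only the innermost <ex>...</ex> is affected
printf '%s\n' '<ex>a & <ex>b & c</ex> & d</ex>' | ex_amp_deepest
# prints: <ex>a & <ex>b # c</ex> & d</ex>
```

Escaping `_` first is what makes the round trip safe: after that step every underscore in the pattern space starts one of the tokens `_u`, `_o` or `_c`, so restoring them cannot misfire.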

If they may nest and you want to replace every & inside the outermost occurrences (including the ones in nested occurrences):

sed '
  s|_|_u|g;s|(|_o|g;s|)|_c|g
  s|<ex>|(|g;s|</ex>|)|g;:1
  s/\(([^()]*\)(\([^()]*\))\([^()]*)\)/\1_O\2_C\3/g;t1
  :2
  s/\(([^()]*\)&\([^()]*)\)/\1#\2/g;t2
  s|(|<ex>|g;s|)|</ex>|g
  s|_O|<ex>|g;s|_C|</ex>|g
  s|_o|(|g;s|_c|)|g;s|_u|_|g'
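The same kind of check for the nested variant, again with a hypothetical wrapper name (`ex_amp_enclosing`) and made-up input. In a simple case like this one, the first loop rewrites the `(...)` pair sitting directly inside another as `_O`...`_C`, so the second loop sees a single flat pair and replaces every `&` in it, while the `&` characters outside the tags are untouched:

```shell
# ex_amp_enclosing is a hypothetical wrapper name; same sed commands as above,
# one command per line.
ex_amp_enclosing() {
  sed '
    s|_|_u|g; s|(|_o|g; s|)|_c|g
    s|<ex>|(|g; s|</ex>|)|g
    :1
    s/\(([^()]*\)(\([^()]*\))\([^()]*)\)/\1_O\2_C\3/g
    t1
    :2
    s/\(([^()]*\)&\([^()]*)\)/\1#\2/g
    t2
    s|(|<ex>|g; s|)|</ex>|g
    s|_O|<ex>|g; s|_C|</ex>|g
    s|_o|(|g; s|_c|)|g; s|_u|_|g'
}

printf '%s\n' 'x & <ex>a & <ex>b & c</ex> & d</ex> & y' | ex_amp_enclosing
# prints: x & <ex>a # <ex>b # c</ex> # d</ex> & y

# Without nesting it behaves like the previous script
printf '%s\n' 'p & <ex>a & b</ex> & <ex>c & d</ex>' | ex_amp_enclosing
# prints: p & <ex>a # b</ex> & <ex>c # d</ex>
```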
Stéphane Chazelas