Fixed to handle `{fileX`.

I couldn’t immediately figure out how to do it in sed without assuming that there is at least one character that is known never to appear in the input.  I assumed that # will never appear in the input (or in your added word).  This seems to work:

sed '/read build/ {
        s/{/{ /
        : fruit
        s/\({.*\) \([^}# ][^ ]*\)/\1#MY_WORD\2/
        t fruit
        s/#/ /g
        s/{ /{/
}'

On lines that contain read build it first inserts a space after the {. Then it searches for a space that is somewhere after a { and immediately before a word (presumably a filename).  It replaces the space with #, inserts your word, and goes back and looks for more.  (fruit is an arbitrary loop label.)  Once it’s found them all, it turns all the # characters back to spaces, and removes the space that it inserted (after the {).
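As a quick check of the loop described above, here is a minimal sketch that wraps the same sed script in a shell function and feeds it a few sample lines. The function name add_word_sed and the sample lines are made up for illustration; MY_WORD is just the placeholder for your added word.

```shell
# Hypothetical wrapper around the sed script; MY_WORD is the placeholder prefix.
add_word_sed() {
    sed '/read build/ {
        s/{/{ /
        : fruit
        s/\({.*\) \([^}# ][^ ]*\)/\1#MY_WORD\2/
        t fruit
        s/#/ /g
        s/{ /{/
}'
}

# One line with a space after the {, one without, and one that should be left alone.
printf '%s\n' 'read build { file1 file2 file3 }' | add_word_sed
printf '%s\n' 'read build {file1 file2 file3}'   | add_word_sed
printf '%s\n' 'some other line { file9 }'        | add_word_sed
```

The first line should come out as read build { MY_WORDfile1 MY_WORDfile2 MY_WORDfile3 }, the second as read build {MY_WORDfile1 MY_WORDfile2 MY_WORDfile3}, and the third unchanged, because it does not contain read build.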

In addition to the bit about # not occurring in the input, this assumes that

  • } is the last non-blank character on each read build line, and
  • whitespace is spaces only; no tabs.

In awk:

awk '/read build/ {
        in_braces=0
        for (i = 1; i <= NF; i++) {
                if ($i == "{") in_braces=1
                else if (substr($i,1,1) == "{") {
                        $i = "{MY_WORD" substr($i,2)
                        in_braces=1
                }
                else if ($i == "}") in_braces=0
                else if (in_braces) $i = "MY_WORD" $i
        }
      }
      { print }'

For each read build line, it loops through all the words (fields) in the line.  It uses a state variable (in_braces) to keep track of whether it is between a { and a }; if it is, it modifies each word to begin with your added word.  Note that it has to handle two slightly different cases:

  • if a word is {, set the flag to start modifying all subsequent words, and
  • if a word begins with {, it is actually a compound of the form {fileX, so modify it to be the concatenation of {, the added word, and the fileX filename.  And also set the flag to modify all subsequent words.
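The two cases can be exercised side by side with a small sketch like the following. The wrapper name add_word_awk and the sample lines are hypothetical, and the MY_WORD placeholder is used throughout.

```shell
# Hypothetical wrapper; MY_WORD stands in for the word to be added.
add_word_awk() {
    awk '/read build/ {
        in_braces = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "{") in_braces = 1              # case 1: { is a word by itself
            else if (substr($i, 1, 1) == "{") {       # case 2: compound such as {file1
                $i = "{MY_WORD" substr($i, 2)
                in_braces = 1
            }
            else if ($i == "}") in_braces = 0
            else if (in_braces) $i = "MY_WORD" $i
        }
      }
      { print }'
}

printf '%s\n' 'read build { file1 file2 }' | add_word_awk
printf '%s\n' 'read build {file1 file2 }'  | add_word_awk
```

The first line should print read build { MY_WORDfile1 MY_WORDfile2 } and the second read build {MY_WORDfile1 MY_WORDfile2 }, so the brace stays attached to the first filename when it started out that way.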

While this allows tabs as word separators, it has the weakness that it collapses white space to a single space.  So for example, the input

read build    {    file1    file2    file3    }

would produce the output

read build { MY_WORDfile1 MY_WORDfile2 MY_WORDfile3 }

Further, this assumes that

  • the { is at the beginning of a word (i.e., it has whitespace before it), and
  • either } is the last non-blank character on each read build line, or it is a separate word (i.e., it has whitespace before and after)

I couldn’t immediately figure out how to do it in sed without assuming that there is at least one character that is known never to appear in the input.  I assumed that # will never appear in the input.  This seems to work:

sed '/read build/ {
        : fruit
        s/\({.*\) \([^}# ][^ ]*\)/\1#MY_WORD\2/
        t fruit
        s/#/ /g
}'

On lines that contain read build it searches for a space that is somewhere after a { and immediately before a word (presumably a filename).  It replaces the space with #, inserts your word, and goes back and looks for more.  (fruit is an arbitrary loop label.)  Once it’s found them all, it turns all the # characters back to spaces.

In addition to the bit about # not occurring in the input, this assumes that

  • there is a space after the {,
  • } is the last non-blank character on each read build line, and
  • whitespace is spaces only; no tabs.

In awk:

awk '/read build/ {
        in_braces=0
        for (i = 1; i <= NF; i++) {
                if ($i == "{") in_braces=1
                else if ($i == "}") in_braces=0
                else if (in_braces) $i = "MY_WORD" $i
        }
      }
      { print }'

For each read build line, it loops through all the words (fields) in the line.  It uses a state variable (in_braces) to keep track of whether it is between a { and a }; if it is, it modifies each word to begin with your added word.  While this allows tabs as word separators, it has the weakness that it collapses white space to a single space.  So for example, the input

read build    {    file1    file2    file3    }

would produce the output

read build { MY_WORDfile1 MY_WORDfile2 MY_WORDfile3 }

Further, this assumes that

  • the { is a separate word (i.e., it has whitespace before and after), and
  • either } is the last non-blank character on each read build line, or it is a separate word (i.e., it has whitespace before and after)

I couldn’t immediately figure out how to do it in sed without assuming that there is at least one character that is known never to appear in the input.  I assumed that # will never appear in the input (or in your added word).  This seems to work:

sed '/read build/ {
        s/{/{ /
        : fruit
        s/\({.*\) \([^}# ][^ ]*\)/\1#MY_WORD\2/
        t fruit
        s/#/ /g
        s/{ /{/
}'

On lines that contain read build it first inserts a space after the {. Then it searches for a space that is somewhere after a { and immediately before a word (presumably a filename).  It replaces the space with #, inserts your word, and goes back and looks for more.  (fruit is an arbitrary loop label.)  Once it’s found them all, it turns all the # characters back to spaces, and removes the space that it inserted (after the {).

In addition to the bit about # not occurring in the input, this assumes that

  • } is the last non-blank character on each read build line, and
  • whitespace is spaces only; no tabs.

In awk:

awk '/read build/ {
        in_braces=0
        for (i = 1; i <= NF; i++) {
                if ($i == "{") in_braces=1
                else if (substr($i,1,1) == "{") {
                        $i = "{MY_WORD" substr($i,2)
                        in_braces=1
                }
                else if ($i == "}") in_braces=0
                else if (in_braces) $i = "MY_WORD" $i
        }
      }
      { print }'

For each read build line, it loops through all the words (fields) in the line.  It uses a state variable (in_braces) to keep track of whether it is between a { and a }; if it is, it modifies each word to begin with your added word.  Note that it has to handle two slightly different cases:

  • if a word is {, set the flag to start modifying all subsequent words, and
  • if a word begins with {, it is actually a compound of the form {fileX, so modify it to be the concatenation of {, the added word, and the fileX filename.  And also set the flag to modify all subsequent words.

While this allows tabs as word separators, it has the weakness that it collapses white space to a single space.  So for example, the input

read build    {    file1    file2    file3    }

would produce the output

read build { MY_WORDfile1 MY_WORDfile2 MY_WORDfile3 }

Further, this assumes that

  • the { is at the beginning of a word (i.e., it has whitespace before it), and
  • either } is the last non-blank character on each read build line, or it is a separate word (i.e., it has whitespace before and after)

I couldn’t immediately figure out how to do it in sed without assuming that there is at least one character that is known never to appear in the input.  I assumed that # will never appear in the input.  This seems to work:

sed '/read build/ {
        : fruit
        s/\({.*\) \([^}# ][^ ]*\)/\1#MY_WORD\2/
        t fruit
        s/#/ /g
}'

On lines that contain read build it searches for a space that is somewhere after a { and immediately before a word (presumably a filename).  It replaces the space with #, inserts your word, and goes back and looks for more.  (fruit is an arbitrary loop label.)  Once it’s found them all, it turns all the # characters back to spaces.

In addition to the bit about # not occurring in the input, this assumes that

  • there is a space after the {,
  • } is the last non-blank character on each read build line, and
  • whitespace is spaces only; no tabs.
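To see why the first assumption matters, here is a small sketch that runs this version of the sed script on a line where the { is glued to the first filename. The function name add_word_sed_v1 and the sample lines are invented for the illustration.

```shell
# Hypothetical wrapper around this (first) version of the sed script.
add_word_sed_v1() {
    sed '/read build/ {
        : fruit
        s/\({.*\) \([^}# ][^ ]*\)/\1#MY_WORD\2/
        t fruit
        s/#/ /g
}'
}

# Space after the {: every filename is preceded by a space, so all get the word.
printf '%s\n' 'read build { file1 file2 file3 }' | add_word_sed_v1
# No space after the {: file1 has no space in front of it to replace.
printf '%s\n' 'read build {file1 file2 file3}'   | add_word_sed_v1
```

The first line should come out as read build { MY_WORDfile1 MY_WORDfile2 MY_WORDfile3 }, while the second comes out as read build {file1 MY_WORDfile2 MY_WORDfile3}, with file1 left untouched.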

In awk:

awk '/read build/ {
        in_braces=0
        for (i = 1; i <= NF; i++) {
                if ($i == "{") in_braces=1
                else if ($i == "}") in_braces=0
                else if (in_braces) $i = "MY_WORD" $i
        }
      }
      { print }'

For each read build line, it loops through all the words (fields) in the line.  It uses a state variable (in_braces) to keep track of whether it is between a { and a }; if it is, it modifies each word to begin with your added word.  While this allows tabs as word separators, it has the weakness that it collapses white space to a single space.  So for example, the input

read build    {    file1    file2    file3    }

would produce the output

read build { MY_WORDfile1 MY_WORDfile2 MY_WORDfile3 }

Further, this assumes that

  • the { is a separate word (i.e., it has whitespace before and after), and
  • either } is the last non-blank character on each read build line, or it is a separate word (i.e., it has whitespace before and after)

It allows multiple sets of braces; e.g.,

read build { file1 file2 file3 } text to be left alone { file4 file5 file6 }
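Running the awk script on that line can be sketched as follows. The wrapper name add_word_awk_v1 is hypothetical, and the second sample line (tab-separated) is an extra test input that is not taken from the question.

```shell
# Hypothetical wrapper around the awk script above.
add_word_awk_v1() {
    awk '/read build/ {
        in_braces = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "{") in_braces = 1
            else if ($i == "}") in_braces = 0      # stop until the next {
            else if (in_braces) $i = "MY_WORD" $i
        }
      }
      { print }'
}

# Two sets of braces on one line; the words between the sets are not modified.
printf '%s\n' 'read build { file1 file2 file3 } text to be left alone { file4 file5 file6 }' | add_word_awk_v1

# Tabs as separators are accepted, but come out as single spaces.
printf 'read build\t{\tfile1\t}\n' | add_word_awk_v1
```

The first command should print read build { MY_WORDfile1 MY_WORDfile2 MY_WORDfile3 } text to be left alone { MY_WORDfile4 MY_WORDfile5 MY_WORDfile6 }, and the second read build { MY_WORDfile1 }.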