Fixed to handle `{fileX`.

edited Nov 11, 2015 at 21:24

I couldn’t immediately figure out how to do it in sed without assuming that there is at least one character that is known never to appear in the input. I assumed that # will never appear in the input (or in your added word). This seems to work:

sed '/read build/ {
        s/{/{ /
        : fruit
        s/\({.*\) \([^}# ][^ ]*\)/\1#MY_WORD\2/
        t fruit
        s/#/ /g
        s/{ /{/
}'

On lines that contain read build it first inserts a space after the {. Then it searches for a space that is somewhere after a { and immediately before a word (presumably a filename). It replaces the space with #, inserts your word, and goes back and looks for more. (fruit is an arbitrary loop label.) Once it’s found them all, it turns all the # characters back to spaces, and removes the space that it inserted (after the {).

In addition to the bit about # not occurring in the input, this assumes that

} is the last non-blank character on each read build line, and
whitespace is spaces only; no tabs.

awk '/read build/ {
        in_braces=0
        for (i = 1; i <= NF; i++) {
                if ($i == "{") in_braces=1
                else if (substr($i,1,1) == "{") {
                        $i = "{MY_WORD" substr($i,2)
                        in_braces=1
                }
                else if ($i == "}") in_braces=0
                else if (in_braces) $i = "MY_WORD" $i
        }
      }
      { print }'

For each read build line, it loops through all the words (fields) in the line. It uses a state variable (in_braces) to keep track of whether it is between a { and a }; if it is, it modifies each word to begin with your added word. Note that it has to handle two slightly different cases:

if a word is {, set the flag to start modifying all subsequent words, and

if a word begins with {, it is actually a compound of the form {fileX, so modify it to be the concatenation of {, the added word, and the fileX filename. And also set the flag to modify all subsequent words.
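
To see the two cases side by side, here is a minimal sketch that runs the same logic on one line of each form. The wrapper name prefix_words and the sample lines are assumptions for illustration, and the added word is spelled MY_WORD in both branches.

```shell
# prefix_words: hypothetical wrapper; same logic as the awk program above,
# with the added word spelled MY_WORD in both branches.
prefix_words() {
    awk '/read build/ {
        in_braces = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "{") in_braces = 1                 # case 1: { is its own word
            else if (substr($i, 1, 1) == "{") {          # case 2: compound {fileX
                $i = "{MY_WORD" substr($i, 2)
                in_braces = 1
            }
            else if ($i == "}") in_braces = 0
            else if (in_braces) $i = "MY_WORD" $i
        }
    }
    { print }'
}

printf '%s\n' 'read build { file1 file2 }' 'read build {file1 file2 }' | prefix_words
```

This should print read build { MY_WORDfile1 MY_WORDfile2 } for the first line and read build {MY_WORDfile1 MY_WORDfile2 } for the second.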

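While this allows tabs as word separators, it has the weakness that it collapses white space to a single space. So for example, the input

read build    {    file1    file2    file3    }

would produce the output

read build { MY_WORDfile1 MY_WORDfile2 MY_WORDfile3 }

Further, this assumes that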

the { is at the beginning of a word (i.e., it has whitespace before it), and
either } is the last non-blank character on each read build line, or it is a separate word (i.e., it has whitespace before and after)

answered Nov 11, 2015 at 20:24

G-Man Says 'Reinstate Monica'

sed '/read build/ {
        : fruit
        s/\({.*\) \([^}# ][^ ]*\)/\1#MY_WORD\2/
        t fruit
        s/#/ /g
}'

On lines that contain read build it searches for a space that is somewhere after a { and immediately before a word (presumably a filename). It replaces the space with #, inserts your word, and goes back and looks for more. (fruit is an arbitrary loop label.) Once it’s found them all, it turns all the # characters back to spaces.

In addition to the bit about # not occurring in the input, this assumes that

there is a space after the {.
} is the last non-blank character on each read build line, and
whitespace is spaces only; no tabs.
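
The first assumption is easy to see in practice. Here is a minimal sketch, with a made-up wrapper name (add_word_v1) and made-up sample lines, that runs this version of the command on a line with a space after the { and on one without.

```shell
# add_word_v1: hypothetical wrapper around this version of the sed command.
add_word_v1() {
    sed '/read build/ {
        : fruit
        s/\({.*\) \([^}# ][^ ]*\)/\1#MY_WORD\2/
        t fruit
        s/#/ /g
    }'
}

# Space after the {: every filename gets the prefix.
printf '%s\n' 'read build { file1 file2 }' | add_word_v1
# No space after the {: there is no space in front of file1 to replace,
# so the first filename is missed.
printf '%s\n' 'read build {file1 file2 }' | add_word_v1
```

This should print read build { MY_WORDfile1 MY_WORDfile2 } and then read build {file1 MY_WORDfile2 }.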

In awk:

awk '/read build/ {
        in_braces=0
        for (i = 1; i <= NF; i++) {
                if ($i == "{") in_braces=1
                else if ($i == "}") in_braces=0
                else if (in_braces) $i = "MY_WORD" $i
        }
      }
      { print }'

For each read build line, it loops through all the words (fields) in the line. It uses a state variable (in_braces) to keep track of whether it is between a { and a }; if it is, it modifies each word to begin with your added word. While this allows tabs as word separators, it has the weakness that it collapses white space to a single space. So for example, the input

read build    {    file1    file2    file3    }

would produce the output

read build { MY_WORDfile1 MY_WORDfile2 MY_WORDfile3 }

Further, this assumes that

the { is a separate word (i.e., it has whitespace before and after), and
either } is the last non-blank character on each read build line, or it is a separate word (i.e., it has whitespace before and after)

It allows multiple sets of braces; e.g.,

read build { file1 file2 file3 } text to be left alone { file4 file5 file6 }
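
A minimal sketch of that last case, using the awk program from this version of the answer wrapped in a function (the name prefix_words_v1 is made up for illustration):

```shell
# prefix_words_v1: hypothetical wrapper around this version of the awk program.
prefix_words_v1() {
    awk '/read build/ {
        in_braces = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "{") in_braces = 1          # start prefixing after a {
            else if ($i == "}") in_braces = 0     # stop prefixing at a }
            else if (in_braces) $i = "MY_WORD" $i
        }
    }
    { print }'
}

printf '%s\n' 'read build { file1 file2 file3 } text to be left alone { file4 file5 file6 }' |
    prefix_words_v1
```

This should print read build { MY_WORDfile1 MY_WORDfile2 MY_WORDfile3 } text to be left alone { MY_WORDfile4 MY_WORDfile5 MY_WORDfile6 }, since the flag is cleared at each } and set again at the next {.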