Using ex with awk:
$ cat test.txt
abc def ghi 123 345 456
abc def def ghi 123 345 456
abc def def def ghi 123 345 456
$ printf '%s\n' 'g/^/.!awk -v ORS=" " -v RS=" " "/^(def|345)$/"' %p | ex test.txt
def 345
def def 345
def def def 345
$
What this does is:
- Reads the file into a buffer (in ex) where it can be modified, printed, and/or saved;
- Filters each individual line of the buffer through an awk script (separately);
- Prints the entire contents of the buffer (with %p).
The above command does not save the results back into the file. If you want to do that, replace the %p with x, which writes the modified buffer to the file and exits.
Longer explanation:
ex is the scriptable file editor. It accepts the name of a file (test.txt) as the argument, and takes editing commands from its standard input.
Here we provide the editing commands using printf. The first argument to printf is the formatting string, in this case '%s\n', which is used to control how the rest of the arguments to printf are output. We're saying that all the arguments will be strings, and a newline character should be printed after each. (The single quotes are to avoid having the shell interpret the backslash—we want printf to get the backslash, not the shell.)
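As a quick check of that format-string behavior, here is a minimal sketch with two placeholder arguments (first and second are purely illustrative and are not part of the original command). printf reuses the format once per argument, so each one ends up on its own line:

```shell
# printf applies the '%s\n' format once per remaining argument,
# so every argument is printed followed by a newline.
out=$(printf '%s\n' 'first' 'second')
printf '%s\n' "$out"
```

This is the same mechanism that delivers the two editing commands to ex as two separate lines.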
There are two arguments we're sending to ex using printf. Here they are:
g/^/.!awk -v ORS=" " -v RS=" " "/^(def|345)$/"
%p
The second of these is easiest. % is an address range; it means "the entire buffer." p is the print command. So that just means "print the entire buffer."
The first one takes some breaking down.
g/.../ is the "global" command. It searches the whole buffer for lines which match the given pattern (in this case ^, a regex meaning "the start of a line") and runs the following ex editing command on each such line. Since every line has a start of line, every line matches ^, so the effect is to run the following command on every line, separately.
Then . is an address meaning "the current line (of the buffer)." Since it's given after the g command, it refers to each line of the buffer in turn.
! is used to run a shell command. When it's prefixed by an address (in this case .), the given line range (or single line) is fed to the given shell command on standard input and the result (standard output) of the command is put in place of that line of the buffer.
In other words, .!shell-command-here in ex means to filter the current line of the buffer through some external command.
So we've covered how this command setup filters each line of the buffer (individually) through an awk command; now let's analyze that awk command:
awk -v ORS=" " -v RS=" " "/^(def|345)$/"
You can define variables for awk by using the -v flag. So the first few arguments set the ORS and RS variables to a single space character.
RS in awk is the "record separator"; by default its value is a newline. Whatever character it is set to is what awk uses to separate records (usually lines) as they are read in.
Similarly, ORS, the "output record separator", controls what awk uses to separate the records (usually lines) as they are printed out.
By setting each to a space character, we can operate easily on each word of the line as a single record.
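To see the effect of RS on its own, the sketch below numbers each record that awk reads from a single line of words. The sample text is taken from the example file; the input is deliberately given without a trailing newline (an assumption of this sketch) so that the last record is exactly the last word:

```shell
# With RS set to a space, awk reads one word per record.
# ORS is left at its default (newline) so each record prints on its own line.
out=$(printf 'abc def ghi' | awk -v RS=' ' '{ print NR ": " $0 }')
printf '%s\n' "$out"
```

Each word is reported as its own record number, which is what lets the regex later be tested against one word at a time.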
The next portion is the actual awk command. (awk is its own scripting language.) awk command blocks consist of conditions and actions; either one can be omitted. Here, the condition is /.../ which is a regex match, i.e. this condition applies to all records (words, in this case) which match the given regex. The regex parts are ^ (start of string), $ (end of string), and two possible patterns grouped in parentheses, separated with a | (pipe) to indicate that either of those patterns is acceptable.
Since there is no action after the condition (an action would be in curly braces for awk), awk's default action of "print" is applied to the records matching that condition. (Remember, this means awk will print each matching record (word) of the line, and then ex will read that output and put it in place of the line(s) of the buffer that ex fed to awk in the first place.)
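The awk step can also be tried on its own, outside ex. The following is only a sketch: it feeds the second line of test.txt to the same awk program directly, without a trailing newline (an assumption made here to keep the demonstration simple), and wraps the result in brackets to make the trailing space from ORS visible:

```shell
# One line of the sample file, passed to awk without a trailing newline.
line='abc def def ghi 123 345 456'

# Condition only, no action: awk prints every record (word) that matches,
# each followed by ORS, which is a single space here.
out=$(printf '%s' "$line" | awk -v ORS=' ' -v RS=' ' '/^(def|345)$/')
printf '[%s]\n' "$out"
```

The brackets show that every printed word, including the last, is followed by the space that ORS supplies.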
This solution does make the simplifying assumption that all patterns will be matched against complete words, i.e. you will not want to match any patterns that include whitespace. This matches the example input you gave in the question.
The grep man page says: "... Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line" (emphasis added); in other words, you cannot use grep -o: you lose the information about line boundaries, and there is no way to reconstruct it.
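A small sketch of that limitation, using two made-up input lines. It assumes a grep that supports -o, which is not required by POSIX but is provided by the GNU and BSD implementations:

```shell
# Two input lines containing three matches in total.
# grep -o prints each match on its own line, so the output no longer
# shows which matches came from which input line.
out=$(printf 'abc def ghi\ndef def 345\n' | grep -o 'def')
printf '%s\n' "$out"
```

The output is three identical lines, and nothing in it indicates that the first match came from line one and the other two from line two.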