Multiple lines multiple string to one line

Question

Input (multiple lines):

abc def ghi 123 345 456 
abc def def ghi 123 345 456
abc def def def ghi 123 345 456

Output (extract string/regex from one line to one line):

def 345
def def 345
def def def 345

First attempt:

echo "abc 123" | grep -Po "\Kabc|\K123"

but this prints two lines:

abc
123

Second attempt:

echo -ne "abc def bac 123\nabc def def bac 123\nabc def def def bac 123 123\n" | grep -Po "def|123" | paste -d ' ' - -

But this shows:

def 123
def def
123 def
def def
123 123

I want:

def 123
def def 123
def def def 123 123

I can't use tr to remove the \n characters: def or 345 can occur multiple times in one line, so removing every second \n makes no sense. I also can't use a column separator.

Using a regex or a specified word. I added some more description to the post.
– Mateusz Adam Katana, Commented Sep 23, 2019 at 21:57
The grep man page says: "... Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line" (emphasis added). In other words, you cannot use grep -o: you lose the information about line boundaries, and there is no way to reconstruct it.
– NickD, Commented Sep 23, 2019 at 22:56
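
A possible workaround for the limitation described in the comment above is to run grep -o once per input line, so that each line's matches can be joined before the next line is read. This is only a sketch: the function name extract_per_line is made up for illustration, -o and -E are assumed to be supported by your grep (they are in GNU and BSD grep), and the pattern matches substrings, not whole words.

```shell
# Hypothetical helper: reads lines on stdin and prints the matches of each line on one line.
extract_per_line() {
  while IFS= read -r line; do
    # grep -o prints one match per output line; paste -s joins them with spaces
    printf '%s\n' "$line" | grep -oE 'def|345' | paste -sd ' ' -
  done
}

printf '%s\n' 'abc def ghi 123 345 456' 'abc def def ghi 123 345 456' | extract_per_line
```

Starting one grep per line is slow on large files, so a single perl or awk process, as in the answers below, is usually preferable.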

Sundeep · Accepted Answer · 2019-09-24 05:32:11Z

With perl

$ cat ip.txt
abc def ghi 123 345 456 
abc def def ghi 123 345 456
abc def def def ghi 123 345 456 1234

$ perl -lane 'print join " ", grep { /def|123/ } @F' ip.txt
def 123
def def 123
def def def 123 1234

$ perl -lane 'print join " ", grep { $_ eq "def" || $_ eq "123" } @F' ip.txt
def 123
def def 123
def def def 123

- -lane: -l strips the newline from each input line and adds it back when print is used; -a autosplits the input line on whitespace and saves the result in the @F array; -n loops over the input lines without automatically printing them; -e lets you provide the Perl script on the command line
- grep { /def|123/ } @F keeps only the elements of the @F array that contain def or 123
  - if you want a string match instead of a regex, you can use grep { $_ eq "def" || $_ eq "123" } @F
- print join " ", ... prints the elements returned by grep, with a space as the delimiter

Wildcard · Accepted Answer · 2019-09-24 07:46:18Z

Using ex with awk:

$ cat test.txt
abc def ghi 123 345 456 
abc def def ghi 123 345 456
abc def def def ghi 123 345 456
$ printf '%s\n' 'g/^/.!awk -v ORS=" " -v RS=" " "/^(def|345)$/"' %p | ex test.txt
def 345 
def def 345 
def def def 345 
$

What this does is:

Reads the file into a buffer (in ex) where it can be modified, printed, and/or saved;
Filters each individual line of the buffer through an awk script (separately);
Prints the entire contents of the buffer (with %p).

The above command does not save the results back into the file. If you want to do that, just replace the %p with x.

Longer explanation:

ex is the scriptable file editor. It accepts the name of a file (test.txt) as the argument, and takes editing commands from its standard input.

Here we provide the editing commands using printf. The first argument to printf is the formatting string, in this case '%s\n', which is used to control how the rest of the arguments to printf are output. We're saying that all the arguments will be strings, and a newline character should be printed after each. (The single quotes are to avoid having the shell interpret the backslash—we want printf to get the backslash, not the shell.)
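
To see what the '%s\n' format does on its own, here is a minimal sketch with two placeholder arguments (the strings are made up and are not ex commands):

```shell
# The format is reused for every argument, so each one ends up on its own line.
printf '%s\n' 'first command' 'second command'
```

This prints first command and second command on separate lines, which is the form ex expects on its standard input.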

There are two arguments we're sending to ex using printf. Here they are:

g/^/.!awk -v ORS=" " -v RS=" " "/^(def|345)$/"
%p

The second of these is easiest. % is an address range; it means "the entire buffer." p is the print command. So that just means "print the entire buffer."

The first one takes some breaking down.

g/.../ is the "global" command. It searches the whole buffer for lines which match the given pattern (in this case ^, a regex meaning "the start of a line") and runs the following ex editing command on each such line. Since every line has a start of line, every line matches ^, so the effect is to run the following command on every line, separately.

Then . is an address meaning "the current line (of the buffer)." Since it's given after the g command, it refers to each line of the buffer in turn.

! is used to run a shell command. When it's prefixed by an address (in this case .), the given line range (or single line) is fed to the given shell command on standard input and the result (standard output) of the command is put in place of that line of the buffer.

In other words, .!shell-command-here in ex means to filter the current line of the buffer through some external command.

So we've covered how this command setup filters each line of the buffer (individually) through an awk command; now let's analyze that awk command:

awk -v ORS=" " -v RS=" " "/^(def|345)$/"

You can define variables for awk by using the -v flag. So the first few arguments set the ORS and RS variables to a single space character.

RS in awk is the "record separator"; by default its value is a newline. Whatever character it is set to is what awk uses to separate records (usually lines) as they are read in.

Similarly, ORS, the "output record separator", controls what awk uses to separate the records (usually lines) as they are printed out.

By setting each to a space character, we can operate easily on each word of the line as a single record.
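
A small sketch of what RS=" " does, outside of ex. The function name split_words is made up for illustration. It prints $1 instead of the whole record because the last record on a line still carries the trailing newline, and default field splitting discards it.

```shell
# Hypothetical helper: number each space-separated word of the input.
split_words() {
  awk -v RS=' ' '{ print NR ": " $1 }'
}

printf 'abc def ghi\n' | split_words
```

Each word is numbered as its own record: 1: abc, 2: def, 3: ghi.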

The next portion is the actual awk command. (awk is its own scripting language.) awk command blocks consist of conditions and actions; either one can be omitted. Here, the condition is /.../ which is a regex match, i.e. this condition applies to all records (words, in this case) which match the given regex. The regex parts are ^ (start of string), $ (end of string), and two possible patterns grouped in parentheses, separated with a | (pipe) to indicate that either of those patterns is acceptable.

Since there is no action after the condition (an action would be in curly braces for awk), awk's default action of "print" is applied to the records matching that condition. (Remember, this means awk will print each matching record (word) of the line, and then ex will read that output and put it in place of the line(s) of the buffer that ex fed to awk in the first place.)
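
The default action can be checked in isolation. In this sketch (function names are hypothetical) the first command has a condition only and the second spells out the print action; both should produce the same output.

```shell
# Condition only: awk falls back to its default action and prints the record.
words_default() { awk '/^(def|345)$/'; }
# Same condition with the action written out explicitly.
words_explicit() { awk '/^(def|345)$/ { print }'; }

printf '%s\n' abc def 345 | words_default
printf '%s\n' abc def 345 | words_explicit
```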

This solution does make the simplifying assumption that all patterns will be matched against complete words, i.e. you will not want to match any patterns that include whitespace. This matches the example input you gave in the question.
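
One more caveat, as far as I can tell: with RS=" " the last word of a line still has the newline attached, so a pattern anchored with $ will not match a wanted word that happens to be last on the line. The sketch below shows the difference between matching the whole record and matching the first field; the function names are made up for illustration.

```shell
# Matches against the whole record: the final record is "345\n", so it is skipped.
match_record() { awk -v RS=' ' -v ORS=' ' '/^(def|345)$/'; }
# Matches against $1: default field splitting drops the trailing newline.
match_field() { awk -v RS=' ' -v ORS=' ' '$1 ~ /^(def|345)$/ { print $1 }'; }

printf 'def 345\n' | match_record; echo
printf 'def 345\n' | match_field; echo
```

The first call prints only def, the second prints def 345. The example input in the question is not affected, because the wanted words are never last on a line.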

Freddy · Accepted Answer · 2019-09-23 23:25:30Z

You could use awk and only keep the fields you want:

echo -e "abc def bac 123\nabc def def bac 123\nabc def def def bac 123 123" \
  | awk -v var1="def" -v var2="123" '{
  i=0
  for (j=1; j<=NF; j++){
    if ($j==var1 || $j==var2){ $++i=$j }
    if (i!=j){ $j="" }
  }
  print
}'

The for-loop walks through the fields. Each field equal to def or 123 is copied to the next free output position with $++i=$j (i starts at 0, so the first match goes to field 1, the next to field 2, and so on). The current field $j is then reset to an empty string ($j="") whenever the output index i differs from the loop index j.

Output:

def 123
def def 123
def def def 123 123
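
Because the loop above blanks the unwanted fields instead of removing them, the printed lines can end with extra separators. A possible variant, sketched under the same assumptions (exact word matches, var1 and var2 as in the answer), builds the output string directly. The function name keep_words is hypothetical.

```shell
# Hypothetical helper: collect matching fields into a string instead of rewriting $0.
keep_words() {
  awk -v var1="def" -v var2="123" '{
    out = ""
    for (j = 1; j <= NF; j++)
      if ($j == var1 || $j == var2)
        out = (out == "" ? $j : out " " $j)   # add a space only between words
    print out
  }' "$@"
}

printf '%s\n' 'abc def bac 123' 'abc def def bac 123' 'abc def def def bac 123 123' | keep_words
```

The visible output is the same as above, but without trailing spaces.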

edited Sep 23, 2019 at 23:25

answered Sep 23, 2019 at 22:59
