How can I extract a specific part of a file?

Question

I have multiple files containing several different lines. Among the lines, I am interested only in the ones starting with a specific pattern, such as:

USER1    Info> :FERRARI:RED,:LAMBORGHINI:ORANGE,:MASERATI:BLUE,...

The spacing is exactly as shown in this example.

From these lines I only need the car makes (without the colours), so the output should look like this:

FERRARI, LAMBORGHINI and MASERATI

The car makes are different in each file. I used 3 makes in the example, but each file could contain fewer or more. Is there an easy way to do this in bash or python? Thanks

Please have a look at stackoverflow.com/a/36211753/5217119. Hint: you need to extract columns $2, $4, $6.
– binarysta, Commented May 29, 2020 at 0:08
Thanks for the link. Actually, there may be more than 3 columns. I used 3 in the question only as an example; depending on the file there could be fewer or more, and I can't know in advance how many each file contains.
– ginopino, Commented May 29, 2020 at 0:14
Edited. My apologies for not being clear enough. I hope it is clearer now.
– ginopino, Commented May 29, 2020 at 0:23
Note that AA, and BB, are also "words which is between the two colons". What specific criteria distinguish the parts you want to retain from these?
– steeldriver, Commented May 29, 2020 at 0:26
The comma is the separator. I will try to edit the question with a more specific example.
– ginopino, Commented May 29, 2020 at 0:57

binarysta · Accepted Answer · 2020-05-29 19:30:18Z

0

To extract every word (containing no comma) that sits between two colons:

grep -oHnE ":[^,]*:" files* | awk 'BEGIN{FS=":"} {x=$1$2; a[x]=a[x]","$4} END{for(x in a) print a[x]}' | sed 's/^,//'

Assume we have the file car_info.txt:

cat car_info.txt
USER1    Info> :FERRARI:RED,:LAMBORGHINI:ORANGE,:MASERATI:BLUE
USER1    Info> :FERRARI:RED,:LAMBORGHINI:ORANGE
USER1    Info> :FERRARI:RED,:LAMBORGHINI:ORANGE,:MASERATI:BLUE


grep -oHnE ":[^,]*:" car_info.txt 
car_info.txt:1::FERRARI:
car_info.txt:1::LAMBORGHINI:
car_info.txt:1::MASERATI:
car_info.txt:2::FERRARI:
car_info.txt:2::LAMBORGHINI:
car_info.txt:3::FERRARI:
car_info.txt:3::LAMBORGHINI:
car_info.txt:3::MASERATI:

The grep options used:

-o prints only the matched parts of a matching line
-H prints the filename with each match
-n prints the line number with each match
-E enables extended regular expressions
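
Since the question also allows Python, here is a rough sketch of what this matching step does, written with the re module. It is only an illustration of the grep output above, not part of the original pipeline; the function name tagged_matches is made up for this example.

```python
import re

# Same pattern as the grep call: a colon, a run of non-comma characters, a colon.
PATTERN = re.compile(r":[^,]*:")

def tagged_matches(filename, lines):
    """Mimic `grep -oHn`: yield 'filename:lineno:match' for every match."""
    for lineno, line in enumerate(lines, start=1):
        for match in PATTERN.findall(line):
            yield f"{filename}:{lineno}:{match}"

# Sample data copied from car_info.txt (first two lines only).
sample = [
    "USER1    Info> :FERRARI:RED,:LAMBORGHINI:ORANGE,:MASERATI:BLUE",
    "USER1    Info> :FERRARI:RED,:LAMBORGHINI:ORANGE",
]
for entry in tagged_matches("car_info.txt", sample):
    print(entry)
```

Each printed entry has the same filename:line::WORD: shape as the grep output, which is why the awk step that follows can split on the colon.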

The strategy now is to join all entries that share the same filename:line prefix onto a single line:

awk 'BEGIN{FS=":"} {x=$1$2;a[x]=a[x]","$4} END{for(x in a) print a[x]}'

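BEGIN{FS=":"} sets the field separator to :
x=$1$2 stores the filename and line number in x
a[x]=a[x]","$4 appends the 4th field (the word between the colons) to the entry of array a whose key is x, so matches from the same file and line accumulate in one value
for(x in a) print a[x] prints each accumulated value of a (awk does not guarantee the iteration order)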

sed 's/^,//' removes the leading , from the beginning of each line
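
For comparison, a minimal Python sketch of the whole task, under two assumptions taken from the question: the lines of interest start with "USER1", some whitespace and "Info> ", and the result should read like "FERRARI, LAMBORGHINI and MASERATI". The helper names makes_in_line and join_makes are hypothetical, and this is one possible approach rather than a translation of the pipeline above.

```python
import re

# Assumed prefix: "USER1", whitespace, then "Info>" (as in the question).
PREFIX = re.compile(r"^USER1\s+Info>\s*")
# A make is the text between two colons, containing neither comma nor colon.
MAKE = re.compile(r":([^,:]+):")

def makes_in_line(line):
    """Return the makes on one line, or None if the prefix does not match."""
    prefix = PREFIX.match(line)
    if prefix is None:
        return None
    return MAKE.findall(line[prefix.end():])

def join_makes(makes):
    """Format ['A', 'B', 'C'] as 'A, B and C'."""
    if len(makes) <= 1:
        return "".join(makes)
    return ", ".join(makes[:-1]) + " and " + makes[-1]

line = "USER1    Info> :FERRARI:RED,:LAMBORGHINI:ORANGE,:MASERATI:BLUE"
print(join_makes(makes_in_line(line)))
```

To process real files, one would loop over the lines of each file and skip those for which makes_in_line returns None.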

edited May 29, 2020 at 19:30

answered May 29, 2020 at 0:52


Thanks for your answer. I tried this, but it won't recognize the first part of the line: *** Unbound variable: USER (You're accessing an undefined variable or function `USER')
– ginopino, Commented May 29, 2020 at 1:19
Thank you again for your help. I tried it just now, but the only thing I got was the commas. I copied your script as it is and replaced "files*" with the file name. I am really new to these things, so I apologize if I am making some very basic error.
– ginopino, Commented May 29, 2020 at 18:06
The file I am using to try your script is car_info.txt.
– ginopino, Commented May 29, 2020 at 18:30
Well, this mostly did the job. The output is a list of the words in the file between the two colons. Now the only thing I need to do is select only the lines starting with "USER1 Info> ".
– ginopino, Commented May 29, 2020 at 18:40
@ginopino I found the problem, please have a look at the answer.
– binarysta, Commented May 29, 2020 at 18:52


Yunus · Accepted Answer · 2020-05-29 00:08:40Z

0

awk -F':' '/^USER1.*Info/ {print $2" "$4" "$6}' < infile
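
The same field-splitting idea can be sketched in Python for lines with any number of makes, since the question says the count varies per file. This is an illustration only; makes_by_field is a made-up name, and the prefix check simply mirrors the /^USER1.*Info/ pattern of the awk command.

```python
import re

def makes_by_field(line):
    """Split on ':' and keep every second field, like awk's $2, $4, $6, ..."""
    fields = line.split(":")
    # fields[0] is the "USER1    Info> " prefix; odd indexes hold the makes,
    # the remaining even indexes hold the colour fragments such as "RED,".
    return fields[1::2]

line = "USER1    Info> :FERRARI:RED,:LAMBORGHINI:ORANGE,:MASERATI:BLUE"
# Only handle lines that match the same pattern as the awk command.
if re.match(r"USER1.*Info", line):
    print(" ".join(makes_by_field(line)))
```

Because the slice takes every second field up to the end of the line, nothing has to be known in advance about how many makes a line contains.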

answered May 29, 2020 at 0:08
