To extract every word (containing no comma) that lies between two colons:
grep -oHnE ":[^,]*:" files* | awk 'BEGIN{FS=":"} {x=$1$2; a[x]=a[x]","$4} END{for(x in a) print a[x]}' | sed 's/^,//'
Assume we have the file car_info.txt:
cat car_info.txt
USER1 Info> :FERRARI:RED,:LAMBORGHINI:ORANGE,:MASERATI:BLUE
USER1 Info> :FERRARI:RED,:LAMBORGHINI:ORANGE
USER1 Info> :FERRARI:RED,:LAMBORGHINI:ORANGE,:MASERATI:BLUE
grep -oHnE ":[^,]*:" car_info.txt
car_info.txt:1::FERRARI:
car_info.txt:1::LAMBORGHINI:
car_info.txt:1::MASERATI:
car_info.txt:2::FERRARI:
car_info.txt:2::LAMBORGHINI:
car_info.txt:3::FERRARI:
car_info.txt:3::LAMBORGHINI:
car_info.txt:3::MASERATI:
grep options:
-o prints only the matched parts of a matching line
-H prints the filename with each match
-n prints the line number with each match
-E enables extended regular expressions
The strategy now is to merge all matches that share the same filename:line prefix onto a single output line:
awk 'BEGIN{FS=":"} {x=$1$2;a[x]=a[x]","$4} END{for(x in a) print a[x]}'
BEGIN{FS=":"} sets the field separator to :
x=$1$2 puts the filename and line number in x
a[x]=a[x]","$4 builds an associative array a keyed by x and appends the 4th field to the value for every line with the same x
for(x in a) print a[x] prints the values stored in the array a
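To see what the awk stage produces on its own, the sketch below feeds it the grep output listed earlier. The variable names grep_output and grouped are made up for illustration, and the trailing sort is an addition (not part of the original command) so that the listing is reproducible, since awk does not define the order in which for(x in a) visits keys.

```shell
# The grep output from above, stored verbatim (hypothetical variable name).
grep_output='car_info.txt:1::FERRARI:
car_info.txt:1::LAMBORGHINI:
car_info.txt:1::MASERATI:
car_info.txt:2::FERRARI:
car_info.txt:2::LAMBORGHINI:
car_info.txt:3::FERRARI:
car_info.txt:3::LAMBORGHINI:
car_info.txt:3::MASERATI:'

# Run only the awk stage; "sort" is added here to make the order deterministic.
grouped=$(printf '%s\n' "$grep_output" |
  awk 'BEGIN{FS=":"} {x=$1$2; a[x]=a[x]","$4} END{for(x in a) print a[x]}' |
  sort)

printf '%s\n' "$grouped"
# ,FERRARI,LAMBORGHINI
# ,FERRARI,LAMBORGHINI,MASERATI
# ,FERRARI,LAMBORGHINI,MASERATI
```

Every value starts with a comma because the first append happens onto an empty string; that leading comma is what the final sed stage strips.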
sed 's/^,//' removes the leading , from each line
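If the output should follow the order of the input lines, one possible variant (a sketch, not the command from this answer) prints each group as soon as the filename:line prefix changes instead of collecting everything in an array. It relies on grep emitting all matches of one line consecutively, and it assumes file names contain no colon. The function name join_matches and the scratch directory are hypothetical. The key is built as $1 FS $2 so that, for example, file a1 line 2 and file a line 12 cannot collide.

```shell
# Recreate the sample input in a scratch directory (illustrative setup only).
workdir=$(mktemp -d)
cat > "$workdir/car_info.txt" <<'EOF'
USER1 Info> :FERRARI:RED,:LAMBORGHINI:ORANGE,:MASERATI:BLUE
USER1 Info> :FERRARI:RED,:LAMBORGHINI:ORANGE
USER1 Info> :FERRARI:RED,:LAMBORGHINI:ORANGE,:MASERATI:BLUE
EOF

# join_matches FILE...  (hypothetical helper name)
# Prints one comma-joined line per input line that has matches, in input order.
join_matches() {
  grep -oHnE ":[^,]*:" "$@" |
    awk -F: '
      { key = $1 FS $2 }                 # filename and line number, kept separate
      key != prev {                      # first match of a new input line
        if (NR > 1) print line           # flush the previous group
        line = $4; prev = key; next
      }
      { line = line "," $4 }             # further match on the same input line
      END { if (NR > 0) print line }     # flush the last group
    '
}

result=$(join_matches "$workdir/car_info.txt")
printf '%s\n' "$result"
# FERRARI,LAMBORGHINI,MASERATI
# FERRARI,LAMBORGHINI
# FERRARI,LAMBORGHINI,MASERATI

rm -r "$workdir"
```

Because the first word of each group is assigned rather than appended, no leading comma is produced and the sed stage is not needed in this variant.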
AA, and BB, are also "words which is between the two colons". What specific criteria distinguish the parts you want to retain from these?