How to print only 1 filename together with the matching pattern?

Question

I want to print each filename together with the matching pattern, but only once, even if the pattern occurs multiple times in the file.

For example, I have a list of patterns in list_of_patterns.txt, and the files I need to search are in /path/to/files/*.

list_of_patterns.txt:

A
B
C
D
E

/path/to/files/

/file1
/file2
/file3

Let's say /file1 contains the pattern A multiple times, like this:

/file1:

(The same goes for the other files that contain multiple pattern matches.)

I have this grep command running but it prints the filename every time a pattern matches.

grep -Hof list_of_patterns.txt /path/to/files/*

output:

/file1:A
/file1:A
/file1:A
/file2:B
/file2:B
/file3:C
/file3:B
... and so on.

I know sort can do this if I pipe the grep output into it: grep -Hof list_of_patterns.txt /path/to/files/* | sort -u. However, sort only produces output once grep has finished. In the real world, my list_of_patterns.txt contains hundreds of patterns, and the task sometimes takes an hour to finish.

Is there a better way to speed up the process?

UPDATE: some files have more than a hundred occurrences of a matching pattern. E.g. /file4 contains pattern A 900 times. That's why grep takes an hour to finish: it prints every occurrence of the pattern match together with the filename.

E.g. output:

/file4:A
/file4:A
/file4:A
/file4:A
/file4:A
/file4:A
/file4:A
/file4:A
... and so on until it reaches 900 occurrences.

I want each filename and pattern pair to be printed only once.

E.g. Desired output:

/file4:A
/file1:A
/file2:B
/file3:A
/file4:B

Hundreds of patterns would not make grep take an hour to process a few files. Are your files also very big or do you have many thousands of files to search in? — Kusalananda
– Kusalananda ♦, Commented Feb 14, 2018 at 6:43
@Kusalananda, Yeah I think the files are causing this issue. I just found a file that has 1 pattern match only but with 950+ occurrences. That's why it takes an hour to finish. — WashichawbachaW
– WashichawbachaW, Commented Feb 14, 2018 at 6:47
@Sundeep Would that not discard the matches for some patterns? Only the first matching pattern in the pattern file would be reported. — Kusalananda
– Kusalananda ♦, Commented Feb 14, 2018 at 6:49
@Kusalananda -m1 will cause exactly one output line per file, along with whatever pattern matched... not sure if OP wants one line for each matching pattern — Sundeep
– Sundeep, Commented Feb 14, 2018 at 6:51

RomanPerekhrest · Accepted Answer · 2018-02-14 07:27:20Z

Is there a better way to speed up the process?

Yes, it's called GNU parallel:

parallel -j0 -k "grep -Hof list_of_patterns.txt {} | sort -u" ::: /path/to/files/*

-j N - number of job slots. Run up to N jobs in parallel. 0 means as many as possible.
-k (--keep-order) - keep the output in the same order as the input
::: arguments - use arguments from the command line as the input source instead of stdin (standard input)
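
To see what each parallel job produces, here is a minimal sketch that runs the same per-file pipeline (grep -Hof ... | sort -u) in a plain loop, without parallel. The temporary directory, the sample file contents and the result variable are assumptions made up for illustration; they are not the asker's real data.

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
tmp=$(mktemp -d)
printf '%s\n' A B C > "$tmp/list_of_patterns.txt"
mkdir "$tmp/files"
printf 'A\nA\nA\n' > "$tmp/files/file1"   # pattern A three times
printf 'B\nxBx\n'  > "$tmp/files/file2"   # pattern B twice
printf 'C\nB\n'    > "$tmp/files/file3"   # two different patterns

# One "job" per file: grep that file only, then deduplicate its matches.
# Each sort -u only has to wait for its own file, not for the whole run.
result=$(
  cd "$tmp/files" || exit 1
  for f in file1 file2 file3; do
    grep -Hof ../list_of_patterns.txt "$f" | sort -u
  done
)
printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample data each file/pattern pair appears once, and the pairs within a file come out in sorted order because of sort -u.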

edited Feb 14, 2018 at 7:27

answered Feb 14, 2018 at 7:21

RomanPerekhrest


The -j N number should possibly be limited to a number not too much higher than the available number of cores on the machine, especially if each individual grep against a file is slow. — Kusalananda
– Kusalananda ♦, Commented Feb 14, 2018 at 7:27
What is the correct N for -j N? It depends: oletange.wordpress.com/2015/07/04/parallel-disk-io-is-it-faster — Ole Tange
– Ole Tange, Commented Feb 14, 2018 at 7:29
If mixing results is acceptable, remove -k, add --line-buffer, and replace sort -u with perl -ne '$s{$_}++ or print'. This will give results before the full job is finished. — Ole Tange
– Ole Tange, Commented Feb 14, 2018 at 7:30
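
The "print a line only the first time it is seen" filter from the comment above can be tried on its own. This is a minimal sketch that swaps in awk '!seen[$0]++' for the perl one-liner (the two should behave the same way here) and feeds it simulated grep -Ho output; the input lines and the result variable are made up for illustration.

```shell
#!/bin/sh
# Simulated grep -Ho output: 900 repeats of one match, then a few more lines.
result=$(
  {
    i=0
    while [ "$i" -lt 900 ]; do
      echo '/file4:A'
      i=$((i + 1))
    done
    echo '/file1:A'
    echo '/file4:B'
    echo '/file4:A'   # seen before, so it is dropped
  } | awk '!seen[$0]++'   # print each distinct line on its first appearance
)
printf '%s\n' "$result"
```

Unlike sort -u, this filter does not need to read all of its input before printing, so it keeps the lines in first-seen order. In the real command it would go after the parallel invocation, for example: parallel -j0 --line-buffer "grep -Hof list_of_patterns.txt {}" ::: /path/to/files/* | awk '!seen[$0]++'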
Can I install it without sudo permission? — WashichawbachaW
– WashichawbachaW, Commented Feb 14, 2018 at 8:04
@WashichawbachaW, if you are ready for some manual "experiments" - you may try unix.stackexchange.com/questions/42567/… — RomanPerekhrest
– RomanPerekhrest, Commented Feb 14, 2018 at 8:31
