added relative tags
RomanPerekhrest
added 489 characters in body

I want to print each filename together with the matching pattern, but only once per file, even if the pattern occurs multiple times in that file.

For example, I have a list of patterns in list_of_patterns.txt, and the files I need to search are in /path/to/files/*.

list_of_patterns.txt:

A
B
C
D
E

/path/to/files/

/file1
/file2
/file3

Let's say /file1 contains the pattern A multiple times, like this:

/file1:

A
4234234
A
435435435
353535
A

(The same goes for other files that contain multiple matches.)

I have this grep command running, but it prints the filename every time a pattern matches.

grep -Hof list_of_patterns.txt /path/to/files/*

output:

/file1:A
/file1:A
/file1:A
/file2:B
/file2:B
/file3:C
/file3:B
... and so on.

I know sort can do this when piped after the grep command (grep -Hof list_of_patterns.txt /path/to/files/* | sort -u), but sort only produces output once grep has finished. In the real world, my list_of_patterns.txt contains hundreds of patterns, and the task sometimes takes an hour to finish.

Is there a better way to speed up the process?
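
For comparison with sort -u, here is a minimal sketch of a filter that removes duplicates while the stream is still running, so output appears before grep has finished. The function name dedupe_stream is hypothetical and not part of the original command. It assumes the set of distinct file:pattern lines fits in memory. grep still reports every occurrence here; only the displayed output is reduced.

```shell
# Hypothetical helper: print each distinct input line the first time it is seen.
# awk keeps a counter per line in the array "seen"; a line is printed only
# while its counter is still zero, i.e. on its first occurrence.
dedupe_stream() {
    awk '!seen[$0]++'
}

# Usage with the command from the question:
# grep -Hof list_of_patterns.txt /path/to/files/* | dedupe_stream
```

Unlike sort -u, this keeps the lines in the order they first appear. When grep writes to a pipe it may buffer its output; GNU grep has a --line-buffered option that can make lines arrive sooner.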

UPDATE: Some files contain more than a hundred occurrences of a matching pattern. E.g. /file4 contains pattern A 900 times. That's why grep takes an hour to finish: it prints every occurrence of the pattern match together with the filename.

E.g. output:

/file4:A
/file4:A
/file4:A
/file4:A
/file4:A
/file4:A
/file4:A
/file4:A
... and so on until it reaches 900 occurrences.

I want it to print each filename and pattern pair only once.

E.g. Desired output:

/file4:A
/file1:A
/file2:B
/file3:A
/file4:B
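
One possible way to produce output of this shape without printing every occurrence is sketched below. It is an illustration under stated assumptions, not a tested solution for the real data set: the function name list_matches_once is hypothetical, each line of the pattern file is treated as one grep pattern, and the pattern text itself is printed rather than the matched text (the two are the same for literal patterns such as A or B). It relies on grep -l, which prints a file name once and stops reading that file at the first match.

```shell
# Hypothetical helper: print "file:pattern" once per (file, pattern) pair.
# $1 is the pattern list; the remaining arguments are the files to search.
list_matches_once() {
    patterns=$1
    shift
    while IFS= read -r pattern; do
        # Skip empty lines: an empty pattern would match every line.
        [ -n "$pattern" ] || continue
        # -l lists each matching file once instead of printing every match.
        # stdin is redirected so grep never consumes the pattern list.
        grep -l -e "$pattern" -- "$@" </dev/null |
        while IFS= read -r file; do
            printf '%s:%s\n' "$file" "$pattern"
        done
    done < "$patterns"
}

# Usage with the names from the question:
# list_matches_once list_of_patterns.txt /path/to/files/*
```

The output is grouped by pattern rather than by file. Each pattern triggers its own pass over the files, so with hundreds of patterns this trades less output for more scanning; whether it is faster overall depends on the data.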


How to print each filename only once together with the matching pattern?
