
I am trying to use the following pattern on Ubuntu:

grep -Eri "warning|error|critical|severe|fatal" --color=auto

to find relevant errors in many different .log files recursively in /var/log and its subfolders.

The issue I am having is that this results in tens of thousands of lines of matches being printed as the expression is run. I'd like to filter these somehow in at least one of the following ways:

  1. Print a match but then skip it if more than e.g. 3 of the same match exist
  2. Show only unique matches (i.e. print one of each line found)

Can I do this by piping the output to something? Going through each log for errors by hand is incredibly time-consuming, which is why I am trying this, but the expression I am using prints so much information that it is not really usable either. I have tried piping to 'less', but that removes the highlighting, which makes the output harder to read, and it does not fix the issue of the output being so large.

I realise I could also limit the expression to specific files at a time, but as I mentioned, some logs are full of matches while others have very few. So further filtering out duplicates would be really helpful.

Here is an example error line in one of the many logs I am searching:

./artifactory/artifactory-service.log:20:2021-07-20T08:45:30.248Z [jfrt ] [ERROR] [.j.a.c.g.GrpcStreamObserver:97] [c-default-executor-1] - refreshing affected platform config stream - got an error

If there are hundreds of such errors, I would like to show e.g. at most 3 of them before moving on to the next match.

Alternatively, because of how the dates are listed in the logs, it would be great to filter matches to specific dates only; how would I go about doing this? Date filtering would limit the output greatly.

  • Yes, you can pipe to something, but we can't really help without an example. Will the lines be 100% identical? Should we only consider certain fields? Certain characters? Do you also want to include the file name? Does that affect what is considered a duplicate? Should the same error message in two different files be treated as the same error or a separate one? Please edit your question and add an example input (input here being the result of your grep, or alternatively a set of input files) and show us what your desired output from that input would be. Also mention your OS, please. Commented Jul 20, 2021 at 9:53
  • Thanks so much for the reply. I've updated the question with the information you requested. I would like to include the file name, yes. I've just started using e.g. find . -name '*.log' -exec grep -Ein 'warning|warn|error|critical|severe|fatal' --color=auto {} + but it does not help with limiting the scope to specific matching dates, as I am unsure how to include that in the search string. Commented Jul 20, 2021 at 10:19
  • Cut off everything that contains changing values (timestamps, connection IDs) and pipe that to sort | uniq (see the sketch after these comments). Commented Jul 20, 2021 at 10:52
  • Hi Panki, could you please give an example using the log I provided? Commented Jul 20, 2021 at 11:26
  • Thanks for the edit, but I'm afraid a single line doesn't tell us much since you are specifically talking about "duplicates". Will there be multiple identical lines in the same file? Will there be similar lines with maybe a different timestamp? Would those be duplicates? Your output also includes the line number; if that changes (as it will), does it mean they are not duplicates? Remember, you know what you need, but we don't, and we need to explain it to a computer, so it has to be very clear. Can you give us a few lines, some of which are dupes and some not, so we can understand? Commented Jul 20, 2021 at 11:50
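
A minimal sketch of what that sort | uniq comment suggests, assuming the format of the sample line (the timestamp is the first space-separated field, glued to grep's file: prefix, and the log paths contain no spaces). Note that this also drops the file names, so identical messages from different files are merged:

# drop grep's "path:timestamp" field so only the message part remains,
# then count how often each distinct message occurs
grep -Eri 'warning|error|critical|severe|fatal' /var/log |
  cut -d' ' -f2- |
  sort | uniq -c | sort -rn | less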

1 Answer


Could be:

find . -name '*.log' -type f -exec perl -lne '
  if(/^\S++ (.*(?:warning|error|critical|severe|fatal).*)/i && ++$seen{$1} <= 3) {
    print "$ARGV:$_"
  }
  undef %seen if eof' {} +

Where we skip the leading timestamp with \S++, record the number of occurrences of the rest of the line in a %seen hash, and stop printing a given message after its third occurrence (resetting the counter for each new file).
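
If you only want option 2 from the question (each distinct message printed once per file), the same idea with the threshold dropped to 1 should work; a minimal, untested variant:

find . -name '*.log' -type f -exec perl -lne '
  # print only the first occurrence of each message in each file
  if(/^\S++ (.*(?:warning|error|critical|severe|fatal).*)/i && ++$seen{$1} == 1) {
    print "$ARGV:$_"
  }
  undef %seen if eof' {} +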

Matching a specific date is just a regex match on the timestamp, or, since your timestamps use the standard ISO 8601 format, you can do a string comparison for a date range:

find . -name '*.log' -type f -exec perl -lne '
  if(/^(\S++) (.*(?:warning|error|critical|severe|fatal).*)/i &&
     $1 ge "2021-07-19" && $1 lt "2021-07-21" && ++$seen{$2} <= 3) {
    print "$ARGV:$_"
  }
  undef %seen if eof' {} +
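
And if you only care about one particular day rather than a range, anchoring a plain grep on the ISO 8601 date prefix should be enough (assuming every log line starts with the timestamp, as in your sample; the date below is only an example):

find /var/log -name '*.log' -type f -exec \
  grep -EiH --color=auto '^2021-07-20T.*(warning|error|critical|severe|fatal)' {} +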
