Extract specific information from logs

Question

I have extracted the following information from the raw logs below using this command:

echo -e "Timestamp\t\tEmailTo:\t\tEmailFrom:\t\t\t\t\tIPAddress:\tErrorCodes:" && sed -n -e 's/.*\([0-9][0-9][0-9][0-9]\-[0-9][0-9]\-[0-9]*\) .*\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]*\).*/\1 \2 /p' logs

Output:

Timestamp       EmailTo:        EmailFrom:                  IPAddress:  ErrorCodes:
2017-01-02 12:50:00 
2017-01-02 13:10:25

Raw logs:

2017-01-02 12:50:00 1cNxNS-001NKu-9B == [email protected] R=dkim_lookuphost T=dkim_remote_smtp defer (-45) H=mta6.am0.yahoodns.net [98.138.112.38]: SMTP error from remote mail server after MAIL FROM:<[email protected]> SIZE=1772: 421 4.7.0 [TSS04] Messages from 192.168.1.269 temporarily deferred due to user complaints - 4.16.55.1; see https://help.yahoo.com/kb/postmaster/SLN3434.html
2017-01-02 13:10:25 1cNxhD-001VZ3-0f == [email protected] ([email protected]) <[email protected]> R=lookuphost T=remote_smtp defer (-45) H=mta7.am0.yahoodns.net [98.138.112.34]: SMTP error from remote mail server after MAIL FROM:<[email protected]> SIZE=87839: 500 5.9.0 [TSS04] Messages from 192.168.1.269 temporarily deferred due to user complaints - 4.16.55.1; see https://help.yahoo.com/kb/postmaster/SLN3434.html

But I am unable to extract the other information I need; it should look like:

Timestamp            EmailTo:              EmailFrom:           IPAddress:      ErrorCodes:

2017-01-02 12:50:00  [email protected]  [email protected]       192.168.1.269   421 4.7.0
2017-01-02 13:10:25  [email protected]  [email protected]      192.168.1.269   500 5.9.0

How can I extract all the information using sed?

What exactly is the problem you have with the sed patterns? You just need to extend your current pattern to match all the information you need, not just the first two pieces.
– Alessandro Dotti Contra, Commented Jan 3, 2017 at 15:10
I have tried to extend it further, but without success. For example, I tried to extract the third field with the command below, and it does not give the right result: sed -n 's/.*\([0-9][0-9][0-9][0-9]\-[0-9][0-9]\-[0-9]*\) .*\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]*\) .\^([^==]\[A-Z0-9._%-+]\@[A-Z0-9.-]\.[A-Z]{2,4}*\).*/\1 \2 \3 /p' exim_logs
– blaCkninJa, Commented Jan 3, 2017 at 17:46

Community · Accepted Answer · 2020-06-11 14:16:50Z

You can try this sed expression:

sed -e 's/^\(.* .* \).* .*== \([^ ]* \).*MAIL FROM:<\([^ ]*\)> [^ ]* \([0-9 .]*\)\[.*Messages from \([^ ]*\).*$/\1\t\2\t\3\t\5\t\4/'

It works for me with your example.

Explanation

This sed expression contains only one command -- s/.../.../.

First part of s///:

'^\(.* .* \)'      -- Timestamp, two first space-separated blocks of text, \1.
'.* .*== '         -- Uninteresting text after timestamp.
'\([^ ]* \)'       -- Block of text between spaces, first email address, \2.
'.*MAIL FROM:<'    -- Position before second email.
'\([^ ]*\)>'       -- Second email addr, non-space characters, ended by '>', \3.
' [^ ]* '          -- SIZE=...:
'\([0-9 .]*\)\['   -- Error codes: digits, spaces and dots ended by '[', \4.
'.*Messages from ' -- Position before IP.
'\([^ ]*\)'        -- Non-space characters, ended by space, IP. \5.
'.*$'              -- Text before end of string, not interesting.

As you can see, it is just a direct description of the raw log lines; there is nothing special about it.

The second part of s/// just places the \N groups in the right order, with \t (tab character) as a separator.
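To make the walkthrough above concrete, here is a minimal sketch of the same five captures written as an extended regular expression. It assumes a sed that supports -E (GNU and BSD sed both do). The function name extract_fields and the two addresses in the sample line are placeholders of mine, since the real addresses are redacted in the question. The -n flag and the p suffix are an addition so that lines that do not match are dropped instead of being printed unchanged.

```shell
#!/bin/sh
# A literal tab, so the replacement does not depend on sed understanding \t.
tab=$(printf '\t')

# Hypothetical helper: reads log lines on stdin and prints
# timestamp, recipient, sender, IP and error codes, tab-separated.
extract_fields() {
    sed -n -E "s/^([^ ]+ [^ ]+) .*== ([^ ]+) .*MAIL FROM:<([^>]*)> [^ ]+ ([0-9]+ [0-9.]+) \[.*Messages from ([^ ]+).*/\1${tab}\2${tab}\3${tab}\5${tab}\4/p"
}

# Sample line modelled on the raw logs; both addresses are placeholders.
sample='2017-01-02 12:50:00 1cNxNS-001NKu-9B == rcpt@example.com R=dkim_lookuphost T=dkim_remote_smtp defer (-45) H=mta6.am0.yahoodns.net [98.138.112.38]: SMTP error from remote mail server after MAIL FROM:<sender@example.org> SIZE=1772: 421 4.7.0 [TSS04] Messages from 192.168.1.269 temporarily deferred due to user complaints - 4.16.55.1; see https://help.yahoo.com/kb/postmaster/SLN3434.html'

printf '%s\n' "$sample" | extract_fields
```

Unlike the original expression, the groups here exclude the trailing spaces, so the tab-separated columns come out without stray padding.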

Thanks, it works for me, but please elaborate on this command.

– blaCkninJa, Commented Jan 4, 2017 at 6:17

Guy · Accepted Answer · 2017-01-03 22:32:27Z

I've not got much experience with awk, but thought I'd have a go. I imagine this is quite fragile, as I don't know how many log lines you're trying to match with this.

Anyway, this uses the BEGIN block to set up the variables to pick out and a format string for printing, and then displays the header. The time and EmailTo are predictable, so they can use the numbered fields ($1, $2 and $5). The other three values are found by three sets of regexps, which are only very rough. Any suggestions to improve would be appreciated!

awk 'BEGIN {
        from=""; ip=""; error=""; fstr="%-24s%-24s%-40s%-16s%s\n";
        printf(fstr, "Timestamp:", "EmailTo:", "EmailFrom:", "IPAddress:", "ErrorCodes:");
    }
{   # Reset per-line values so a line without a match does not reuse old ones.
    from=""; ip=""; error="";
    for (i=6; i<NF; i++)
    {
    # From Address
    if ($i ~ /FROM:<[^ ]*>/)
        from=substr($i, 7, length($i)-7);
    # Errors found in two adjacent fields.
    if ($(i-1) ~ /[[:digit:]]{3}/ && $i ~ /[[:digit:]]\.[[:digit:]]\.[[:digit:]]/)
        error=$(i-1) " " $i;
    # IP address after predictable string.
    if ($(i-2) " " $(i-1) == "Messages from" && $i ~ /[[:digit:].]{7,15}/)
        ip=$i;
    }
    printf(fstr, $1" "$2, $5, from, ip, error);
}' logs
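The substr() arithmetic in the script above may not be obvious at first glance, so here is a small sketch of just that step. The helper name strip_from and the address are placeholders, not part of the original answer. The field starts with the six characters FROM:< and ends with one > character, so the address starts at position 7 and is length-7 characters long.

```shell
#!/bin/sh
# Hypothetical helper: strips the leading "FROM:<" (6 characters)
# and the trailing ">" (1 character) from the first field of each line.
strip_from() {
    awk '{ print substr($1, 7, length($1) - 7) }'
}

# Placeholder address, not taken from the real logs.
printf 'FROM:<sender@example.org>\n' | strip_from
```

For the placeholder input this prints sender@example.org.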
