I have extracted the following information from the raw logs below using this command:

echo -e "Timestamp\t\tEmailTo:\t\tEmailFrom:\t\t\t\t\tIPAddress:\tErrorCodes:" && sed -n -e 's/.*\([0-9][0-9][0-9][0-9]\-[0-9][0-9]\-[0-9]*\) .*\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]*\).*/\1 \2 /p' logs

Output:

Timestamp       EmailTo:        EmailFrom:                  IPAddress:  ErrorCodes:
2017-01-02 12:50:00 
2017-01-02 13:10:25 

Raw logs:

2017-01-02 12:50:00 1cNxNS-001NKu-9B == [email protected] R=dkim_lookuphost T=dkim_remote_smtp defer (-45) H=mta6.am0.yahoodns.net [98.138.112.38]: SMTP error from remote mail server after MAIL FROM:<[email protected]> SIZE=1772: 421 4.7.0 [TSS04] Messages from 192.168.1.269 temporarily deferred due to user complaints - 4.16.55.1; see https://help.yahoo.com/kb/postmaster/SLN3434.html
2017-01-02 13:10:25 1cNxhD-001VZ3-0f == [email protected] ([email protected]) <[email protected]> R=lookuphost T=remote_smtp defer (-45) H=mta7.am0.yahoodns.net [98.138.112.34]: SMTP error from remote mail server after MAIL FROM:<[email protected]> SIZE=87839: 500 5.9.0 [TSS04] Messages from 192.168.1.269 temporarily deferred due to user complaints - 4.16.55.1; see https://help.yahoo.com/kb/postmaster/SLN3434.html

But I am unable to extract the other information I need. The output should look like this:

Timestamp            EmailTo:              EmailFrom:           IPAddress:      ErrorCodes:

2017-01-02 12:50:00  [email protected]  [email protected]       192.168.1.269   421 4.7.0
2017-01-02 13:10:25  [email protected]  [email protected]      192.168.1.269   500 5.9.0

How can I extract all the information using sed?

  • What exactly is the problem you have with sed patterns? You just need to extend your current pattern to match all the information you need, not just the first two pieces. Commented Jan 3, 2017 at 15:10
  • I have tried to extend it further but could not get it to work. For example, I tried to extract the third field with the command below, without success: sed -n 's/.*\([0-9][0-9][0-9][0-9]\-[0-9][0-9]\-[0-9]*\) .*\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]*\) .\^([^==]\[A-Z0-9._%-+]\@[A-Z0-9.-]\.[A-Z]{2,4}*\).*/\1 \2 \3 /p' exim_logs Commented Jan 3, 2017 at 17:46

2 Answers

You can try this sed expression:

sed -e 's/^\(.* .* \).* .*== \([^ ]* \).*MAIL FROM:<\([^ ]*\)> [^ ]* \([0-9 .]*\)\[.*Messages from \([^ ]*\).*$/\1\t\2\t\3\t\5\t\4/'

It works for me with your example.

Explanation

This sed expression contains only one command -- s/.../.../.

First part of s///:

'^\(.* .* \)'      -- Timestamp, first two space-separated blocks of text, \1.
'.* .*== '         -- Uninteresting text after timestamp.
'\([^ ]* \)'       -- Block of text between spaces, first email address, \2.
'.*MAIL FROM:<'    -- Position before second email.
'\([^ ]*\)>'       -- Second email addr, non-space characters, ended by '>', \3.
' [^ ]* '          -- SIZE=...:
'\([0-9 .]*\)\['   -- Error codes: digits, spaces and dots ended by '[', \4.
'.*Messages from ' -- Position before IP.
'\([^ ]*\)'        -- Non-space characters, ended by space, IP. \5.
'.*$'              -- Text before end of string, not interesting.

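As you can see, it's just a direct description of the raw log lines; there is nothing special about it.

The second part of s/// just places the \N groups in the right order (\1, \2, \3, \5, \4, so that the IP address comes before the error codes), with \t (tab character) as a separator.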
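For readers who find the backslash-heavy basic syntax hard to follow, here is a rough sketch of the same idea using extended regular expressions (sed -E). It is not the command from this answer, only a variant under a few assumptions: the sample line is modelled on the logs in the question, the addresses rcpt@example.com and sender@example.org are placeholders, the function name extract is made up for illustration, and '|' is used as the separator because \t in the replacement is, as far as I know, a GNU sed extension.

```shell
#!/bin/sh
# Hypothetical sample line modelled on the question's logs.
# rcpt@example.com and sender@example.org are placeholder addresses.
line='2017-01-02 12:50:00 1cNxNS-001NKu-9B == rcpt@example.com R=dkim_lookuphost T=dkim_remote_smtp defer (-45) H=mta6.am0.yahoodns.net [98.138.112.38]: SMTP error from remote mail server after MAIL FROM:<sender@example.org> SIZE=1772: 421 4.7.0 [TSS04] Messages from 192.168.1.269 temporarily deferred due to user complaints - 4.16.55.1; see https://help.yahoo.com/kb/postmaster/SLN3434.html'

# extract (illustrative name): reads log lines on stdin and prints
# timestamp|recipient|sender|ip|error codes for every line that matches.
#   \1 date and time      \2 recipient after "== "
#   \3 address inside MAIL FROM:<...>
#   \4 three-digit SMTP code plus enhanced status code
#   \5 IP address after "Messages from "
extract() {
    sed -n -E 's/^([0-9-]+ [0-9:]+) .*== ([^ ]+) .*MAIL FROM:<([^>]*)> [^ ]* ([0-9]{3} [0-9.]+) .*Messages from ([^ ]+).*$/\1|\2|\3|\5|\4/p'
}

out=$(printf '%s\n' "$line" | extract)
printf '%s\n' "$out"
```

With -n and the p flag, lines that do not match the pattern are dropped instead of being printed unchanged. To process a whole file, something like extract < logs should work.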

  • Thanks, it works for me, but please elaborate on this command. Commented Jan 4, 2017 at 6:17

I've not got much experience with awk, but thought I'd have a go. I imagine this is quite fragile, as I don't know how many kinds of log lines you're trying to handle with this.

Anyway, this uses the BEGIN block to set up a format string for printing and to display the header. The time and EmailTo are in predictable positions, so they can be taken from the numbered fields ($1, $2 and $5); the other three values are found with regexps which are only very rough. Any suggestions to improve would be appreciated!

awk 'BEGIN {
        # Format string shared by the header and every data row.
        fstr="%-24s%-24s%-40s%-16s%s\n";
        printf(fstr, "Timestamp:", "EmailTo:", "EmailFrom:", "IPAddress:", "ErrorCodes:");
    }
{   # Reset per record so values from a previous line are never reused.
    from=""; ip=""; error="";
    for (i=6; i<NF; i++)
    {
    # From address: strip the leading "FROM:<" and the trailing ">".
    if ($i ~ /FROM:<[^ ]*>/)
        from=substr($i, 7, length($i)-7);
    # Error codes found in two adjacent fields.
    if ($(i-1) ~ /[[:digit:]]{3}/ && $i ~ /[[:digit:]]\.[[:digit:]]\.[[:digit:]]/)
        error=$(i-1) " " $i;
    # IP address after a predictable string.
    if ($(i-2) " " $(i-1) == "Messages from" && $i ~ /[[:digit:].]{7,15}/)
        ip=$i;
    }
    printf(fstr, $1" "$2, $5, from, ip, error);
}' logs
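One possible way to make this less dependent on field positions is to search the whole line with match() and cut the value out with substr(), RSTART and RLENGTH. The following is only a sketch under assumptions: the sample line is modelled on the question's logs with placeholder addresses (rcpt@example.com, sender@example.org), the function name extract_awk is made up, and it still assumes the recipient is the fifth field.

```shell
#!/bin/sh
# Hypothetical sample line modelled on the question's logs.
# rcpt@example.com and sender@example.org are placeholder addresses.
line='2017-01-02 12:50:00 1cNxNS-001NKu-9B == rcpt@example.com R=dkim_lookuphost T=dkim_remote_smtp defer (-45) H=mta6.am0.yahoodns.net [98.138.112.38]: SMTP error from remote mail server after MAIL FROM:<sender@example.org> SIZE=1772: 421 4.7.0 [TSS04] Messages from 192.168.1.269 temporarily deferred due to user complaints - 4.16.55.1; see https://help.yahoo.com/kb/postmaster/SLN3434.html'

# extract_awk (illustrative name): prints tab-separated
# timestamp, recipient, sender, IP and error codes for each input line.
extract_awk() {
    awk '
    {
        from = ip = err = ""    # reset per record

        # "MAIL FROM:<" is 11 characters; drop it and the closing ">".
        if (match($0, /MAIL FROM:<[^>]*>/))
            from = substr($0, RSTART + 11, RLENGTH - 12)

        # "Messages from " is 14 characters.
        if (match($0, /Messages from [0-9.]+/))
            ip = substr($0, RSTART + 14, RLENGTH - 14)

        # SMTP code and enhanced status code, e.g. ": 421 4.7.0 ".
        # Drop the leading ": " and the trailing space.
        if (match($0, /: [0-9][0-9][0-9] [0-9]\.[0-9]+\.[0-9]+ /))
            err = substr($0, RSTART + 2, RLENGTH - 3)

        printf "%s %s\t%s\t%s\t%s\t%s\n", $1, $2, $5, from, ip, err
    }'
}

out=$(printf '%s\n' "$line" | extract_awk)
printf '%s\n' "$out"
```

This avoids the {n} interval syntax, which some older awk implementations do not enable by default, and a field that is not found is simply printed as an empty column.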
