\$\begingroup\$

I wrote the following regular expression to parse an nginx log line:

log_1 = '1.169.137.128 -  - [29/jun/2017:07:10:50 +0300] "GET /api/v2/banner/1717161 http/1.1" 200 2116 "-" "Slotovod" "-" "1498709450-2118016444-4709-10027411" "712e90144abee9" 0.199'

My test cases (https://regex101.com/r/Eyhxod/1)

import re

lineformat = re.compile(r"""(?P<ipaddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) -  - \[(?P<dateandtime>\d{2}\/[a-z]{3}\/\d{4}:\d{2}:\d{2}:\d{2} (\+|\-)\d{4})\] \"GET (?P<url>.+?(?=\ http\/1.1")) http\/1.1" \d{3} \d+ "-" (?P<http_user_agent>.+?(?=\ )) "-" "(?P<x_forwaded_for>(.+?))" "(?P<http_xb_user>(.+?))" (?P<request_time>[+-]?([0-9]*[.])?[0-9]+)""",re.IGNORECASE)

Output:

data = re.search(lineformat, log_1)
data.groupdict()

{'ipaddress': '1.169.137.128',
 'dateandtime': '29/jun/2017:07:10:50 +0300',
 'url': '/api/v2/banner/1717161',
 'http_user_agent': '"Slotovod"',
 'x_forwaded_for': '1498709450-2118016444-4709-10027411',
 'http_xb_user': '712e90144abee9',
 'request_time': '0.199'}

I believe I should make it more robust against edge cases and malformed log lines. I am also considering splitting the long expression into smaller pieces. Any advice on best practices is appreciated.

\$\endgroup\$
Comment: Isn't there a commonly known Python module for parsing these log files? You are for sure not the first person in the world to try this. (Commented Mar 5, 2020 at 6:29)

1 Answer

\$\begingroup\$

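At the very least, use verbose mode (re.VERBOSE) so the pattern can be spread over several lines and read as a whole. In verbose mode, unescaped whitespace inside the pattern is ignored, so every space you want to match must be written explicitly, either as \s+ or as an escaped space.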

lineformat = re.compile(r"""
   (?P<ipaddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+
   -\s+
   -\s+
   # A bare space is ignored in verbose mode, so literal spaces are written as "\ ".
   \[(?P<dateandtime>\d{2}\/[a-z]{3}\/\d{4}:\d{2}:\d{2}:\d{2}\ (\+|\-)\d{4})\]\s+
   \"GET\ (?P<url>.+?(?=\ http\/1.1"))\ http\/1.1"\s+
   \d{3}\s+
   \d+\s+
   "-"\s+
   (?P<http_user_agent>.+?(?=\ ))\s+
   "-"\s+
   "(?P<x_forwaded_for>(.+?))"\s+
   "(?P<http_xb_user>(.+?))"\s+
   (?P<request_time>[+-]?([0-9]*[.])?[0-9]+)
   """,
   re.IGNORECASE | re.VERBOSE)
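
On the other two points in the question (splitting the expression and coping with broken lines), here is a minimal sketch rather than a definitive implementation. The constants IP, TIMESTAMP and REQUEST, the helper quoted() and the function parse_line() are hypothetical names introduced for illustration. It assumes every line follows the layout of the sample line in the question. Two deliberate differences from the original pattern: the user agent is captured without its surrounding quotes, and the whole line must match (fullmatch), so a truncated or malformed line gives None instead of a partial match.

```python
import re

# Hypothetical building blocks (names are illustrative, not from the question).
IP = r"(?P<ipaddress>\d{1,3}(?:\.\d{1,3}){3})"
TIMESTAMP = r"\[(?P<dateandtime>\d{2}/[a-z]{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]"
REQUEST = r'"GET (?P<url>\S+) http/1\.1"'


def quoted(name):
    """Return a pattern for a double-quoted field captured under `name`."""
    return rf'"(?P<{name}>[^"]*)"'


# Fields in the order they appear in the sample line; joined by whitespace below.
PARTS = [
    IP, "-", "-", TIMESTAMP, REQUEST,
    r"\d{3}",                     # three-digit number after the request
    r"\d+",                       # integer field that follows it
    '"-"',
    quoted("http_user_agent"),
    '"-"',
    quoted("x_forwaded_for"),     # spelling kept from the question
    quoted("http_xb_user"),
    r"(?P<request_time>\d+(?:\.\d+)?)",
]
LINEFORMAT = re.compile(r"\s+".join(PARTS), re.IGNORECASE)


def parse_line(line):
    """Return the named fields as a dict, or None if the line does not fit."""
    match = LINEFORMAT.fullmatch(line.strip())
    return match.groupdict() if match else None


log_1 = ('1.169.137.128 -  - [29/jun/2017:07:10:50 +0300] '
         '"GET /api/v2/banner/1717161 http/1.1" 200 2116 "-" "Slotovod" "-" '
         '"1498709450-2118016444-4709-10027411" "712e90144abee9" 0.199')

good = parse_line(log_1)
bad = parse_line(log_1[:60])      # simulate a truncated log line

print(good)
print(bad)
```

Because parse_line() returns None for lines that do not fit, the calling code can count or log rejected lines instead of failing with an AttributeError on groupdict().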
\$\endgroup\$
