Split large log file into pieces based on date

Question

I am trying to work out a Linux command to split a large log file into pieces based on date.

Using "How to split existing apache logfile by month?" as a starting point, I tried:

awk '{ split($4,array,"/"); print > array[2] ".txt" }' TestLog.txt

On my sample TestLog.txt, which has entries for May, Jun, and Jul of different years, this created the text files May.txt, Jun.txt, and Jul.txt.

In order to understand the values in the arrays, I eliminated the output file, and displayed the array values using:

awk '{ split($4,array,"/"); print  array[1] "  "  array[2] "  " array[3] "  " array[4] }' TestLog.txt

Where the first 2 lines of TestLog.txt are:

124.115.5.11 - - [30/May/2011:23:21:37 -0500] "GET / HTTP/1.0" 200 206492 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322;TencentTraveler)"
58.61.164.39 - - [31/May/2011:00:36:35 -0500] "GET / HTTP/1.0" 200 206492 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322;TencentTraveler)"

This resulted in [30 May 2011:23:21:37 for the first line in the file.

The results were very confusing to me. In particular:

Why is array[1] equal to [30 and not 124.115.5.11 - - [30 ?
Why is array[3] equal to 2011:23:21:37 and not 2011:23:21:37 -0500] "GET?
Why is array[4] null?
What should the value of array[0] be?

I just used what had been specified at the link I cited. I assumed it was specifying that the line should be split into 4 pieces. Based on the explanation below, I now understand that $4 grabbed the 4th string ([30/May/2011:23:21:37) for further splitting.
– Mike, Commented Apr 5, 2016 at 2:45

jimmij · Accepted Answer · 2016-04-05 00:22:59Z


Let's take the first line:

124.115.5.11 - - [30/May/2011:23:21:37 -0500] "GET / HTTP/1.0" 200 206492 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322;TencentTraveler)"

and the crucial part of the awk snippet:

awk '{ split($4,array,"/") ...

Here is what is happening:

awk reads the line and splits it into fields on whitespace (the default field separator)
the 4th field is then split further on the / character
the pieces from that split are stored in array
finally, the whole line is written to a file named after the second piece (array[2]) of the 4th field, with .txt appended

So the $4 field initially contained [30/May/2011:23:21:37, and after the split we have

array[1]=[30
array[2]=May
array[3]=2011:23:21:37

There is no array[4] because the 4th field does not contain a 4th "subfield", so referencing it yields an empty string. There is no array[0] because split() numbers the array elements starting from 1.
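The values listed above can be checked directly. The following is a minimal sketch that feeds the 4th field of the first sample line to awk on its own; the shell variable names `field` and `result` are only for illustration. Since split() returns the number of pieces it produced, looping up to that count shows exactly which array elements exist.

```shell
# The 4th whitespace-separated field of the first sample line.
field='[30/May/2011:23:21:37'

# split() returns the number of pieces (n) and stores them in array[1]..array[n].
result=$(printf '%s\n' "$field" | awk '{
    n = split($1, array, "/")
    printf "n=%d", n
    for (i = 1; i <= n; i++) printf " array[%d]=%s", i, array[i]
    printf "\n"
}')

echo "$result"
# prints: n=3 array[1]=[30 array[2]=May array[3]=2011:23:21:37
```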

edited Apr 5, 2016 at 0:22

answered Apr 5, 2016 at 0:17

jimmij



Aha, I couldn't understand what he was trying to do. So in this case he could just use awk -F/ '{ print > $2 ".txt" }' TestLog.txt.
– Wildcard, Commented Apr 5, 2016 at 0:22

@Wildcard Very likely, if it is guaranteed that there is no / before the date (judging from the linked SO question, that is the case).
– jimmij, Commented Apr 5, 2016 at 0:27

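The caveat in the comment above can be illustrated with a made-up log entry. This sketch assumes a hypothetical user field, `corp/alice`, that contains a slash before the date. Nothing like it appears in the sample log, and the variable names are illustrative only.

```shell
# Hypothetical entry: the user field "corp/alice" puts a "/" ahead of the date.
line='10.0.0.1 - corp/alice [30/May/2011:23:21:37 -0500] "GET / HTTP/1.0" 200 512'

# Splitting the whole line on "/" now picks up the wrong piece as $2.
by_slash=$(printf '%s\n' "$line" | awk -F/ '{ print $2 }')

# Splitting only the 4th whitespace-separated field is unaffected.
by_field=$(printf '%s\n' "$line" | awk '{ split($4, array, "/"); print array[2] }')

echo "-F/ gives:        $by_slash"    # alice [30
echo "split(\$4) gives: $by_field"    # May
```

With input like this, restricting the split to $4 is the more forgiving choice; for logs that never have a slash before the date, the two approaches name the same month.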
I wondered if split had a default delimiter of space. So those splits are basically being ignored, and the 4th string (delimited by spaces) is then split using / as the delimiter. Then how can I extract the year (2011) also?
– Mike, Commented Apr 5, 2016 at 2:39

@Mike You can add : as a field separator: awk -F '[/:]' '{print "Month: "$2", year: "$3}'
– jimmij, Commented Apr 5, 2016 at 8:40
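Combining the pieces from this thread, one possible way to keep the same month of different years apart is to put the year into the output file name. The sketch below is an illustration under stated assumptions, not a tested production command. It runs in a scratch directory and uses shortened sample lines, the third of which is made up. The `YEAR-Mon.txt` naming scheme and the array name `part` are arbitrary choices. It applies the `[/:]` separator to $4 only, via split(), rather than to the whole line.

```shell
# Work in a scratch directory so the sketch does not touch real logs.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Shortened entries in the same layout as the sample lines (the third is made up).
cat > TestLog.txt <<'EOF'
124.115.5.11 - - [30/May/2011:23:21:37 -0500] "GET / HTTP/1.0" 200 206492
58.61.164.39 - - [31/May/2011:00:36:35 -0500] "GET / HTTP/1.0" 200 206492
58.61.164.39 - - [01/Jun/2012:10:00:00 -0500] "GET / HTTP/1.0" 200 206492
EOF

# Split $4 on either "/" or ":" so that part[2] is the month and part[3] the year.
# The parentheses around the file name avoid ambiguity in some awk implementations.
awk '{
    split($4, part, "[/:]")
    print > (part[3] "-" part[2] ".txt")
}' TestLog.txt

ls
```

On this input the directory should end up containing 2011-May.txt (two lines) and 2012-Jun.txt (one line) next to TestLog.txt.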
