I am trying to work out a Linux command to split a large log file into pieces based on date.
Using the question "How to split existing apache logfile by month?" as a starting point, I tried:
awk '{ split($4,array,"/"); print > (array[2] ".txt") }' TestLog.txt
On my sample TestLog.txt, which has entries for May, Jun, and Jul of different years, this created the files May.txt, Jun.txt and Jul.txt.
To understand what the array contains, I removed the output redirection and printed the array elements instead:
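Because each file name comes only from array[2] (the month name), entries for the same month in different years end up in the same file. A minimal sketch that also keys on the year is shown below. The function name split_by_month and the YEAR-Mon.txt naming scheme are hypothetical choices, and it assumes, as the command above does, that the timestamp is the 4th blank-separated field (for example `[30/May/2011:23:21:37`).

```shell
#!/bin/sh
# Hypothetical helper (not from the question): write each line of an
# Apache-style access log to a file named after its year and month,
# e.g. 2011-May.txt, so May 2011 and May 2012 no longer share a file.
# Assumes the timestamp is the 4th blank-separated field, as in
# "[30/May/2011:23:21:37".
split_by_month() {
  awk '{
    split($4, d, "/")            # d[1] = "[30", d[2] = "May", d[3] = "2011:23:21:37"
    year = substr(d[3], 1, 4)    # the first four characters of d[3] are the year
    out = year "-" d[2] ".txt"   # build the name first so the redirection is unambiguous
    print > out                  # ">" truncates a file only when this run first opens it
  }' "$1"
}
```

Run it as `split_by_month TestLog.txt`; the output files are written to the current directory. For logs spanning many months, note that some awk implementations limit how many output files can be open at once.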
awk '{ split($4,array,"/"); print  array[1] "  "  array[2] "  " array[3] "  " array[4] }' TestLog.txt
The first two lines of TestLog.txt are:
124.115.5.11 - - [30/May/2011:23:21:37 -0500] "GET / HTTP/1.0" 200 206492 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322;TencentTraveler)"
58.61.164.39 - - [31/May/2011:00:36:35 -0500] "GET / HTTP/1.0" 200 206492 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322;TencentTraveler)"
For the first line of the file, this printed `[30  May  2011:23:21:37`, with nothing after the third value.
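The diagnostic command can be extended to show both splitting steps at once: the sketch below prints $4 itself, the count that split() returns, and the array elements from index 0 to one past the last element. The function name show_split is hypothetical, and the sample input is the first line of TestLog.txt with the user-agent string shortened.

```shell
#!/bin/sh
# Hypothetical diagnostic: print the whole 4th field, the number of
# elements split() created, and each element with its index.
show_split() {
  awk '{
    print "$4 = " $4                 # by default, fields are separated by runs of blanks
    n = split($4, parts, "/")        # split() returns how many elements it created
    print "count = " n
    for (i = 0; i <= n + 1; i++)     # also show index 0 and one past the last element
      print "parts[" i "] = \"" parts[i] "\""
  }'
}

# First line of TestLog.txt (user-agent string shortened).
printf '%s\n' '124.115.5.11 - - [30/May/2011:23:21:37 -0500] "GET / HTTP/1.0" 200 206492 "-" "Mozilla/4.0"' | show_split
```

For that line it prints `$4 = [30/May/2011:23:21:37` and `count = 3`, with `parts[0]` and `parts[4]` both empty.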
The results were very confusing to me. In particular:
- Why is `array[1]` equal to `[30` and not `124.115.5.11 - - [30`?
- Why is `array[3]` equal to `2011:23:21:37` and not `2011:23:21:37 -0500] "GET`?
- Why is `array[4]` null?
- What should the value of `array[0]` be? `$4`?