Revisions to Break up large log files

deleted 573 characters in body


edited Apr 10, 2016 at 13:26


Use T as the field delimiter and check for date-like strings explicitly. For example, to split by year:

awk -FT '($1~/^[0-9]+-[0-9]+-[0-9]+$/){d=substr($1,1,4)}{print > d".log"}' logfile

And by year+month:

awk -FT '($1~/^[0-9]+-[0-9]+-[0-9]+$/){split($1,d,"-")}{print > d[1]d[2]".log"}' logfile

Here, -FT sets the field separator to T, so on lines that start with a timestamp the first field is the whole date. We check that this first field is a set of 3 numbers separated by -. If it is, to get the year, we extract the first 4 characters (d=substr($1,1,4)) and, to get the year and month, we split the 1st field on -, saving the resulting strings in the array d (split($1,d,"-")), and use the 1st two elements of the array (d[1]d[2]) for the file name. Lines that do not start with a date leave d unchanged, so they go to the same file as the last dated line before them.
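To make the behaviour concrete, here is a minimal sketch of the year+month split run against made-up data. The sample lines, and the assumption that timestamps look like 2014-04-07T23:59:58, are mine and are not taken from the question. The file name is built in a variable (out, a hypothetical name) before redirecting, which avoids relying on how a given awk parses an unparenthesised concatenation after >.

```shell
# Hypothetical sample data; the date-T-time layout is an assumption.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
cat > logfile <<'EOF'
2014-04-07T23:59:58 CheckForCallAction [ERROR] Exception caught
Undated line 1
2014-05-01T00:00:03 MobileAppRequestFilter [DEBUG] Action
undated line 2
2015-04-08T00:00:03 MobileAppRequestFilter [DEBUG] ActionB
EOF

# Same logic as the year+month one-liner: on a dated line, rebuild the
# output name from year and month; every line goes to the current name.
awk -FT '
  $1 ~ /^[0-9]+-[0-9]+-[0-9]+$/ { split($1, d, "-"); out = d[1] d[2] ".log" }
  { print > out }
' logfile

ls
```

With this input, each undated line should end up in the file of the dated line above it, giving 201404.log, 201405.log and 201504.log next to logfile.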

added 342 characters in body


edited Apr 8, 2016 at 8:07

terdon ♦


If we can safely assume that any line with at least two - starts with a date, you could simply use - as the field separator and take the 1st field as the name of the file to split by year:

awk -F- '($3){d=$1}{print > d".log"}' logfile

And to split by month:

awk -F- '($3){d=$1$2}{print > d".log"}' logfile

The trick here is to see if there is a third field (so, at least two -) and then save either the 1st field (the year, $1) or the 1st and 2nd fields (year and month, $1$2) as d and use that variable as the name of the file.
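The assumption matters, and a small sketch shows why. The sample data below is invented for illustration, including an undated line that happens to contain two hyphens. The redirection target is parenthesised, which is the same logic as the command above written in a form that should parse the same way across awk implementations.

```shell
# Hypothetical sample data; the last line is undated but contains two "-".
tmp=$(mktemp -d) && cd "$tmp" || exit 1
cat > logfile <<'EOF'
2014-04-07 23:59:58 CheckForCallAction [ERROR] Exception caught
Undated line 1
2015-04-08 00:00:03 MobileAppRequestFilter [DEBUG] ActionB
retry-after-timeout set to 30
EOF

# Split by year: any line with a non-empty third "-"-separated field
# is treated as dated, and its first field becomes the file name.
awk -F- '($3){d=$1}{print > (d ".log")}' logfile

ls
```

Here the last line has a third field, so it is mistaken for a dated line and written to retry.log instead of 2015.log. If lines like that can occur in your log, the stricter check described next is the safer choice.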

If your data is not clean enough to allow that approach, you can check for date-like strings explicitly. For example, to split by year:

awk '($1~/^[0-9]+-[0-9]+-[0-9]+$/){d=substr($1,1,4)}{print > d".log"}' logfile

And by year+month:

awk '($1~/^[0-9]+-[0-9]+-[0-9]+$/){split($1,d,"-")}{print > d[1]d[2]".log"}' logfile

Here, we check that the first field (defined by whitespace now, so the whole date on lines starting with dates) is a set of 3 numbers separated by -. If it is, to get the year, we extract the first 4 characters (d=substr($1,1,4)) and, to get the month, we split the 1st field on -, saving the resulting strings in the array d (split($1,d,"-")), and use the 1st two elements of the array (d[1]d[2]) for the file name.
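One edge case these commands do not cover: if the log starts with lines that have no date, d is still empty when they are printed, so the target name is just ".log". The sketch below is one possible way to handle that, not part of the commands above. It is the by-year split with a fallback file, where the name undated.log and the sample data are my own assumptions.

```shell
# Hypothetical sample data whose first line has no date.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
cat > logfile <<'EOF'
startup banner without a timestamp
2014-04-07 23:59:58 CheckForCallAction [ERROR] Exception caught
Undated line 1
2015-04-08 00:00:03 MobileAppRequestFilter [DEBUG] ActionB
EOF

# Start with a fallback target, then switch to <year>.log whenever
# the first whitespace-separated field looks like a date.
awk '
  BEGIN { out = "undated.log" }
  $1 ~ /^[0-9]+-[0-9]+-[0-9]+$/ { out = substr($1, 1, 4) ".log" }
  { print > out }
' logfile

ls
```

Only lines before the first dated line land in undated.log; later undated lines still follow the most recent date.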


answered Apr 7, 2016 at 15:01

terdon ♦


If it is enough to just have one log file per day, you can do:

awk '($1~/[0-9]*-[0-9]*-[0-9]*/){d=$1}{print > d".log"}' logfile

For example, if you have the data in your question saved in logfile, the above will produce:

$ ls
2014-04-07.log  2014-04-08.log  2015-04-08.log  logfile
$ for i in *log; do echo "=== $i ==="; cat $i; done
=== 2014-04-07.log ===
2014-04-07 23:59:58 CheckForCallAction [ERROR] Exception caught
Undated line 1
Undated line 2
=== 2014-04-08.log ===
2014-04-08 00:00:03 MobileAppRequestFilter [DEBUG] Action
undated line 3
=== 2015-04-08.log ===
2015-04-08 00:00:03 MobileAppRequestFilter [DEBUG] ActionB

Explanation

($1~/[0-9]*-[0-9]*-[0-9]*/){d=$1} : if the first field of this line looks like a date string, set the variable d to the first field.
{print > d".log"} : print each line into a file whose name is the current value of d with the extension ".log".
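For a log that spans many days, a per-day split opens one output file per date, and some awk implementations limit how many files can be open at once. The following is a hedged variant of the same idea, not the command above: it anchors the date check, closes the previous file when the date changes, and appends with >> so that a date reappearing later does not truncate its file. The variable name out is hypothetical, and the sample input is reconstructed from the example output shown above.

```shell
# Sample input reconstructed from the example output above.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
cat > logfile <<'EOF'
2014-04-07 23:59:58 CheckForCallAction [ERROR] Exception caught
Undated line 1
Undated line 2
2014-04-08 00:00:03 MobileAppRequestFilter [DEBUG] Action
undated line 3
2015-04-08 00:00:03 MobileAppRequestFilter [DEBUG] ActionB
EOF

awk '
  # New date seen: close the previous day file and switch targets.
  $1 ~ /^[0-9]+-[0-9]+-[0-9]+$/ && $1 != d {
    if (out != "") close(out)
    d = $1
    out = d ".log"
  }
  # Lines before the first dated line have no target and are skipped.
  out != "" { print >> out }
' logfile

ls
```

Because this appends, running it twice in the same directory would duplicate lines, so it is best run in an empty output directory.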
