2

I need to keep only the last 30 days of events in the log and remove everything else.

The log has a timestamp line (date +%e/%b/%Y:%H:%M) followed by lines of values, like this:

01/Jan/2025:00:00
value1
value2
...
01/Jan/2025:06:45
value1
value2
...
05/Jan/2025:02:20
value1
value2
value3

I get the start date pattern with

date "+%e/%b/%Y" --date="30 days ago"

and save it to variable date_pattern.

Then I try to

sed "0,~$date_pattern~/d" file

(I use ~ because / is part of the log file's content)

and get

sed: -e expression #1, char 6: invalid usage of line address 0

What is incorrect in the sed construct?

4
  • Two quick suggestions: using gnu awk is a good thing, and Date Format == YYYYMMDD HHMMSS makes for numeric sorts. Also, better than formatting or converting the date string is to change the log format in the logger server. Commented Sep 26 at 20:39
  • @Dru sometimes (actually, quite often) you have no choice but to deal with data as it actually is rather than data in the ideal format you'd prefer it to be. Sometimes the program generating the data can not or will not be changed, and sometimes doing so would break other programs which depend on the original data format. And, quite often, the reason sed or perl or awk etc are being used is to convert the original data from an inconvenient format to a format more usable for the current task. Commented Sep 27 at 3:06
  • Hmm, the point was that most often you want to deal with a LOG file in an optimal, long-term way specific to common use cases for log files. Is there an advantage to having the date format in a locale-default manner? (I think the complexity of a program anyone writes for doing this common task is proportional to the complexity of date handling [especially when the record's primary key is a date].) Having the date on every line in the most useful format, as a tip, is really useful, particularly if it provides fixed-width records, etc. Commented Sep 27 at 7:23
  • You are missing the point - there are all manner of stupid programs written by stupid programmers to do stupid things in stupid ways, often ordered to be that way by idiot clients, moronic colleagues, stupid managers and cretinous CEOs, and sometimes you have no power to change that and just have to deal with it as it is. Yes, it would be nice if everything was always sane and perfect. The real world isn't like that. Commented Sep 27 at 11:15

3 Answers

8

When you want to use a different character than / for a regex in an address, you have to prepend a backslash to the opening character.

sed "0,\~$date_pattern~/d" file

This still doesn't work, though:

sed: -e expression #1, char 17: unknown command: `/'

Yes, there's no slash command. Why is the slash there? Just remove it.

sed "0,\~$date_pattern~d" file

This isn't the correct way to process the dates, though. If the $date_pattern is missing in the file, everything gets deleted. It's safer to use a language that can parse the dates, e.g.:

perl -MTime::Piece -nlwe '$d  = localtime->strptime($&, "%d/%b/%Y:%H:%M")
                                if m{^\d{2}/\w{3}/\d{4}:\d\d:\d\d$};
                          print if $d + 30 * 24 * 60 * 60 > localtime' < file
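The failure mode described above (everything is deleted when the pattern never appears) can be reproduced with a small sketch. The sample lines and the two dates are made up for illustration, and the `0,` address assumes GNU sed:

```shell
# Hypothetical four-line log, used only for this demonstration.
sample='01/Jan/2025:00:00
value1
05/Jan/2025:02:20
value2'

# Pattern present: lines from the start through the first match are deleted.
present=$(printf '%s\n' "$sample" | sed "0,\~05/Jan/2025~d")

# Pattern absent: the range never ends, so every line is deleted.
absent=$(printf '%s\n' "$sample" | sed "0,\~03/Jan/2025~d")

printf 'present: [%s]\n' "$present"   # present: [value2]
printf 'absent:  [%s]\n' "$absent"    # absent:  []
```

Note that the matching timestamp line itself is also deleted in the first case, which is what the comments below this answer discuss.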
8
  • It works, thank you. Is there a way to keep the line that contains $date_pattern? It removes 01/Jan/2025:00:00 and keeps the values. I want to keep this date too. Commented Sep 26 at 10:34
  • @Aleksey: sed "0,\~$date_pattern~{\~$date_pattern~p;d}" seems to work. Commented Sep 26 at 12:42
  • ... or bring the last command outside the quotes and shell escape the ! like sed "0,\~$date_pattern~"{//\!d} file ... OP seems to be using GNU sed which uses the tilde to set action for every N line starting from a certain line number like seq 6 | sed '0~3s/./x/' i.e. first~step so expecting 0~N and not 0,~ ... hence the error reported in the OP invalid usage of line address 0 ... // is the last match. Commented Sep 26 at 12:59
  • @Raffa: No, they use ~ instead of / for an address, not the first~step form. That's why there's a comma after the zero. Commented Sep 26 at 14:09
  • 1
    That $in = 1 if $d + 30 * 24 * 60 * 60 > localtime doesn't set $in back to 0 when a timestamp is found for earlier dates. So you might as well skip the time parsing after the first timestamp from 30 days ago has been seen. Or use $in = $d + 30 * 24 * 60 * 60 > localtime Commented Oct 3 at 8:51
1

This will robustly do what I think you want, using any awk:

awk -v d="$(date '+%d/%b/%Y' --date='30 days ago')" '
    f || ($0 ~ "^"d":([0-1][0-9]|2[0-3]):[0-5][0-9]$") { print; f=1 }
' file

It's more robust than the sed command you're trying to make work, because that command does not anchor its pattern with ^ and $ (start and end of line) and so can produce false matches on other timestamps or other input lines. To use anchors:

  1. You need to use %d (0-padded) instead of %e (space-padded) in your date output format string so it matches the 0-padded date format string in your input, and
  2. You need to include a regexp segment to match the :HH:MM part of the input.

As an added bonus, you don't have to deal with sed's arcane syntax and the frequent non-portability between sed implementations.
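As a rough illustration of the two points above, the sketch below first compares %e with %d for a single-digit day, then runs the same awk condition against a fixed day so the result is reproducible. The helper name `keep_from` and the sample lines are hypothetical, and `date -d` assumes GNU date:

```shell
# %e pads with a space, %d pads with a zero (LC_ALL=C forces English month names).
with_e=$(LC_ALL=C date -d '2025-01-05' '+%e/%b/%Y')
with_d=$(LC_ALL=C date -d '2025-01-05' '+%d/%b/%Y')
printf '%%e gives [%s]\n%%d gives [%s]\n' "$with_e" "$with_d"

# Hypothetical helper: $1 is the day as dd/Mon/yyyy, the log is read from stdin.
keep_from() {
    awk -v d="$1" '
        # Print from the first anchored timestamp of that day to the end.
        f || ($0 ~ "^" d ":([0-1][0-9]|2[0-3]):[0-5][0-9]$") { print; f = 1 }
    '
}

# Made-up sample log.
sample='01/Jan/2025:00:00
value1
05/Jan/2025:02:20
value2
value3'

kept=$(printf '%s\n' "$sample" | keep_from '05/Jan/2025')
printf '%s\n' "$kept"
```

With this sample, the timestamp line 05/Jan/2025:02:20 is kept together with the values that follow it.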

0

The line address 0 is a non-standard extension of the GNU implementation of sed that allows doing things like 0,/pattern/ x to run the x action on lines from the start of the input(s) (or start of the file with -s or -i, both also GNU extensions) to the first line that matches pattern even if that line is the first line.

With the standard 1,/pattern/ x, x would run on lines from line 1 to the first later line that matches the pattern, which is not what you want when line 1 itself matches the pattern.
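The difference between the two forms shows up only when line 1 matches. A minimal sketch with three made-up lines, assuming GNU sed for the `0,` form:

```shell
# Made-up input where the first line already matches the pattern "a".
input='a
b
a'

# 1,/a/ : the end of the range is searched starting at line 2, so lines 1-3 are deleted.
with_one=$(printf '%s\n' "$input" | sed '1,/a/d')

# 0,/a/ : line 1 may itself end the range, so only line 1 is deleted.
with_zero=$(printf '%s\n' "$input" | sed '0,/a/d')

printf '1,/a/d leaves: [%s]\n' "$with_one"
printf '0,/a/d leaves: [%s]\n' "$(printf '%s' "$with_zero" | paste -sd' ' -)"
```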

GNU sed has a couple of other extensions involving the ~ character, where you can specify a step when selecting addresses. 5~3 is to select every 3rd line starting with the 5th one and 12,~4 to select line 12 to the first line after that that's a multiple of 4 (here 16).

For some reason, in the latter, it doesn't allow the first address to be 0¹ which causes the error you're getting (same for 0,4), though for those, you'd just write 1,4.
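The two ~ forms, and the rejection of 0 as the first address, can be checked with a short sketch. It assumes GNU sed; the line numbers 2, 3 and 4 are arbitrary choices for the demonstration:

```shell
# first~step: every 3rd line starting with line 2.
every_third=$(seq 10 | sed -n '2~3p' | paste -sd' ' -)

# addr1,~N: from line 2 up to the next line number that is a multiple of 4.
up_to_multiple=$(seq 10 | sed -n '2,~4p' | paste -sd' ' -)

# 0 as the first address is rejected here; capture the error text instead of failing.
err=$(seq 10 | LC_ALL=C sed -n '0,~4p' 2>&1) || true

printf '2~3  -> %s\n' "$every_third"      # 2~3  -> 2 5 8
printf '2,~4 -> %s\n' "$up_to_multiple"   # 2,~4 -> 2 3 4
printf '0,~4 -> %s\n' "$err"
```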

Now, as @choroba said, the correct syntax would be:

date_pattern=$(
  LC_ALL=C date -d '30 days ago' +'^%d/%b/%Y:'
)
sed "0,\~${date_pattern}~ d"

where you need to prefix the first address delimiter with \ if you want to use one other than /. Also note the LC_ALL=C, which you need to guarantee the month name abbreviations will be in English; the %d instead of %e, as already noted by @choroba; the ^ to anchor the search at the start of the line; and the -d option (marginally more portable than --date) going before the non-option +'^%d/%b/%Y:' argument so it still works with GNU date when in a POSIX environment.

But that deletes from the start of the input(s) to the first line that matches the regular expression stored in $date_pattern, which means it would delete the first line (the timestamp) of the first log entry from that day, which is not what you want.

Here it would be better to use the standard and portable (without GNU extension):

sed "\~${date_pattern}~,\$!d"

That is, delete all lines except (!) those from the first line of that day up to the last line ($, which we escape with \ because $ is special to the shell inside double quotes).

That still assumes there is at least one line in the input from that day, and it may still output lines from earlier days if the logs are not guaranteed to be in chronological order².
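A small sketch of that portable command, run against made-up sample lines with a fixed pattern instead of the output of date so the result is reproducible. It also shows the caveat that nothing is kept when no line from that day exists:

```shell
# Made-up sample log.
sample='01/Jan/2025:00:00
value1
05/Jan/2025:02:20
value2
value3'

# Fixed pattern for the demo; in real use it would come from the date command above.
date_pattern='^05/Jan/2025:'
kept=$(printf '%s\n' "$sample" | sed "\~${date_pattern}~,\$!d")

# A day with no entries: the range never starts, so every line is deleted.
date_pattern='^03/Jan/2025:'
kept_missing=$(printf '%s\n' "$sample" | sed "\~${date_pattern}~,\$!d")

printf '%s\n' "$kept"
printf 'missing day keeps: [%s]\n' "$kept_missing"
```

Here the 05/Jan/2025:02:20 timestamp line is kept along with value2 and value3, which is the behaviour asked for in the question.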


¹ and ADDR1,~0 seems to be the same as ADDR1 for some reason!?

² which is not uncommon. That typically happens when the timestamp is for the start of an event (like when an HTTP request is received in apache server logs) but the log entry is added when the event is over (the HTTP response has been sent), and several events can happen in parallel.
