Using awk to break out a timestamp and format it

Question

For filenames like this:

fileLoad.xml2017-12-21_10_55_53-153.txt
otherFile.xml2017-12-20-11_23_01-87899.txt
someFile.xml2017-11-30-21_00_59-1.txt

What I'm trying to accomplish with awk is to isolate the file name up through .xml and then isolate and format the timestamp for use in a csv/database.

I have the following:

NR==1 {
    # drop the trailing ".txt" (4 characters) from the input file name
    fn=substr(FILENAME, 1, length(FILENAME)-4);
    ts=fn;
    sub(/[0-9]{4}.*$/,"",fn);
    sub(/^\w+\.xml/,"",ts);
} {
    printf "%s\t%s\n", fn, ts
}

I can strip off the '-' from the end, but after that I can't figure out how to convert the remaining timestamp in awk into the format 2017-11-30 21:00:59.

@mdpc - This is part of a larger program I'm trying to figure out how to script. awk was apparently the right tool for parsing the contents of the file, but I'm trying to isolate the specific task (file) from the timestamp of when it was created as separate fields in an output csv.
– Noah Goodrich, Commented Dec 21, 2017 at 19:02
I'd say a criterion for "the right tool" is the ease with which you can wield it: echo "fileLoad.xml2017-12-21_10_55_53-153.txt" | sed -r 's/.*xml//; s/-[[:digit:]]*\.txt//; s/_/ /; s/_/:/g'
– Tomáš Pospíšek, Commented Dec 21, 2017 at 19:33
Which awk are you using? I'm not sure which ones support \w in regexes; GNU awk does. Also, is the first file correctly named there? It has an underscore _ between the date and the time, while the others have a dash -. (I'm just asking since this would be rather straightforward with GNU awk, especially if the first file name is in error.)
– ilkkachu, Commented Dec 21, 2017 at 19:56
Does it have to be awk, or can it be gawk? If you can use gawk, it's easy.
– Lizardx, Commented Dec 21, 2017 at 20:03

Lizardx · Accepted Answer · 2017-12-22 00:02:39Z

This works, though personally I wouldn't use plain awk for this. I'd use gawk, whose gensub() function makes exactly this type of operation easy.

echo 'fileLoad.xml2017-12-21_10_55_53-153.txt
otherFile.xml2017-12-20-11_23_01-87899.txt
someFile.xml2017-11-30-21_00_59-1.txt' | awk '{
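  # strip the file name up through .xml and the trailing -NNN.txt
  gsub(/^.*\.xml|-[0-9]+\.txt/,"",$0);
  date=$0;
  time=$0;
  # date: cut the separator and the time off the end
  sub(/[-_][0-9]{2}_[0-9]{2}_[0-9]{2}$/,"",date);
  # time: cut the date and the separator off the front
  sub(/^[0-9]{4}-[0-9]{2}-[0-9]{2}[-_]/,"",time);
  gsub(/_/,":",time);
  print date " " time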
}'
2017-12-21 10:55:53
2017-12-20 11:23:01
2017-11-30 21:00:59

Since you specified awk, this is one way to do it, though a bit basic.

Note that because plain awk has no gensub(), I copied the value of $0 into two variables, so that I could strip the beginning off one and the end off the other.

Stripping the leading file name and the trailing counter and extension is easy, as you can see, and that leaves you with just the date/time data to process further.
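For the tab-separated output the question asks for (file name, then timestamp), the same idea can be sketched without regular-expression intervals, using only index() and substr(). This is a minimal sketch and not part of the approach above: the function name split_names is made up for illustration, and it assumes that ".xml" occurs exactly once in each name and that the timestamp is always 10 date characters, one separator, and 8 time characters.

```shell
# split_names is a hypothetical helper: reads file names on stdin and
# prints "<name up through .xml><TAB><YYYY-MM-DD HH:MM:SS>" for each one.
split_names() {
  awk '{
    p = index($0, ".xml")        # position where ".xml" starts
    if (p == 0) next             # skip names without ".xml"
    fn = substr($0, 1, p + 3)    # file name up through ".xml"
    d  = substr($0, p + 4, 10)   # YYYY-MM-DD
    t  = substr($0, p + 15, 8)   # HH_MM_SS, after the one-character separator
    gsub(/_/, ":", t)
    printf "%s\t%s %s\n", fn, d, t
  }'
}

printf '%s\n' \
  'fileLoad.xml2017-12-21_10_55_53-153.txt' \
  'otherFile.xml2017-12-20-11_23_01-87899.txt' \
  'someFile.xml2017-11-30-21_00_59-1.txt' | split_names
```

Because the separator between date and time is skipped by position, it does not matter whether it is a dash or an underscore. The trade-off is that a name with a differently sized timestamp would be cut in the wrong place without any warning.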

Using gawk and gensub it's easier.

echo 'fileLoad.xml2017-12-21_10_55_53-153.txt
otherFile.xml2017-12-20-11_23_01-87899.txt
someFile.xml2017-11-30-21_00_59-1.txt' | gawk '{
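  # strip the file name up through .xml and the trailing -NNN.txt
  gsub(/^.*\.xml|-[0-9]+\.txt/,"",$0);
  # capture the 10-character date and the 8-character time, dropping the separator
  datetime = gensub(/^([0-9-]{10})[-_]([0-9_]{8})$/,"\\1 \\2",1,$0);
  gsub(/_/,":",datetime);
  print datetime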
}'
2017-12-21 10:55:53
2017-12-20 11:23:01
2017-11-30 21:00:59

Note that the date/time patterns in the two versions do the same thing. The first spells out the actual layout of the date and the time. The second just says: give me the first 10 characters that match [0-9-] and the last 8 characters that match [0-9_]. It just depends on which you find easier to read.
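If neither pattern reads well, another option is to skip the date/time patterns entirely and split the timestamp on its separators. The following is only a sketch under the same assumptions about the file names. The function name format_timestamps is hypothetical, and since it uses neither gensub() nor interval expressions it should run in plain awk as well as gawk.

```shell
# format_timestamps is a hypothetical helper: reads file names on stdin
# and prints the embedded timestamp as "YYYY-MM-DD HH:MM:SS".
format_timestamps() {
  awk '{
    ts = $0
    sub(/^.*\.xml/, "", ts)        # drop the file name up through ".xml"
    n = split(ts, part, /[-_]/)    # year, month, day, hour, minute, second, counter
    if (n < 6) next                # not enough fields for a timestamp
    printf "%s-%s-%s %s:%s:%s\n", part[1], part[2], part[3], part[4], part[5], part[6]
  }'
}

printf '%s\n' \
  'fileLoad.xml2017-12-21_10_55_53-153.txt' \
  'otherFile.xml2017-12-20-11_23_01-87899.txt' \
  'someFile.xml2017-11-30-21_00_59-1.txt' | format_timestamps
```

The seventh field holds the trailing counter and extension (for example 153.txt) and is simply ignored, so no separate cleanup step is needed.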

The real question, however, is whether you wouldn't have been better off using Perl for this job.
