
I have a human-written text file that contains time stamps in the form dd-mm-yyyy,HH:MM or HH:MM:SS. I have managed to extract the time stamps from the file using a regex, but I would also like to get the line each time stamp appears on. Ideally the time stamps would go into one file and the corresponding lines into another. There can be multiple time stamps per line, so the same line should then appear once per time stamp.

If this can be done, what about getting only a few words or a few lines around each time stamp? The idea is simply to extract the time stamps together with their context.

So far I have been using Matlab for this, but any Unix tool that works on macOS and in portable Git Bash for Windows would do. The Mac's grep doesn't support the -P option for Perl regexes, which is needed for lookarounds such as (?<![0-9]).

Here is an example of an original file and the desired outputs:

original:

L&L logfile

14-5-12
16-05-2012
Experiment 1
Device 77212-123-123123
Instrument 2, 34g, 66hz
Notes:
Something weird happened 12:34
Everything is fine 13:07
Log
8:00 routine 1
8:20 routine 2
8:40 routine 3, 8:45 something went south
8:50 routine 4, 8:50:12 weird peak at data

output1:

14-5-12
16-05-2012
12:34
13:07
8:00
8:20
8:40
8:45
8:50
8:50:12

output2:

14-5-12
16-05-2012
Something weird happened 12:34
Everything is fine 13:07
8:00 routine 1
8:20 routine 2
8:40 routine 3, 8:45 something went south
8:40 routine 3, 8:45 something went south
8:50 routine 4, 8:50:12 weird peak at data
8:50 routine 4, 8:50:12 weird peak at data
  • Please edit your question, show us an example of your input file (including examples of each possible time stamp format) and your desired output. Commented Jul 29, 2016 at 8:02
  • Costas showed the simple grep; add e.g. -2 to get two adjacent lines as well. Repeats won't be handled nicely with grep alone, though. Commented Jul 29, 2016 at 8:11
  • Assuming that your date and time regex is adequate (check this out: stackoverflow.com/questions/15491894/… to make sure that yours is all-inclusive), all you'd need to get a "few" surrounding words is something like (\w*\s){2} (preceding) or (\s\w*){2} (following). Lines would be something like (^.*\n.*){2} for 2 preceding lines and (.*\n.*){2} for 2 following lines, assuming the text has distinct newline breaks and not just wrapped text. Commented Jul 29, 2016 at 8:24
  • metacpan.org/pod/Regexp::Common::time Commented Jul 29, 2016 at 20:34
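Getting a few words on either side of each time stamp is awkward with a single grep -o, because its matches cannot overlap when two time stamps share a line. A minimal awk sketch, assuming that "words" means whitespace-separated fields (so punctuation such as the comma in "3," stays attached to its word), prints up to n words before and after every field that is a time stamp. The value of n, the three sample lines and the file names sample.txt and context.txt are illustrative only; the regex covers just the formats shown in the example.

```sh
# Stand-in input: three lines taken from the example in the question.
printf '%s\n' \
    'Something weird happened 12:34' \
    '8:40 routine 3, 8:45 something went south' \
    '8:50 routine 4, 8:50:12 weird peak at data' > sample.txt

# n = number of words to keep on each side of a time stamp (illustrative).
awk -v n=2 '
BEGIN {
    # Dates like 14-5-12 or 16-05-2012, times like 8:00 or 8:50:12.
    re = "[0-9][0-9]?-[0-9][0-9]?-[0-9][0-9]([0-9][0-9])?|[0-9][0-9]?:[0-9][0-9](:[0-9][0-9])?"
}
{
    for (i = 1; i <= NF; i++) {
        # A field counts as a time stamp if it is one, optionally followed
        # by a single , . or ; character.
        if ($i ~ ("^(" re ")[,.;]?$")) {
            from = (i - n < 1) ? 1 : i - n
            to = (i + n > NF) ? NF : i + n
            # Re-join the selected fields with single spaces.
            ctx = $from
            for (j = from + 1; j <= to; j++)
                ctx = ctx " " $j
            print ctx
        }
    }
}' sample.txt > context.txt

cat context.txt
```

With n=2, the second sample line yields both "8:40 routine 3," and "routine 3, 8:45 something went", so a line with two time stamps gets two context snippets. Words are re-joined with single spaces, so the original spacing is not preserved. For whole lines of context, grep's -A, -B and -C options (as suggested in the comments) already work, although they print each line only once.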

1 Answer

grep -Eo '[0-9.]{10},[0-9]{2}:[0-9]{2}(:[0-9]{2})?' text.file

will print just the time stamps. Remove the -o option and you'll get the full lines instead:

grep -E '[0-9.]{10},[0-9]{2}:[0-9]{2}(:[0-9]{2})?' text.file

If the pattern [0-9.]{10} does not produce the correct output, it is easy to replace it with the stricter ([0-9]{2}\.){2}[0-9]{4}.
If you'd like to do both tasks at once, sed can do it, e.g. (the w command takes the rest of its line as the file name, so the s command has to start on a new line):

sed -r '/[0-9.]{10},[0-9]{2}:[0-9]{2}(:[0-9]{2})?/w string.file
s/[^:]*([0-9.]{10},[0-9]{2}:[0-9]{2}(:[0-9]{2})?)/\1\n/;//P;D' text.file
  • I just can't get those sed commands to work. Can you explain what is happening there? The grep commands would be nice, but if there is more than one time stamp per line, grep prints that line only once. Commented Aug 1, 2016 at 10:26
  • @Lesenger the problem may be your sed version (check with sed --version), which may not have the -r option. Does grep -Eo show all the time stamps? Commented Aug 1, 2016 at 10:50
  • @Lesenger I see the updated question; the regexp has to change to grep -Ewo '([0-9]{1,2}-){2}([0-9]{1,2}){2}|[0-9]{1,2}(:[0-9]{2}){1,2}' Commented Aug 1, 2016 at 11:08
  • I got that sed command almost working. I didn't realize that it had to be on two separate lines. With the command sed -r '/([0-9]{1,2}-){2}([0-9]{1,2}){2}|[0-9]{1,2}(:[0-9]{2}){1,2}/w outA s/[^:]*(([0-9]{1,2}-){2}([0-9]{1,2}){2}|[0-9]{1,2}(:[0-9]{2}){1,2})/\1\n/;//P;D' original I get the right time stamps, but when there is more than one time stamp in a line, the command does not generate two identical lines in the output. The output is: Line 1: 8:40 routine 3, 8:45 something went south and Line 2: ` routine 3, 8:45 something went south`. This is nevertheless more than enough for my purposes. GNU sed 4.2.2 Commented Aug 2, 2016 at 6:30
  • And those grep commands would be fine if they printed a line twice when it contains two time stamps.
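As the comments note, grep prints a matching line only once, and the sed script writes the remainder of the line rather than the whole line on its second pass. A minimal awk sketch that loops over match() avoids both problems: every time stamp goes to one file and the full line goes to another, once per time stamp. It assumes only the formats shown in the example (dates like 14-5-12 or 16-05-2012, times like 8:00 or 8:50:12); the output names timestamps.txt and lines.txt are hypothetical, and the heredoc that recreates the example file "original" is only there to make the sketch self-contained.

```sh
# Stand-in input: the example file from the question, saved as "original".
cat > original <<'EOF'
L&L logfile

14-5-12
16-05-2012
Experiment 1
Device 77212-123-123123
Instrument 2, 34g, 66hz
Notes:
Something weird happened 12:34
Everything is fine 13:07
Log
8:00 routine 1
8:20 routine 2
8:40 routine 3, 8:45 something went south
8:50 routine 4, 8:50:12 weird peak at data
EOF

# One pass over the file: each time stamp goes to timestamps.txt and the
# full line it came from goes to lines.txt, once per time stamp.
awk '
BEGIN {
    # Plain POSIX ERE, written without {m,n} intervals because some older
    # awk implementations do not support them.
    re = "[0-9][0-9]?-[0-9][0-9]?-[0-9][0-9]([0-9][0-9])?|[0-9][0-9]?:[0-9][0-9](:[0-9][0-9])?"
}
{
    rest = $0
    # match() sets RSTART and RLENGTH; keep scanning the rest of the line
    # until no further time stamp is found.
    while (match(rest, re)) {
        print substr(rest, RSTART, RLENGTH) > "timestamps.txt"
        print $0 > "lines.txt"
        rest = substr(rest, RSTART + RLENGTH)
    }
}' original
```

On the example this should reproduce output1 in timestamps.txt and output2 in lines.txt, and no lookbehind (and hence no grep -P) is needed. The sketch does not check word boundaries, so a time stamp glued to other digits (e.g. 112:34) would yield a partial match such as 12:34.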
