user68650
  • 343
  • 8
  • 19

Multi-line pattern / data extraction

I have the header below in about 100,000 files. I have already extracted each line separately and combined each record in Excel, so my time crunch is over, and I am now looking for an expedient method of data extraction.

X-RSMF-Generator: RSMF Generator Sample Library

X-RSMF-Version: 1.0.0

X-RSMF-EventCount: 53

X-RSMF-BeginDate: 2022-09-20T04:33:11-04:00

X-RSMF-EndDate: 2022-09-20T16:47:56-04:00

X-RSMF-GroupID: GRP000000118

X-RSMF-SecondaryGroupID: GRP000000118_D_20220920

X-RSMF-ContainsDeleted: False

X-RSMF-Application: Native Messages

X-RSMF-Participants: Person One <5156242756> Person two, Person

three [email protected] <21243210277> Person four *** <345278652345>

MIME-Version: 1.0

Not all lines are present in all files, and the last field can span more than one line. I think we can use "MIME-Version: 1.0" as the stop marker. I also only need the data from each line entry; everything before the ": " (colon space) can be ignored, as those are the field headings.
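To make the requirements above concrete, here is a minimal sketch of how they could be expressed in awk, called from bash. It is not a tested solution for the real data. It assumes the ten field names shown in the sample header are the complete set, that any non-empty line that does not start with "X-RSMF-Name:" continues the previous field, and that files may have CRLF line endings. The function name rsmf_row is made up for illustration.

```shell
#!/bin/bash
# Sketch: print one tab-separated row per file, one column per known field.
# Missing fields become empty columns, so the columns never shift.
# The field list is an assumption taken from the sample header.

rsmf_row() {
    awk '
    BEGIN {
        n = split("Generator Version EventCount BeginDate EndDate GroupID " \
                  "SecondaryGroupID ContainsDeleted Application Participants",
                  names, " ")
    }
    { sub(/\r$/, "") }                    # tolerate CRLF line endings
    /^MIME-Version:/ { exit }             # stop marker; the END block still runs
    /^X-RSMF-[A-Za-z]+:/ {
        key = $0
        sub(/:.*/, "", key)               # keep only the field heading
        sub(/^X-RSMF-/, "", key)          # e.g. "BeginDate"
        v = $0
        sub(/^[^:]*: ?/, "", v)           # drop everything up to the first ": "
        val[key] = v
        next
    }
    key != "" && NF {                     # continuation of the previous field
        line = $0
        sub(/^[ \t]+/, "", line)
        val[key] = val[key] " " line
    }
    END {
        for (i = 1; i <= n; i++)
            printf "%s%s", val[names[i]], (i < n ? "\t" : "\n")
    }
    ' "$1"
}
```

Calling rsmf_row once per file inside the existing for loop and redirecting the loop's output to a single file would give one row per record. Starting one awk process per file is simple but not the fastest option for 100,000 files.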

I started out using sed, thinking I could just concatenate the lines and pipe the result to awk to split it into columns.

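#!/bin/bash
# shopt is a bash builtin, so this needs bash rather than sh

shopt -s nullglob
# Expand the glob into an array so paths containing spaces stay intact
FILES=(/mnt/c/Temp/rsmf/*.rsmf)

for f in "${FILES[@]}"
do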
    #echo "Processing $f"
    sed -rn \
    -e '/^X-RSMF-BeginDate:/{
        s/X-RSMF-BeginDate: //
        s/T/ /
        s/-0[45]:00/ /
        h
        #p
        }' \
    -e '/^X-RSMF-EndDate:/{
        s/X-RSMF-EndDate: //
        s/T/ /
        s/-0[45]:00/ /
        H
        #p
        }' \
     -e '/^X-RSMF-GroupID:/{
        s/X-RSMF-GroupID: //
        H
        x
        s/\r\n//gp
        }' \
         "$f"
done

Results:

2022-10-05 12:54:27 2022-10-05 12:54:27 GRP000000001
2022-10-05 11:48:18 2022-10-05 11:48:18 GRP000000002

Before spending time on this, I wanted to seek recommendations on the best approach and practice for this particular project.

Thoughts??
