I would like to use a regular expression that matches any text between two strings:
sample_string= "Message ID: SM9MatRNTnMAYaylR0QgOH///qUUveBCbw==
2021-07-10T20:48:23.997Z john s (X Y Bank) -
[email protected]:
[EVENT] 347376954900491 ([email protected]) created room
(roomName='CSTest' roomDescription='CS Test Chat Room' COPY_DISABLED=false
READ_ONLY=false DISCOVERABLE=false MEMBER_ADD_USER_ENABLED=false
roomType=PRIVATE conversationScope=internal owningCompany=X Y
Bank)
Message ID: nsabNaqeXfuEj9mBEhvS0n///qUUveAhbw==
2021-07-10T20:48:23.997Z john s (X Y Bank) -
[email protected]
[EVENT] 347376954900491 ([email protected]) invited 347376954900486
([email protected]) to room (CSTest|john s|16091907435583)
Message ID: Nu/EYTkTQ5qdbqzZ0Rig8n///qUUvQ42dA==
2021-07-10T20:48:23.997Z john s (X Y Bank) -
[email protected]
Catchyou later
Message ID: dy2yaByqhm+n88Gd3VQOhH///qUUrz8odA==
2021-07-10T20:48:23.997Z kerren n (X Y Bank) -
[email protected]
KeywordContent_ Cricket is a bat-and-ball game played between two teams of
eleven players on a field at the centre of which is a 20-metre (22-yard) pitch
with a wicket at each end, each comprising two bails balanced on three stumps.
The batting side scores runs by striking the ball bowled at the wicket with
the bat, while the bowling and fielding side tries to prevent this and dismiss
each player (so they are "out").
* * *
Generated by Content Export Service | Stream Type: SymphonyPost |
Stream ID: ZZo5pRRPFC18uzlonFjya3///qUUveBHdA== | Room Type: Private |
Conversation Scope: internal | Owning Company: X Y Bank | File
Generated Date: 2021-07-10T20:48:23.997Z | Content Start Date:
2021-07-10T20:48:23.997Z | Content Stop Date: 2021-07-10T20:48:23.997Z
* * *
*** (780787) Disclaimer:
(incorporated in paris with Ref. No. ZC18, is authorised by Prudential Regulation
Authority (PRA) and regulated by Financial Conduct Authority and PRA. oyp and
its affiliates (We) monitor this confidential message meant for your
information only. We make no recommendation or offer. You should get
independent advice. We accept no liability for loss caused hereby. See market
commentary disclaimers (
http://wholesalebanking.com/en/utility/Pages/d-mkt.aspx ),
Dodd-Frank and EMIR disclosures (
http://wholesalebanking.com/en/capabilities/financialmarkets/Pages/default.aspx
) "
In this example, I would like to extract everything between the email ID and the next keyword Message ID:
So the expected output would be:
extracted_list =[': [EVENT] 347376954900491 ([email protected]) created room (roomName='CSTest' roomDescription='CS Test Chat Room' COPY_DISABLED=false READ_ONLY=false DISCOVERABLE=false MEMBER_ADD_USER_ENABLED=false roomType=PRIVATE conversationScope=internal owningCompany=X Y Bank)','says [EVENT] 347376954900491 ([email protected]) invited 347376954900486 ([email protected]) to room (CSTest|john s|16091907435583)','says Catchyou later','says KeywordContent_ Cricket is a bat-and-ball game played between two teams of eleven players on a field at the centre of which is a 20-metre (22-yard) pitch with a wicket at each end, each comprising two bails balanced on three stumps. The batting side scores runs by striking the ball bowled at the wicket with the bat, while the bowling and fielding side tries to prevent this and dismiss each player (so they are "out").']
Note: the trailing block that starts at * * * is not part of the text to extract.
What I tried so far is:
text = re.findall(r'\S+@\S+\s+(.*)Message ID', sample_string)
print (text)
##output: []
emailID up until Message ID? Always try to provide a minimal example, not a big wall of text.
emailID somewhere in there?
(X Y Bank) -
[^\s@]+@[^\s@]+\s(.*?)\bMessage ID\b regex101.com/r/zd5w8v/1 But you have to add re.DOTALL as the last parameter of re.findall.
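A minimal sketch of the re.DOTALL suggestion, under a few assumptions: the sample is shortened, the address john.s@example.com is a hypothetical placeholder (the real addresses are not visible in the post), and the closing "Message ID" is swapped for a lookahead that also stops at the "* * *" footer, since the last message has no following "Message ID". Without re.DOTALL the "." does not match newlines, which is one reason the original attempt returns an empty list.

```python
import re

# Shortened, hypothetical sample: the e-mail address is a placeholder.
sample_string = """Message ID: SM9MatRNTnMAYaylR0QgOH///qUUveBCbw==
2021-07-10T20:48:23.997Z john s (X Y Bank) -
john.s@example.com
[EVENT] 347376954900491 (john.s@example.com) created room
(roomName='CSTest' roomType=PRIVATE)
Message ID: Nu/EYTkTQ5qdbqzZ0Rig8n///qUUvQ42dA==
2021-07-10T20:48:23.997Z john s (X Y Bank) -
john.s@example.com
Catchyou later
* * *
Generated by Content Export Service
"""

pattern = re.compile(
    r"[^\s@]+@[^\s@]+\s+"                # sender address plus the whitespace after it
    r"(.*?)"                             # message body, matched lazily
    r"(?=\s*(?:Message ID:|\* \* \*))",  # stop before the next message or the footer
    re.DOTALL,                           # let "." match newlines as well
)

# Collapse the line breaks inside each body into single spaces.
extracted_list = [" ".join(body.split()) for body in pattern.findall(sample_string)]
print(extracted_list)
```

Because the lookahead does not consume "Message ID:", the same pattern handles consecutive messages and the final one before the footer. If the sender line carries a trailing colon, as in the first message of the question, [^\s@]+ absorbs it as part of the address.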