Background
I have the following sample df in which some entries of the Text column contain PHYSICIAN: followed by the physician's name (all names below are made up):
import pandas as pd
df = pd.DataFrame({'Text' : ['PHYSICIAN: Jon J Smith was here today',
'And Mary Lisa Rider found here',
'Her PHYSICIAN: Jane A Doe is also here',
' She was seen by PHYSICIAN: Tom Tucker '],
'P_ID': [1,2,3,4],
'N_ID' : ['A1', 'A2', 'A3', 'A4']
})
#rearrange columns
df = df[['Text','N_ID', 'P_ID']]
df
Text N_ID P_ID
0 PHYSICIAN: Jon J Smith was here today A1 1
1 And Mary Lisa Rider found here A2 2
2 Her PHYSICIAN: Jane A Doe is also here A3 3
3 She was seen by PHYSICIAN: Tom Tucker A4 4
Goal
1) Replace the names that follow the word PHYSICIAN (e.g. PHYSICIAN: Jon J Smith) with PHYSICIAN: **BLOCK**
2) Store the result in a new column named Text_Phys
Desired Output
Text N_ID P_ID Text_Phys
0 PHYSICIAN: Jon J Smith was here today A1 1 PHYSICIAN: **BLOCK** was here today
1 And Mary Lisa Rider found here A2 2 And Mary Lisa Rider found here
2 Her PHYSICIAN: Jane A Doe is also here A3 3 Her PHYSICIAN: **BLOCK** is also here
3 She was seen by PHYSICIAN: Tom Tucker A4 4 She was seen by PHYSICIAN: **BLOCK**
I have tried the following:
1) df['Text_Phys'] = df['Text'].replace(r'ABC.*', 'ABC: ***BLOCK***', regex=True)
2) df['Text_Phys'] = df['Text'].replace(r'ABC\s+', 'ABC: ***BLOCK***', regex=True)
But neither of them seems to work quite right.
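A quick check of why these patterns fall short, run on the first sample row. As written, ABC never appears in the Text column, so neither pattern matches anything. Swapping in PHYSICIAN exposes two further problems: .* is greedy and consumes the rest of the sentence, and \s+ cannot match because a colon sits between PHYSICIAN and the space.

```python
import pandas as pd

s = pd.Series(['PHYSICIAN: Jon J Smith was here today'])

# As written, 'ABC' never occurs in the text, so nothing is replaced
unchanged = s.replace(r'ABC.*', 'ABC: ***BLOCK***', regex=True).iloc[0]

# With PHYSICIAN swapped in, the greedy '.*' swallows the rest of the sentence
too_greedy = s.replace(r'PHYSICIAN.*', 'PHYSICIAN: ***BLOCK***', regex=True).iloc[0]

# Whitespace must directly follow PHYSICIAN, but a colon comes first, so no match
no_match = s.replace(r'PHYSICIAN\s+', 'PHYSICIAN: ***BLOCK***', regex=True).iloc[0]

print(unchanged)   # PHYSICIAN: Jon J Smith was here today
print(too_greedy)  # PHYSICIAN: ***BLOCK***
print(no_match)    # PHYSICIAN: Jon J Smith was here today
```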
Question
How do I achieve my desired output?
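One way to get the desired output, as a minimal sketch that assumes a rule the question does not state: the name is the run of capitalized words (including single-letter initials such as J) that directly follows PHYSICIAN:. Capturing the label in a group and writing it back with \1 keeps PHYSICIAN: in place and replaces only the name. NAME_AFTER_LABEL is a hypothetical constant introduced here for readability.

```python
import pandas as pd

df = pd.DataFrame({'Text': ['PHYSICIAN: Jon J Smith was here today',
                            'And Mary Lisa Rider found here',
                            'Her PHYSICIAN: Jane A Doe is also here',
                            ' She was seen by PHYSICIAN: Tom Tucker '],
                   'N_ID': ['A1', 'A2', 'A3', 'A4'],
                   'P_ID': [1, 2, 3, 4]})

# Assumed rule: the name is the run of capitalized words (full words or
# single-letter initials) immediately after "PHYSICIAN:".
# Group 1 captures the label so it can be written back unchanged.
NAME_AFTER_LABEL = r'(PHYSICIAN:)\s*[A-Z][a-z]*(?:\s+[A-Z][a-z]*)*'

# Rows without the label (like row 1) pass through untouched
df['Text_Phys'] = df['Text'].str.replace(NAME_AFTER_LABEL, r'\1 **BLOCK**', regex=True)
print(df)
```

Because the rule stops at the first lowercase word, "was here today" and "is also here" survive. It would also swallow a capitalized word that happens to follow the name, and names such as McDonald or hyphenated surnames would need a broader pattern.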

Comments
Try
df['Text'] = df['Text'].replace(r'PHYSICIAN', 'PHYSICIAN: ***PHI***', regex=True)
and
df['Text'] = df['Text'].replace(r'Physician', 'Physician: ***PHI***', regex=True)
Alternatively, import re, then
df['Text_Phys'] = df['Text'].str.replace('PHYSICIAN', 'PHYSICIAN: ***PHI***', flags=re.I, regex=True)
but this writes the label in upper case. However, the earlier version works fine for me. What version of pandas are you using?
… PHYSICIAN:. However, it is almost impossible to identify Jon J Smith, Jane A Doe, and Tom Tucker within the substring. How do you know they are the names to replace unless you have some rules to identify them?
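On the upper-case issue raised above, a sketch that assumes the same capitalized-words rule for names: instead of passing flags=re.I (which would also let [A-Z] match lowercase letters, so the "name" would swallow the rest of the sentence), only the label is made case-insensitive with a scoped inline flag, and \1 writes the label back exactly as it appeared. PATTERN is a hypothetical name, and the scoped (?i:...) syntax requires Python 3.6 or later.

```python
import pandas as pd

s = pd.Series(['Physician: Jane A Doe is also here',
               'Her PHYSICIAN: Jon J Smith was here today'])

# (?i:...) makes only the label case-insensitive, so the [A-Z] name rule
# stays case-sensitive; group 1 keeps the label in its original case.
PATTERN = r'((?i:physician):)\s*[A-Z][a-z]*(?:\s+[A-Z][a-z]*)*'

blocked = s.str.replace(PATTERN, r'\1 **BLOCK**', regex=True)
print(blocked.tolist())
```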