I have a dataset with over 100,000 rows and 300 columns.
Here is the sample dataset:
import numpy as np
import pandas as pd

pd.options.display.max_colwidth = 1000
df = pd.DataFrame({'EVENT_DTL':['1. Name : John Johns \n2. Date : 05 March 2013 \n3. founded : 75075 Plano, Dallas Texas \n4. Charactor : Impersive \n5. Corona corelation : Cannot be found',
'1. Name : Mark Dwaine \n2. Date : 13 January 2020 \n3. founded : 45184 Miami, Florida \n4. Charactor : Slow learner \n5. Corona corelation : Suicide because of the economic difficulty',
'1. Name : Janny chung \n2. Date : 11 December 2011 \n3. founded : 77543 Bay area, San Fransisco \n4. Charactor : Always ambitious \n5. Corona corelation : Cannot be found but probably related to epidemic',
'1. Name : Sally \n2. Date : 11 December 2021 \n3. founded : 75074 Saginow, Fort Worth \n4. Charactor : energetic \n5. Corona corelation : Her friends guess it is because of corona'],
'EVENT_DTL_2':['He is always fast mover','He is brillient, smart','she is kind of person who is always eager to learn new subejct','he was a lunatic, his neighber said']})
df.loc[2,'EVENT_DTL_2'] = np.nan
df
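I'm trying to insert the value of 'EVENT_DTL_2' into 'EVENT_DTL', right after the '\n4. Charactor : xxx' substring (that is, at the end of item 4, just before '\n5.'). Rows where 'EVENT_DTL_2' is NaN should be left unchanged.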
The desired output is:
df2 = pd.DataFrame({'EVENT_DTL':['1. Name : John Johns \n2. Date : 05 March 2013 \n3. founded : 75075 Plano, Dallas Texas \n4. Charactor : Impersive He is always fast mover\n5. Corona corelation : Cannot be found',
'1. Name : Mark Dwaine \n2. Date : 13 January 2020 \n3. founded : 45184 Miami, Florida \n4. Charactor : Slow learner He is brillient, smart\n5. Corona corelation : Suicide because of the economic difficulty',
'1. Name : Janny chung \n2. Date : 11 December 2011 \n3. founded : 77543 Bay area, San Fransisco \n4. Charactor : Always ambitious \n5. Corona corelation : Cannot be found but probably related to epidemic',
'1. Name : Sally \n2. Date : 11 December 2021 \n3. founded : 75074 Saginow, Fort Worth \n4. Charactor : energetic he was a lunatic, his neighber said\n5. Corona corelation : Her friends guess it is because of corona'],
'EVENT_DTL_2':['He is always fast mover','He is brillient, smart',np.nan,'he was a lunatic, his neighber said']})
df2
I need an efficient way to do this, since I have to apply the method to a very large dataset.
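For reference, here is a minimal sketch of the transformation I have in mind, using only vectorized string methods. The function name `insert_before_marker` and its parameters are hypothetical, and it assumes every record contains the literal text '\n5. ' at the start of item 5. I have not benchmarked it on the full dataset.

```python
import numpy as np
import pandas as pd


def insert_before_marker(df, target="EVENT_DTL", extra="EVENT_DTL_2", marker="\n5. "):
    """Return df[target] with df[extra] inserted just before `marker`.

    Hypothetical helper: assumes `marker` is the literal start of item 5.
    """
    # Split each string at the first literal occurrence of the marker into
    # three columns: text before it, the marker itself, and text after it.
    parts = df[target].str.partition(marker)
    # String concatenation propagates NaN, so rows whose extra text is
    # missing come out as NaN here.
    merged = parts[0] + df[extra] + parts[1] + parts[2]
    # Fall back to the original text for those rows.
    return merged.fillna(df[target])
```

It would be used as `df['EVENT_DTL'] = insert_before_marker(df)`. If the marker is not found in a row, `str.partition` returns the whole string as the first part, so the extra text ends up appended at the end of that row.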