I am trying to use df4's LineNum column to find the matching rows in df1 and write df4's GeneralDescription value into the GeneralDescription cell of each matching row in df1. I am looking for a solution that scales to data frames with thousands of rows and several other inconsequential columns. I would rather not merge if it isn't absolutely necessary. I just want to write to df1's GeneralDescription column and leave the original structure of the two data frames unchanged.
df1
   LineNum          Warehouse  GeneralDescription
0        2              Empty               Empty
1        3              Empty               Empty
2        4                PBS               Empty
3        5              Empty               Empty
4        6              Empty               Empty
5        7  General Liability               Empty
6        8              Empty               Empty
7        9              Empty               Empty
df4
   LineNum      GeneralDescription
0        4                TRUCKING
1        6  TRUCKING-GREENVILLE,TN
2        7         Human Resources
Desired result
   LineNum          Warehouse      GeneralDescription
0        2              Empty                   Empty
1        3              Empty                   Empty
2        4                PBS                TRUCKING
3        5              Empty                   Empty
4        6              Empty  TRUCKING-GREENVILLE,TN
5        7  General Liability         Human Resources
6        8              Empty                   Empty
7        9              Empty                   Empty
This is the code I have so far, along with imports that might be helpful. As written, it raises KeyError: 'the label [LineNum] is not in the [index]'
import pandas as pd
import openpyxl
import numpy as np
data= [[2,'Empty','Empty'],[3,'Empty','Empty'],[4,'PBS','Empty'],[5,'Empty','Empty'],[6,'Empty','Empty'],[7,'General Liability','Empty'],[8,'Empty','Empty'],[9,'Empty','Empty']]
df1=pd.DataFrame(data,columns=['LineNum','Warehouse','GeneralDescription'])
data4 = [[4,'TRUCKING'],[6,'TRUCKING-GREENVILLE,TN'],[7,'Human Resources']]
df4=pd.DataFrame(data4,columns=['LineNum','GeneralDescription'])
for i in range(len(df1.index)):
    if df1.loc[i,'LineNum']==df4.loc['LineNum']:
        df1.loc[i,'GeneralDescription']=df4.loc['GeneralDescription']
df1.merge(df4,how='left')
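For reference, the KeyError most likely comes from df4.loc['LineNum']: with a single label, .loc looks for a row labelled 'LineNum' in df4's index (0, 1, 2) rather than for the column. One possible way to get the desired result without a merge is to turn df4 into a lookup keyed by LineNum and map df1's LineNum through it. This is only a minimal sketch: the name lookup is made up for illustration, and it assumes each LineNum appears at most once in df4.

```python
import pandas as pd

data = [[2, 'Empty', 'Empty'], [3, 'Empty', 'Empty'], [4, 'PBS', 'Empty'],
        [5, 'Empty', 'Empty'], [6, 'Empty', 'Empty'],
        [7, 'General Liability', 'Empty'], [8, 'Empty', 'Empty'],
        [9, 'Empty', 'Empty']]
df1 = pd.DataFrame(data, columns=['LineNum', 'Warehouse', 'GeneralDescription'])
data4 = [[4, 'TRUCKING'], [6, 'TRUCKING-GREENVILLE,TN'], [7, 'Human Resources']]
df4 = pd.DataFrame(data4, columns=['LineNum', 'GeneralDescription'])

# Hypothetical helper: a Series indexed by LineNum holding df4's descriptions.
# Assumption: LineNum is unique in df4; duplicates would need resolving first.
lookup = df4.set_index('LineNum')['GeneralDescription']

# Map each LineNum in df1 through the lookup. Rows with no match come back
# as NaN, so fall back to the value df1 already holds for those rows.
df1['GeneralDescription'] = (
    df1['LineNum'].map(lookup).fillna(df1['GeneralDescription'])
)

print(df1)
```

This writes only to df1's GeneralDescription column, so the row order and the other columns of df1 stay as they were, and df4 is not modified. Because the matching is vectorized rather than done row by row in a Python loop, it should also cope with thousands of rows.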