I am trying to merge a number of CSV files into one. They all have the following columns in common:
CU_NUMBER CYCLE_DATE JOIN_NUMBER CU_NAME PhysicalAddressLine1 PhysicalAddressCity PhysicalAddressStateCode
To the right of these columns, each CSV file has various columns of interest. Some files have columns of interest that the other files lack, and I still want to merge those. Also, some files may not have the same CU_NUMBER, CU_NAME, PhysicalAddressLine1, PhysicalAddressCity, PhysicalAddressStateCode.
Here is an example of what I want to do. Say I have a dataframe
and another data frame
After merging I want to have something like this:
The tricky part is that the columns of interest vary across the CSV files, and I want to know whether there is a good way to merge them all in this manner without manually specifying each column I want. I have 20 CSV files in total that I want to merge into one.
What I have so far:
I have tried something like this:
df_concat1 = pd.concat([df13[['CU_NUMBER', 'CYCLE_DATE',
                              'JOIN_NUMBER',
                              'PhysicalAddressLine1', 'PhysicalAddressCity',
                              'PhysicalAddressStateCode', '(CECL) Allowance for Credit Losses on Loans and Leases']]
                        ], axis=0)
new_df1 = df12.merge(df_concat1, how='left', on=['CU_NUMBER', 'CYCLE_DATE', 'JOIN_NUMBER',
                                                 'CU_NAME', 'PhysicalAddressLine1',
                                                 'PhysicalAddressCity', 'PhysicalAddressStateCode'])
But I get this error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-13-c2b139ce1777> in <module>
6 new_df1 = df12.merge(df_concat1, how='left', on=['CU_NUMBER','CYCLE_DATE', 'JOIN_NUMBER',
7 'CU_NAME', 'PhysicalAddressLine1',
----> 8 'PhysicalAddressCity', 'PhysicalAddressStateCode'])
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
7295 copy=copy,
7296 indicator=indicator,
-> 7297 validate=validate,
7298 )
7299
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
84 copy=copy,
85 indicator=indicator,
---> 86 validate=validate,
87 )
88 return op.get_result()
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
625 self.right_join_keys,
626 self.join_names,
--> 627 ) = self._get_merge_keys()
628
629 # validate the merge keys dtypes. We may need to coerce
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in _get_merge_keys(self)
981 if not is_rkey(rk):
982 if rk is not None:
--> 983 right_keys.append(right._get_label_or_level_values(rk))
984 else:
985 # work-around for merge_asof(right_index=True)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\generic.py in _get_label_or_level_values(self, key, axis)
1690 values = self.axes[axis].get_level_values(key)._values
1691 else:
-> 1692 raise KeyError(key)
1693
1694 # Check for duplicates
KeyError: 'CU_NAME'
I am not sure why I get this error. What I want is to merge all the columns of interest into one file. If a column of interest is unique to one file, it should just become a new column. If a column appears in more than one file, I want to append the new rows, if that makes sense.
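The KeyError: 'CU_NAME' is consistent with the code above: the column list selected from df13 does not include 'CU_NAME', but the merge names it in on=, so the right-hand frame has no such column to join on. A minimal sketch for checking this before merging is shown here. The helper name missing_keys and the toy frame are hypothetical stand-ins, not part of the original data.

```python
import pandas as pd

# The key columns passed to on= in the failing merge call.
KEYS = ['CU_NUMBER', 'CYCLE_DATE', 'JOIN_NUMBER', 'CU_NAME',
        'PhysicalAddressLine1', 'PhysicalAddressCity',
        'PhysicalAddressStateCode']

def missing_keys(df, keys=KEYS):
    """Return the key columns that are absent from df (hypothetical helper)."""
    return [k for k in keys if k not in df.columns]

# Toy stand-in for the subset taken from df13: CU_NAME was never selected.
right = pd.DataFrame(columns=[k for k in KEYS if k != 'CU_NAME'] + ['SomeValue'])

print(missing_keys(right))  # prints ['CU_NAME']
```

If the list is non-empty, either add the missing column to the selection or drop it from on=.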



Dataframe.merge to solve your problem?
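One possible way to use DataFrame.merge for this, sketched under some assumptions: each pair of frames is outer-merged on whatever columns the two have in common, so columns unique to one file become new columns and rows that match nothing are appended. The function name merge_all and the toy frames (with made-up columns Loans and Deposits) are hypothetical. The sketch also assumes shared columns have compatible dtypes across files.

```python
from functools import reduce

import pandas as pd

def merge_all(frames):
    """Outer-merge frames pairwise on the columns each pair shares."""
    def merge_pair(left, right):
        # Join on every column present in both frames, so no suffixed
        # duplicates (_x / _y) are created.
        shared = [c for c in left.columns if c in right.columns]
        return left.merge(right, how='outer', on=shared)
    return reduce(merge_pair, frames)

# Toy data standing in for three of the CSV files (hypothetical values).
a = pd.DataFrame({'CU_NUMBER': [1, 2], 'CU_NAME': ['A', 'B'], 'Loans': [10.0, 20.0]})
b = pd.DataFrame({'CU_NUMBER': [1, 3], 'CU_NAME': ['A', 'C'], 'Deposits': [5.0, 7.0]})
c = pd.DataFrame({'CU_NUMBER': [4], 'CU_NAME': ['D'], 'Loans': [40.0]})

merged = merge_all([a, b, c])
print(merged)
```

With real files, frames would be a list built with pd.read_csv, one entry per CSV. Note that if two files hold different values in a shared column for the same CU_NUMBER, the rows will not line up and will appear as separate rows. If two frames share no columns at all, merge raises an error.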