0

Scenario: I am trying to merge 2 pandas dataframes. DF1 has the bulk data, and DF2 is a sort of mapping. Based on the combination of the values of 3 different columns, I want to put a column from DF2 into DF1.

Data sample (just a snippet of both dfs): DF1:

+-------+---------------------------------------------+---------------------------------------------+------+----------------+----------------+
| AGENT |                    5283                     |                    5288                     | 5318 |      7934      |      7935      |
+-------+---------------------------------------------+---------------------------------------------+------+----------------+----------------+
|    33 | No                                          | No                                          | No   | No             | No             |
|    34 | Yes, formal discussion                      | General reference                           | No   | No             | No             |
|    55 | No                                          | No                                          | No   | No             | No             |
|   129 | No                                          | No                                          | No   | No             | No             |
|   307 | No                                          | No                                          | No   | No             | No             |
|   441 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|   522 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | Not Meaningful | Not Meaningful |
|   690 | No                                          | No                                          | No   | Not Meaningful | Not Meaningful |
|   749 | Yes, formal discussion                      | General reference                           | No   | No             | No             |
|  1011 | No                                          | No                                          | No   | No             | No             |
|  1067 | Yes, formal discussion                      | Formal board consideration                  | No   | Not Meaningful | Not Meaningful |
|  1272 | No                                          | No                                          | No   | Not Meaningful | Not Meaningful |
|  1592 | No                                          | No                                          | No   | Not Meaningful | Not Meaningful |
|  1908 | Yes, formal discussion                      | Formal board commtment with limited details | No   | No             | No             |
|  1949 | No                                          | No                                          | No   | Not Meaningful | Not Meaningful |
|  2040 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|  2102 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|  2114 | Yes, formal discussion                      | Formal board consideration                  | No   | Not Meaningful | Not Meaningful |
|  2266 | Yes, formal discussion                      | Formal board commtment with limited details | No   | No             | No             |
|  2365 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|  2674 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|  3109 | No                                          | General reference                           | No   | No             | No             |
|  3170 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | Not Meaningful | Not Meaningful |
|  3295 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | No             | No             |
|  3323 | General reference                           | General reference                           | No   | Not Meaningful | Not Meaningful |
|  3366 | Yes, specific reference but limited details | Formal board consideration                  | No   | No             | No             |
|  3840 | General reference                           | Formal board commtment with limited details | No   | No             | No             |
|  3914 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | No             | No             |
|  3967 | Yes, formal discussion                      | Formal board consideration                  | No   | Yes            | No             |
|  4108 | No                                          | No                                          | No   | No             | No             |
+-------+---------------------------------------------+---------------------------------------------+------+----------------+----------------+

DF2:

+---------------------------------------------+---------------------------------------------+------+-------+
|                    5283                     |                    5288                     | 5318 | SCORE |
+---------------------------------------------+---------------------------------------------+------+-------+
| Yes, formal discussion                      | Formal board consideration                  | Yes  |    10 |
| Yes, formal discussion                      | Formal board consideration                  | No   |     8 |
| Yes, formal discussion                      | Formal board commtment with limited details | Yes  |     7 |
| Yes, formal discussion                      | Formal board commtment with limited details | No   |     6 |
| Yes, formal discussion                      | General reference                           | Yes  |     6 |
| Yes, formal discussion                      | General reference                           | No   |     5 |
| Yes, formal discussion                      | No specific reference                       | Yes  |     4 |
| Yes, formal discussion                      | No specific reference                       | No   |     2 |
| Yes, specific reference but limited details | Formal board consideration                  | Yes  |     8 |
| Yes, specific reference but limited details | Formal board consideration                  | No   |     7 |
| Yes, specific reference but limited details | Formal board commtment with limited details | Yes  |     6 |
| Yes, specific reference but limited details | Formal board commtment with limited details | No   |     5 |
| Yes, specific reference but limited details | General reference                           | Yes  |     5 |
| Yes, specific reference but limited details | General reference                           | No   |     5 |
| Yes, specific reference but limited details | No specific reference                       | Yes  |     3 |
| Yes, specific reference but limited details | No specific reference                       | No   |     2 |
| General reference                           | Formal board consideration                  | Yes  |     7 |
| General reference                           | Formal board consideration                  | No   |     6 |
| General reference                           | Formal board commtment with limited details | Yes  |     4 |
| General reference                           | Formal board commtment with limited details | No   |     4 |
| General reference                           | General reference                           | Yes  |     4 |
| General reference                           | General reference                           | No   |     3 |
| General reference                           | No specific reference                       | Yes  |     3 |
| General reference                           | No specific reference                       | No   |     1 |
| No reference                                | Formal board consideration                  | Yes  |     6 |
+---------------------------------------------+---------------------------------------------+------+-------+

Objective: Using columns 5283, 5288 and 5318, I need to add the Score value to DF1 as a new column.

What I tried: I tried adding a new combined column to both dfs and use it as a merge index, but still, the Score column values in DF always end up as nan:

Gov_JT_pivot['merge_key'] = tuple(zip(Gov_JT_pivot['5283'], Gov_JT_pivot['5288'], Gov_JT_pivot['5318']))
Gov_lookup_df['merge_key'] = tuple(zip(Gov_lookup_df['5283'], Gov_lookup_df['5288'], Gov_lookup_df['5318']))

# Perform the merge using the combined key
Gov_JT_pivot= pd.merge(Gov_JT_pivot, Gov_lookup_df[['merge_key', 'SCORE']], on='merge_key', how='left')

Which results in:

+-------+---------------------------------------------+---------------------------------------------+------+------------------------------------------------------------------------------------------------------+-------+
| AGENT |                    5283                     |                    5288                     | 5318 |                                              merge_key                                               | SCORE |
+-------+---------------------------------------------+---------------------------------------------+------+------------------------------------------------------------------------------------------------------+-------+
|    33 | No                                          | No                                          | No   | ('No', 'No', 'No')                                                                                   |       |
|    34 | Yes, formal discussion                      | General reference                           | No   | ('Yes, formal discussion', 'General reference', 'No')                                                |       |
|    55 | No                                          | No                                          | No   | ('No', 'No', 'No')                                                                                   |       |
|   129 | No                                          | No                                          | No   | ('No', 'No', 'No')                                                                                   |       |
|   307 | No                                          | No                                          | No   | ('No', 'No', 'No')                                                                                   |       |
|   441 | Yes, formal discussion                      | Formal board consideration                  | No   | ('Yes, formal discussion', 'Formal board consideration', 'No')                                       |       |
|   522 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | ('Yes, specific reference but limited details', 'Formal board commtment with limited details', 'No') |       |
|   690 | No                                          | No                                          | No   | ('No', 'No', 'No')                                                                                   |       |
|   749 | Yes, formal discussion                      | General reference                           | No   | ('Yes, formal discussion', 'General reference', 'No')                                                |       |
|  1011 | No                                          | No                                          | No   | ('No', 'No', 'No')                                                                                   |       |
|  1067 | Yes, formal discussion                      | Formal board consideration                  | No   | ('Yes, formal discussion', 'Formal board consideration', 'No')                                       |       |
|  1272 | No                                          | No                                          | No   | ('No', 'No', 'No')                                                                                   |       |
|  1592 | No                                          | No                                          | No   | ('No', 'No', 'No')                                                                                   |       |
|  1908 | Yes, formal discussion                      | Formal board commtment with limited details | No   | ('Yes, formal discussion', 'Formal board commtment with limited details', 'No')                      |       |
|  1949 | No                                          | No                                          | No   | ('No', 'No', 'No')                                                                                   |       |
|  2040 | Yes, formal discussion                      | Formal board consideration                  | No   | ('Yes, formal discussion', 'Formal board consideration', 'No')                                       |       |
|  2102 | Yes, formal discussion                      | Formal board consideration                  | No   | ('Yes, formal discussion', 'Formal board consideration', 'No')                                       |       |
|  2114 | Yes, formal discussion                      | Formal board consideration                  | No   | ('Yes, formal discussion', 'Formal board consideration', 'No')                                       |       |
|  2266 | Yes, formal discussion                      | Formal board commtment with limited details | No   | ('Yes, formal discussion', 'Formal board commtment with limited details', 'No')                      |       |
|  2365 | Yes, formal discussion                      | Formal board consideration                  | No   | ('Yes, formal discussion', 'Formal board consideration', 'No')                                       |       |
|  2674 | Yes, formal discussion                      | Formal board consideration                  | No   | ('Yes, formal discussion', 'Formal board consideration', 'No')                                       |       |
|  3109 | No                                          | General reference                           | No   | ('No', 'General reference', 'No')                                                                    |       |
|  3170 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | ('Yes, specific reference but limited details', 'Formal board commtment with limited details', 'No') |       |
|  3295 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | ('Yes, specific reference but limited details', 'Formal board commtment with limited details', 'No') |       |
|  3323 | General reference                           | General reference                           | No   | ('General reference', 'General reference', 'No')                                                     |       |
|  3366 | Yes, specific reference but limited details | Formal board consideration                  | No   | ('Yes, specific reference but limited details', 'Formal board consideration', 'No')                  |       |
|  3840 | General reference                           | Formal board commtment with limited details | No   | ('General reference', 'Formal board commtment with limited details', 'No')                           |       |
|  3914 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | ('Yes, specific reference but limited details', 'Formal board commtment with limited details', 'No') |       |
|  3967 | Yes, formal discussion                      | Formal board consideration                  | No   | ('Yes, formal discussion', 'Formal board consideration', 'No')                                       |       |
|  4108 | No                                          | No                                          | No   | ('No', 'No', 'No')                                                                                   |       |
|  4525 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | ('Yes, specific reference but limited details', 'Formal board commtment with limited details', 'No') |       |
|  4528 | General reference                           | No                                          | No   | ('General reference', 'No', 'No')                                                                    |       |
|  4608 | Yes, formal discussion                      | Formal board consideration                  | No   | ('Yes, formal discussion', 'Formal board consideration', 'No')                                       |       |
|  4641 | No                                          | No                                          | No   | ('No', 'No', 'No')                                                                                   |       |
|  4650 | No                                          | Formal board consideration                  | No   | ('No', 'Formal board consideration', 'No')                                                           |       |
+-------+---------------------------------------------+---------------------------------------------+------+------------------------------------------------------------------------------------------------------+-------+

Question: What am I doing incorrectly and how to fix it?

4
  • Hard to say without a reproducible example. One explanation I can think of is that one of the merge columns in df2 contains extra whitespace after one of the column values. But it might be something else. Commented Sep 17 at 15:26
  • It's worth mentioning that creating a composite merge key like this is not required. Pandas supports a multi-column merge. stackoverflow.com/questions/41815079/… Commented Sep 17 at 15:27
  • Beside the point, but those column names are pretty bad. As a matter of high priority you should be renaming them to something sane. Commented Sep 17 at 15:42
  • if you would create minimal working code with example data then we could test it and see what can be wrong with your idea. Commented Sep 17 at 17:44

2 Answers 2

2

There is no need to create a merge key, you can pass to merge multiple columns to use as key:

out = df1.merge(df2, on=['5283', '5288', '5318'], how='left')

Or, easier, if you want to use all common columns:

out = df1.merge(df2, how='left')

Output:



    AGENT                                         5283                                         5288 5318            7934            7935  SCORE
0      33                                           No                                           No   No              No              No    NaN
1      34                       Yes, formal discussion                            General reference   No              No              No    5.0
2      55                                           No                                           No   No              No              No    NaN
3     129                                           No                                           No   No              No              No    NaN
4     307                                           No                                           No   No              No              No    NaN
5     441                       Yes, formal discussion                   Formal board consideration   No              No              No    8.0
6     522  Yes, specific reference but limited details  Formal board commtment with limited details   No  Not Meaningful  Not Meaningful    5.0
7     690                                           No                                           No   No  Not Meaningful  Not Meaningful    NaN
8     749                       Yes, formal discussion                            General reference   No              No              No    5.0
9    1011                                           No                                           No   No              No              No    NaN
10   1067                       Yes, formal discussion                   Formal board consideration   No  Not Meaningful  Not Meaningful    8.0
11   1272                                           No                                           No   No  Not Meaningful  Not Meaningful    NaN
12   1592                                           No                                           No   No  Not Meaningful  Not Meaningful    NaN
13   1908                       Yes, formal discussion  Formal board commtment with limited details   No              No              No    6.0
14   1949                                           No                                           No   No  Not Meaningful  Not Meaningful    NaN
15   2040                       Yes, formal discussion                   Formal board consideration   No              No              No    8.0
16   2102                       Yes, formal discussion                   Formal board consideration   No              No              No    8.0
17   2114                       Yes, formal discussion                   Formal board consideration   No  Not Meaningful  Not Meaningful    8.0
18   2266                       Yes, formal discussion  Formal board commtment with limited details   No              No              No    6.0
19   2365                       Yes, formal discussion                   Formal board consideration   No              No              No    8.0
20   2674                       Yes, formal discussion                   Formal board consideration   No              No              No    8.0
21   3109                                           No                            General reference   No              No              No    NaN
22   3170  Yes, specific reference but limited details  Formal board commtment with limited details   No  Not Meaningful  Not Meaningful    5.0
23   3295  Yes, specific reference but limited details  Formal board commtment with limited details   No              No              No    5.0
24   3323                            General reference                            General reference   No  Not Meaningful  Not Meaningful    3.0
25   3366  Yes, specific reference but limited details                   Formal board consideration   No              No              No    7.0
26   3840                            General reference  Formal board commtment with limited details   No              No              No    4.0
27   3914  Yes, specific reference but limited details  Formal board commtment with limited details   No              No              No    5.0
28   3967                       Yes, formal discussion                   Formal board consideration   No             Yes              No    8.0
29   4108                                           No                                           No   No              No              No    NaN

If you don't have any match, this might be because the values are slightly different in both dataframes, for instance leading/trailing spaces in the strings. You can check the intersection of the potential tuples of keys to identify if there are common keys:

keys = ['5283', '5288', '5318']
common = set(df1[keys].agg(tuple, axis=1)) & set(df2[keys].agg(tuple, axis=1))
Sign up to request clarification or add additional context in comments.

Comments

0

There are a few things that might be happening, but I think the main issue is duplicate keys in both dfs resulting in a many-to-many merge. Below is a step-by-step explanation with full code at the end.

Loading Data

This was for me to work though, but I am including it here because it introduces some slight variation from what your data likely looks like when loaded.

I load the given data as strings and then to a df. Do a bit of cleaning before merging to remove garbage added by the transformation from string (extra columns, and rows of nan).

from io import StringIO
import pandas as pd

data_1 = """
+-------+---------------------------------------------+---------------------------------------------+------+----------------+----------------+
| AGENT |                    5283                     |                    5288                     | 5318 |      7934      |      7935      |
+-------+---------------------------------------------+---------------------------------------------+------+----------------+----------------+
|    33 | No                                          | No                                          | No   | No             | No             |
|    34 | Yes, formal discussion                      | General reference                           | No   | No             | No             |
|    55 | No                                          | No                                          | No   | No             | No             |
|   129 | No                                          | No                                          | No   | No             | No             |
|   307 | No                                          | No                                          | No   | No             | No             |
|   441 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|   522 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | Not Meaningful | Not Meaningful |
|   690 | No                                          | No                                          | No   | Not Meaningful | Not Meaningful |
|   749 | Yes, formal discussion                      | General reference                           | No   | No             | No             |
|  1011 | No                                          | No                                          | No   | No             | No             |
|  1067 | Yes, formal discussion                      | Formal board consideration                  | No   | Not Meaningful | Not Meaningful |
|  1272 | No                                          | No                                          | No   | Not Meaningful | Not Meaningful |
|  1592 | No                                          | No                                          | No   | Not Meaningful | Not Meaningful |
|  1908 | Yes, formal discussion                      | Formal board commtment with limited details | No   | No             | No             |
|  1949 | No                                          | No                                          | No   | Not Meaningful | Not Meaningful |
|  2040 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|  2102 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|  2114 | Yes, formal discussion                      | Formal board consideration                  | No   | Not Meaningful | Not Meaningful |
|  2266 | Yes, formal discussion                      | Formal board commtment with limited details | No   | No             | No             |
|  2365 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|  2674 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|  3109 | No                                          | General reference                           | No   | No             | No             |
|  3170 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | Not Meaningful | Not Meaningful |
|  3295 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | No             | No             |
|  3323 | General reference                           | General reference                           | No   | Not Meaningful | Not Meaningful |
|  3366 | Yes, specific reference but limited details | Formal board consideration                  | No   | No             | No             |
|  3840 | General reference                           | Formal board commtment with limited details | No   | No             | No             |
|  3914 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | No             | No             |
|  3967 | Yes, formal discussion                      | Formal board consideration                  | No   | Yes            | No             |
|  4108 | No                                          | No                                          | No   | No             | No             |
+-------+---------------------------------------------+---------------------------------------------+------+----------------+----------------+
"""

data_2 = """
+---------------------------------------------+---------------------------------------------+------+-------+
|                    5283                     |                    5288                     | 5318 | SCORE |
+---------------------------------------------+---------------------------------------------+------+-------+
| Yes, formal discussion                      | Formal board consideration                  | Yes  |    10 |
| Yes, formal discussion                      | Formal board consideration                  | No   |     8 |
| Yes, formal discussion                      | Formal board commtment with limited details | Yes  |     7 |
| Yes, formal discussion                      | Formal board commtment with limited details | No   |     6 |
| Yes, formal discussion                      | General reference                           | Yes  |     6 |
| Yes, formal discussion                      | General reference                           | No   |     5 |
| Yes, formal discussion                      | No specific reference                       | Yes  |     4 |
| Yes, formal discussion                      | No specific reference                       | No   |     2 |
| Yes, specific reference but limited details | Formal board consideration                  | Yes  |     8 |
| Yes, specific reference but limited details | Formal board consideration                  | No   |     7 |
| Yes, specific reference but limited details | Formal board commtment with limited details | Yes  |     6 |
| Yes, specific reference but limited details | Formal board commtment with limited details | No   |     5 |
| Yes, specific reference but limited details | General reference                           | Yes  |     5 |
| Yes, specific reference but limited details | General reference                           | No   |     5 |
| Yes, specific reference but limited details | No specific reference                       | Yes  |     3 |
| Yes, specific reference but limited details | No specific reference                       | No   |     2 |
| General reference                           | Formal board consideration                  | Yes  |     7 |
| General reference                           | Formal board consideration                  | No   |     6 |
| General reference                           | Formal board commtment with limited details | Yes  |     4 |
| General reference                           | Formal board commtment with limited details | No   |     4 |
| General reference                           | General reference                           | Yes  |     4 |
| General reference                           | General reference                           | No   |     3 |
| General reference                           | No specific reference                       | Yes  |     3 |
| General reference                           | No specific reference                       | No   |     1 |
| No reference                                | Formal board consideration                  | Yes  |     6 |
+---------------------------------------------+---------------------------------------------+------+-------+
"""

df1 = pd.read_csv(StringIO(data_1), sep='|', skipinitialspace=True, skiprows=[1])

df1.columns = df1.columns.str.strip()
df1.columns.to_list()
# ['Unnamed: 0', 'AGENT', '5283', '5288', '5318', '7934', '7935', 'Unnamed: 7']

# Clean up columns
keep_cols = [col for col in df1.columns.to_list() if ':' not in  col]
keep_cols
# ['AGENT', '5283', '5288', '5318', '7934', '7935']

df1 = df1[keep_cols]
df1.head() # Shown below

# Repeat for df2
df2 = pd.read_csv(StringIO(data_2), sep='|', skipinitialspace=True, skiprows=[1])

df2.columns = df2.columns.str.strip()
df2.columns.to_list()

keep_cols = [col for col in df2.columns.to_list() if ':' not in  col]
keep_cols
# ['5283', '5288', '5318', 'SCORE']

df2 = df2[keep_cols]
df2.head() # Shown below

NOTE: Loading the tables from markdown intially introduced some empty row, but it should not be an issue if you have a better way to load them.

df1.head() as markdown table:

AGENT 5283 5288 5318 7934 7935
0 nan nan nan nan nan nan
1 33 No No No No No
2 34 Yes, formal discussion General reference No No No
3 55 No No No No No
4 129 No No No No No

df2.head() as markdown table:

5283 5288 5318 SCORE
0 nan nan nan nan
1 Yes, formal discussion Formal board consideration Yes 10
2 Yes, formal discussion Formal board consideration No 8
3 Yes, formal discussion Formal board commtment with limited details Yes 7
4 Yes, formal discussion Formal board commtment with limited details No 6

Duplicate Merge Keys

Given this data, part of the problem is due to a many to many merge. That is, the 'merge_key' is not unique in either df.

df1 shape and value counts for merge columns:

df1.shape
# (32, 6)

counts_1 = df1.value_counts(['5283', '5283', '5318'], dropna=False)
print(counts_1)
count
('Yes, formal discussion ', 'Yes, formal discussion ', 'No ') 12
('No ', 'No ', 'No ') 11
('Yes, specific reference but limited details ', 'Yes, specific reference but limited details ', 'No ') 5
('General reference ', 'General reference ', 'No ') 2
(nan, nan, nan) 2

This shows 5 unique combinations from the 32 initial rows.

Repeat for df2:

df2.shape
# (27, 4)

counts_2 = df2.value_counts(['5283', '5283', '5318'], dropna=False)
print(counts_2.to_markdown())
count
('General reference ', 'General reference ', 'No ') 4
('General reference ', 'General reference ', 'Yes ') 4
('Yes, formal discussion ', 'Yes, formal discussion ', 'No ') 4
('Yes, formal discussion ', 'Yes, formal discussion ', 'Yes ') 4
('Yes, specific reference but limited details ', 'Yes, specific reference but limited details ', 'No ') 4
('Yes, specific reference but limited details ', 'Yes, specific reference but limited details ', 'Yes ') 4
(nan, nan, nan) 2
('No reference ', 'No reference ', 'Yes ') 1

df2 has 8 unique combinations from the original 27 rows.

When you try to merge data like this, the result is a many:many merge. Depending on the size of your data and number of matches, this can look like all nans and will NOT achieve what you want. This merge on sample data results in 91 rows of data.

merged_df = pd.merge(df1, df2, on=['5283', '5283', '5318'], how='left')
print(merged_df.shape)
# (91, 8)

score_counts_m1 = merged_df['SCORE'].value_counts(dropna=False)
print(score_counts_m1.to_markdown())
SCORE count
5 22
2 17
nan 15
6 14
8 12
7 5
4 2
3 2
1 2

Drop Duplicates

Try dropping duplicate values from at least 1 df. Here I drop them from df2 and repeat the merge resulting in a new df with 32 rows of data, matching df1 rows as hoped for with a left merge.

df2_dedupe = df2.drop_duplicates(subset=['5283', '5283', '5318'])
df2_dedupe.shape
# (8, 4)

merged_df2 = pd.merge(df1, df2_dedupe, on=['5283', '5283', '5318'], how='left')
print(merged_df2.shape)
# (32, 8)

score_counts_m2 = merged_df2['SCORE'].value_counts(dropna=False)
print(score_counts_m2.to_markdown())
SCORE count
nan 13
8 12
7 5
6 2

NOTE: The current nans might have been introduced by loading from a string.

These results are more in line with your expectations, I think.

Full Code

from io import StringIO
import pandas as pd

data_1 = """
+-------+---------------------------------------------+---------------------------------------------+------+----------------+----------------+
| AGENT |                    5283                     |                    5288                     | 5318 |      7934      |      7935      |
+-------+---------------------------------------------+---------------------------------------------+------+----------------+----------------+
|    33 | No                                          | No                                          | No   | No             | No             |
|    34 | Yes, formal discussion                      | General reference                           | No   | No             | No             |
|    55 | No                                          | No                                          | No   | No             | No             |
|   129 | No                                          | No                                          | No   | No             | No             |
|   307 | No                                          | No                                          | No   | No             | No             |
|   441 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|   522 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | Not Meaningful | Not Meaningful |
|   690 | No                                          | No                                          | No   | Not Meaningful | Not Meaningful |
|   749 | Yes, formal discussion                      | General reference                           | No   | No             | No             |
|  1011 | No                                          | No                                          | No   | No             | No             |
|  1067 | Yes, formal discussion                      | Formal board consideration                  | No   | Not Meaningful | Not Meaningful |
|  1272 | No                                          | No                                          | No   | Not Meaningful | Not Meaningful |
|  1592 | No                                          | No                                          | No   | Not Meaningful | Not Meaningful |
|  1908 | Yes, formal discussion                      | Formal board commtment with limited details | No   | No             | No             |
|  1949 | No                                          | No                                          | No   | Not Meaningful | Not Meaningful |
|  2040 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|  2102 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|  2114 | Yes, formal discussion                      | Formal board consideration                  | No   | Not Meaningful | Not Meaningful |
|  2266 | Yes, formal discussion                      | Formal board commtment with limited details | No   | No             | No             |
|  2365 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|  2674 | Yes, formal discussion                      | Formal board consideration                  | No   | No             | No             |
|  3109 | No                                          | General reference                           | No   | No             | No             |
|  3170 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | Not Meaningful | Not Meaningful |
|  3295 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | No             | No             |
|  3323 | General reference                           | General reference                           | No   | Not Meaningful | Not Meaningful |
|  3366 | Yes, specific reference but limited details | Formal board consideration                  | No   | No             | No             |
|  3840 | General reference                           | Formal board commtment with limited details | No   | No             | No             |
|  3914 | Yes, specific reference but limited details | Formal board commtment with limited details | No   | No             | No             |
|  3967 | Yes, formal discussion                      | Formal board consideration                  | No   | Yes            | No             |
|  4108 | No                                          | No                                          | No   | No             | No             |
+-------+---------------------------------------------+---------------------------------------------+------+----------------+----------------+
"""

data_2 = """
+---------------------------------------------+---------------------------------------------+------+-------+
|                    5283                     |                    5288                     | 5318 | SCORE |
+---------------------------------------------+---------------------------------------------+------+-------+
| Yes, formal discussion                      | Formal board consideration                  | Yes  |    10 |
| Yes, formal discussion                      | Formal board consideration                  | No   |     8 |
| Yes, formal discussion                      | Formal board commtment with limited details | Yes  |     7 |
| Yes, formal discussion                      | Formal board commtment with limited details | No   |     6 |
| Yes, formal discussion                      | General reference                           | Yes  |     6 |
| Yes, formal discussion                      | General reference                           | No   |     5 |
| Yes, formal discussion                      | No specific reference                       | Yes  |     4 |
| Yes, formal discussion                      | No specific reference                       | No   |     2 |
| Yes, specific reference but limited details | Formal board consideration                  | Yes  |     8 |
| Yes, specific reference but limited details | Formal board consideration                  | No   |     7 |
| Yes, specific reference but limited details | Formal board commtment with limited details | Yes  |     6 |
| Yes, specific reference but limited details | Formal board commtment with limited details | No   |     5 |
| Yes, specific reference but limited details | General reference                           | Yes  |     5 |
| Yes, specific reference but limited details | General reference                           | No   |     5 |
| Yes, specific reference but limited details | No specific reference                       | Yes  |     3 |
| Yes, specific reference but limited details | No specific reference                       | No   |     2 |
| General reference                           | Formal board consideration                  | Yes  |     7 |
| General reference                           | Formal board consideration                  | No   |     6 |
| General reference                           | Formal board commtment with limited details | Yes  |     4 |
| General reference                           | Formal board commtment with limited details | No   |     4 |
| General reference                           | General reference                           | Yes  |     4 |
| General reference                           | General reference                           | No   |     3 |
| General reference                           | No specific reference                       | Yes  |     3 |
| General reference                           | No specific reference                       | No   |     1 |
| No reference                                | Formal board consideration                  | Yes  |     6 |
+---------------------------------------------+---------------------------------------------+------+-------+
"""

df1 = pd.read_csv(StringIO(data_1), sep='|', skipinitialspace=True, skiprows=[1])

df1.columns = df1.columns.str.strip()
df1.columns.to_list()
# ['Unnamed: 0', 'AGENT', '5283', '5288', '5318', '7934', '7935', 'Unnamed: 7']

# Clean up columns
keep_cols = [col for col in df1.columns.to_list() if ':' not in  col]
keep_cols
# ['AGENT', '5283', '5288', '5318', '7934', '7935']

df1 = df1[keep_cols]
df1.head() 

# Repeat for df2
df2 = pd.read_csv(StringIO(data_2), sep='|', skipinitialspace=True, skiprows=[1])

df2.columns = df2.columns.str.strip()
df2.columns.to_list()

keep_cols = [col for col in df2.columns.to_list() if ':' not in  col]
keep_cols
# ['5283', '5288', '5318', 'SCORE']

df2 = df2[keep_cols]
df2.head() 

df1.shape
# (32, 6)

counts_1 = df1.value_counts(['5283', '5283', '5318'], dropna=False)
print(counts_1)

df2.shape
# (27, 4)

counts_2 = df2.value_counts(['5283', '5283', '5318'], dropna=False)
print(counts_2.to_markdown())

df2_dedupe = df2.drop_duplicates(subset=['5283', '5283', '5318'])
df2_dedupe.shape
# (8, 4)

merged_df2 = pd.merge(df1, df2_dedupe, on=['5283', '5283', '5318'], how='left')
print(merged_df2.shape)
# (32, 8)

score_counts_m2 = merged_df2['SCORE'].value_counts(dropna=False)
print(score_counts_m2.to_markdown())

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.