
Let me start by saying that I am not sure this is the best way to do it, but I wrote some code that creates a pandas DataFrame containing, for each pair of rows that meet specific spatial conditions, the index value from my left DataFrame and the index value from my right DataFrame. This is basically a spatial join, but with some additional attributes. The index values are correct.

My question: how can I join the left and right DataFrames together using this third DataFrame?

I need to support the following:

  1. If I want to keep all rows (from both df1 and df2), how do I do that?
  2. By default I want to keep all rows of the left DataFrame, so my join DataFrame contains pairs like [1, None]. Will this be a problem?

Example:

    import pandas as pd

    join_df = pd.DataFrame(data=[[0, 2], [1, 3], [2, None]], columns=['left_idx', 'right_idx'])
    df1 = pd.DataFrame([["a", {5: 5}], ["b", {4: 5}], ["c", {12: 5}]], columns=['A1', 'A2'])
    df2 = pd.DataFrame([["b", {'a': 5}], ["bbb", {'b': 5}], ["ccc", {'c': 5}]], columns=['B1', 'B2'])
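Regarding point 2, here is a quick way to see what the None in join_df turns into. pandas stores it as NaN and upcasts the whole right_idx column to float64. In a merge, a NaN label simply matches no df2 row, as long as df2's own index has no missing labels, so the pair is kept with empty df2 columns under how='left'. This is a minimal sketch on the question's own join_df; the conversion to pandas' nullable 'Int64' dtype is an optional extra, not something the question requires.

```python
import pandas as pd

# The pair table from the question; the last df1 row has no df2 match (None)
join_df = pd.DataFrame(data=[[0, 2], [1, 3], [2, None]], columns=['left_idx', 'right_idx'])

# pandas stores the None as NaN and upcasts the whole column to float64
raw_dtype = join_df['right_idx'].dtype
print(raw_dtype)  # float64

# Optional: the nullable 'Int64' dtype keeps integer labels and marks "no match" as <NA>
join_df['right_idx'] = join_df['right_idx'].astype('Int64')
print(join_df['right_idx'].isna().tolist())  # [False, False, True]
```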

So the join_df works like this:

  1. Each row of join_df pairs an index label of the left DataFrame (df1), in left_idx, with the index label of the df2 row to join, in right_idx.
  2. The join can be one-to-many, many-to-one, or many-to-many.

The goal is for every row of df1 to be matched to its corresponding row(s) in df2. Optionally (bonus question): if a df1 record has no match in df2, can it be kept? And the same for df2 records that have no match in df1?

Thank you

  • Can you give an idea of how you'd like the output dataframe(s) to look? It's not totally clear what results you want. Commented Apr 10, 2017 at 13:46

1 Answer


You can use DataFrame.merge, matching the left_idx (or right_idx) column of join_df against the index of df1 (or df2) with left_on=... and right_index=True. Using how='left' keeps exactly the rows of join_df, so the result only includes the pairs specified in join_df.

    join_df = join_df.merge(df1, left_on='left_idx', right_index=True, how='left')
    join_df = join_df.merge(df2, left_on='right_idx', right_index=True, how='left')

This gives:

       left_idx  right_idx A1       A2   B1         B2
    0         0        2.0  a   {5: 5}  ccc  {u'c': 5}
    1         1        3.0  b   {4: 5}  NaN        NaN
    2         2        NaN  c  {12: 5}  NaN        NaN
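The one-to-many, many-to-one and many-to-many cases from the question are handled by the same two merges, because each row of the pair table pulls in one df1 row and one df2 row. This is a minimal sketch with a hypothetical pair table (pairs is made up for illustration; df1 and df2 are the question's frames).

```python
import pandas as pd

df1 = pd.DataFrame([["a", {5: 5}], ["b", {4: 5}], ["c", {12: 5}]], columns=['A1', 'A2'])
df2 = pd.DataFrame([["b", {'a': 5}], ["bbb", {'b': 5}], ["ccc", {'c': 5}]], columns=['B1', 'B2'])

# Hypothetical pair table: df1 row 0 matches two df2 rows (1:m),
# df2 row 2 is matched by two df1 rows (m:1), and df1 row 2 has no match (None)
pairs = pd.DataFrame({'left_idx': [0, 0, 1, 2], 'right_idx': [1, 2, 2, None]})

# Same two merges as above: each pair row pulls in its df1 row, then its df2 row
result = (pairs
          .merge(df1, left_on='left_idx', right_index=True, how='left')
          .merge(df2, left_on='right_idx', right_index=True, how='left'))
print(result[['A1', 'B1']])
```

Because how='left' is used on both merges, the row count equals the number of pairs, and the NaN label in the last pair leaves the df2 columns empty for that row.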

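You can exclude the idx columns by selecting join_df[df1.columns.union(df2.columns)]. The shorter form df1.columns | df2.columns relied on the | operator acting as a set union on an Index, which pandas 2.x no longer supports. If you want to avoid dropping values, use how='outer' instead: on the first merge it also keeps df1 rows that never appear in left_idx, and on the second merge it keeps df2 rows that never appear in right_idx. You may still need to adjust the result to match your desired output.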
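For the "keep everything" case (point 1 and the bonus question), here is a sketch of the how='outer' variant on the question's data. indicator=True is a standard merge parameter that adds a _merge column marking each row as both, left_only or right_only. Using it here is an addition for illustration, not part of the answer above. In this data every df1 row is referenced, so the first outer merge adds nothing. df2 rows 0 and 1 are never referenced and come back as right_only. The label columns are not meaningful for rows that exist on only one side, so the sketch drops them.

```python
import pandas as pd

df1 = pd.DataFrame([["a", {5: 5}], ["b", {4: 5}], ["c", {12: 5}]], columns=['A1', 'A2'])
df2 = pd.DataFrame([["b", {'a': 5}], ["bbb", {'b': 5}], ["ccc", {'c': 5}]], columns=['B1', 'B2'])
join_df = pd.DataFrame(data=[[0, 2], [1, 3], [2, None]], columns=['left_idx', 'right_idx'])

# how='outer' on the first merge also keeps df1 rows that never appear in left_idx
step1 = join_df.merge(df1, left_on='left_idx', right_index=True, how='outer')

# how='outer' on the second merge also keeps df2 rows that never appear in right_idx;
# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
full = step1.merge(df2, left_on='right_idx', right_index=True,
                   how='outer', indicator=True)

# Keep only the data columns plus the indicator; Index.union replaces the old `|` form
cols = df1.columns.union(df2.columns).tolist() + ['_merge']
full = full[cols].reset_index(drop=True)
print(full)
```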
