I have two PySpark DataFrames, df1 and df2:

df1
       id1   id2    id3    x    y
        0     1      2    0.5  0.4
        2     1      0    0.3  0.2
        3     0      2    0.8  0.9 
        2     1      3    0.2  0.1

df2
       id     name
        0      A 
        1      B
        2      C
        3      D

I would like to join the two DataFrames so that each id column in df1 is matched to its name in df2, giving:

df3
       id1   id2    id3    n1   n2   n3   x    y 
        0     1      2     A    B    C   0.5  0.4
        2     1      0     C    B    A   0.3  0.2 
        3     0      2     D    A    C   0.8  0.9
        2     1      3     C    B    D   0.2  0.1

  • Join multiple times. Commented Sep 25, 2020 at 12:17

1 Answer

Join df2 once for each id column, dropping id and renaming name after each join:

df1.join(df2, df1['id1'] == df2['id'], 'left').drop('id').withColumnRenamed('name', 'n1') \
   .join(df2, df1['id2'] == df2['id'], 'left').drop('id').withColumnRenamed('name', 'n2') \
   .join(df2, df1['id3'] == df2['id'], 'left').drop('id').withColumnRenamed('name', 'n3') \
   .show()

+---+---+---+---+---+---+---+---+
|id1|id2|id3|  x|  y| n1| n2| n3|
+---+---+---+---+---+---+---+---+
|  0|  1|  2|0.5|0.4|  A|  B|  C|
|  2|  1|  0|0.3|0.2|  C|  B|  A|
|  3|  0|  2|0.8|0.9|  D|  A|  C|
|  2|  1|  3|0.2|0.1|  C|  B|  D|
+---+---+---+---+---+---+---+---+

2 Comments

What if you have other columns in df1 that you want to keep (please see the revised version)?
I have changed nothing in the code, but the result is updated. You should debug and share the details rather than only saying there is an error.
