added 113 characters in body (emax)

I have two PySpark DataFrames, df1 and df2.

df1
       id1   id2    id3    x    y
        0     1      2    0.5  0.4
        2     1      0    0.3  0.2
        3     0      2    0.8  0.9 
        2     1      3    0.2  0.1

df2
       id     name
        0      A 
        1      B
        2      C
        3      D

I would like to join the two DataFrames on each of the three id columns, so that n1, n2 and n3 hold the names from df2 matching id1, id2 and id3:

df3
       id1   id2    id3    n1   n2   n3   x    y 
        0     1      2     A    B    C   0.5  0.4
        2     1      0     C    B    A   0.3  0.2 
        3     0      2     D    A    C   0.8  0.9
        2     1      3     C    B    D   0.2  0.1

PySpark: how to join two DataFrames over multiple columns?

I have two PySpark DataFrames, df1 and df2.

df1
       id1   id2    id3
        0     1      2
        2     1      0
        3     0      2
        2     1      3

df2
       id     name
        0      A 
        1      B
        2      C
        3      D

I would like to join the two DataFrames on each of the three id columns, so that n1, n2 and n3 hold the names from df2 matching id1, id2 and id3:

df3
       id1   id2    id3    n1   n2   n3
        0     1      2     A    B    C
        2     1      0     C    B    A 
        3     0      2     D    A    C
        2     1      3     C    B    D