I have two pyspark dataframes:

|  A  |  B  |  C  |
| 21  | 999 | 1000|
| 22  | 786 | 1978|
| 23  | 345 | 1563|

and

|  A  |  D  |  E  |
| 21  | aaa | a12 |
| 22  | bbb | b43 |
| 23  | ccc | h67 |

Desired result:

|  A  |  B  |  C  |  E  |
| 21  | 999 | 1000| a12 |
| 22  | 786 | 1978| b43 |
| 23  | 345 | 1563| h67 |

I tried using join, even df1.join(df2.E, df1.A == df2.A) to no avail.

2 Answers

I think this code does what you want:

joinedDF = df1.join(df2.select('A', 'E'), ['A'])
When you join two dataframes, the join function takes three arguments:

  1. arg-1 : the other dataframe you need to join with.
  2. arg-2 : the column(s) on which to join the dataframes.
  3. arg-3 : the type of join you want to perform; by default it is an inner join.

Please find below sample code:

df1.join(df2, df1.id == df2.id, 'outer')

You can find more details here.

Regards,

Neeraj
