I need to write a Spark DataFrame to a Postgres DB. I have used the following:
df.write \
    .option("numPartitions", partitions) \
    .option("batchsize", batchsize) \
    .jdbc(url=url, table="table_name", mode="append", properties=properties)
This works fine; however, I want to compare its performance with Postgres's COPY command. I tried the following:
import io

output = io.StringIO()
csv_new.write \
    .format("csv") \
    .option("header", "true") \
    .save(path=output)
output.seek(0)
contents = output.getvalue()
cursor.copy_from(output, 'tb_pivot_table', null="")  # using psycopg2
con_bb.commit()
This does not seem to work; it fails with the error 'type' object is not iterable (presumably because DataFrameWriter.save() expects a filesystem path string, not an in-memory buffer).
The same approach worked well with a Pandas DataFrame:
output = io.StringIO()
df.to_csv(path_or_buf=output, sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()
cursor.copy_from(output, 'tb_ts_devicedatacollection_aggregate', null="")
con_bb.commit()
Any leads on how to implement the Pandas equivalent in PySpark? P.S.: It's performance-critical, hence converting the Spark DataFrame to a Pandas DataFrame is not an option. Any help would be greatly appreciated.
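For reference, one direction I have been considering (untested and unbenchmarked) is running COPY per partition via foreachPartition, so each executor streams its own rows through psycopg2 instead of funnelling everything through the driver. A minimal sketch, assuming a hypothetical conn_params dict holding the connection settings and that the column values contain no tabs, newlines, or backslashes:

import io

import psycopg2

# Hypothetical connection settings; replace with the real ones.
conn_params = {"host": "...", "dbname": "...", "user": "...", "password": "..."}

def copy_partition(rows):
    # Each executor opens its own connection; a connection
    # object cannot be shipped from the driver to the workers.
    buf = io.StringIO()
    for row in rows:
        # Build COPY's tab-separated text format; None becomes an
        # empty string to match null="".
        buf.write("\t".join("" if c is None else str(c) for c in row) + "\n")
    buf.seek(0)
    conn = psycopg2.connect(**conn_params)
    try:
        with conn.cursor() as cur:
            cur.copy_from(buf, "tb_pivot_table", null="")
        conn.commit()
    finally:
        conn.close()

df.rdd.foreachPartition(copy_partition)

This keeps the work distributed rather than collecting the whole DataFrame on the driver, which is where the StringIO attempt above falls apart; whether it actually beats the tuned JDBC batch insert is what I want to measure.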