Preventing duplicate data from being inserted into Azure SQL DB using Azure Databricks

Question

I'm loading data into Azure SQL DB from Azure Databricks in the following scenario:

I have a table in Azure SQL DB named mysalesorder.
I have some files in ADLS that contain data similar to mySalesOrder.
I am inserting the data into Azure SQL DB from an Azure Databricks notebook.

Right now I can insert data from the ADLS files into Azure SQL DB using JDBC from Azure Databricks.

But I want to know how I can prevent duplicate rows from the ADLS files from being inserted into the table.

David Browne - Microsoft · Accepted Answer · 2022-10-09 15:50:09Z

prevent from duplicate row entries into the table from adls files.

Create a unique index or primary key on the target table. That will keep duplicates out of the table, but it won't stop you from trying to insert them: the INSERT that contains a duplicate key will fail with a key-violation error.
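A minimal sketch in Java (one of the two languages the answer mentions) of creating that index over a plain java.sql connection. The business-key columns SalesOrderID and LineNumber and the index name are hypothetical; substitute whatever identifies a row in your data. The boolean switch adds the IGNORE_DUP_KEY option that the answer also mentions.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

final class DedupIndex {

    // Builds T-SQL DDL for a unique index on the given key columns.
    // With ignoreDupKey = true, SQL Server discards rows whose key already
    // exists (raising a warning) instead of failing the whole INSERT.
    static String uniqueIndexDdl(String table, String indexName,
                                 List<String> keyColumns, boolean ignoreDupKey) {
        String ddl = "CREATE UNIQUE INDEX " + indexName + " ON " + table
                + " (" + String.join(", ", keyColumns) + ")";
        if (ignoreDupKey) {
            ddl += " WITH (IGNORE_DUP_KEY = ON)";
        }
        return ddl;
    }

    // Runs the DDL once against an already-open JDBC connection.
    static void createIndex(Connection conn, String ddl) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(ddl);
        }
    }

    public static void main(String[] args) {
        // Hypothetical table key; adjust to your schema.
        String ddl = uniqueIndexDdl("dbo.mysalesorder", "UX_mysalesorder_key",
                List.of("SalesOrderID", "LineNumber"), false);
        System.out.println(ddl);
    }
}
```

Creating the index fails if the table already contains duplicate keys, so existing duplicates have to be cleaned up first.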

To avoid those failed inserts, either load the data into a staging table and MERGE it into the target table by executing a JDBC statement from Scala or Java (not through the Spark JDBC connector), or turn on IGNORE_DUP_KEY for the index.
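The staging-table route could look like the following sketch. It assumes the Spark job has already written the file data to a staging table (the name dbo.mySalesOrder_staging and the column names are hypothetical) and builds an insert-only MERGE that it executes through java.sql rather than the Spark JDBC writer.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;
import java.util.stream.Collectors;

final class StagingMerge {

    // Builds a T-SQL MERGE that inserts only staging rows whose key is not
    // already present in the target. SELECT DISTINCT drops exact duplicate
    // rows inside the staging data, which MERGE would otherwise insert twice.
    static String buildInsertOnlyMerge(String target, String staging,
                                       List<String> keyColumns, List<String> allColumns) {
        String on = keyColumns.stream()
                .map(c -> "t." + c + " = s." + c)
                .collect(Collectors.joining(" AND "));
        String cols = String.join(", ", allColumns);
        String values = allColumns.stream()
                .map(c -> "s." + c)
                .collect(Collectors.joining(", "));
        // MERGE must end with a semicolon in T-SQL.
        return "MERGE INTO " + target + " AS t "
                + "USING (SELECT DISTINCT " + cols + " FROM " + staging + ") AS s "
                + "ON " + on + " "
                + "WHEN NOT MATCHED BY TARGET THEN "
                + "INSERT (" + cols + ") VALUES (" + values + ");";
    }

    // Executes the MERGE over a plain JDBC connection and returns the
    // update count reported by the driver.
    static int runMerge(Connection conn, String mergeSql) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            return stmt.executeUpdate(mergeSql);
        }
    }

    public static void main(String[] args) {
        // Hypothetical table and column names.
        System.out.println(buildInsertOnlyMerge("dbo.mySalesOrder", "dbo.mySalesOrder_staging",
                List.of("SalesOrderID"), List.of("SalesOrderID", "OrderDate", "Amount")));
    }
}
```

An insert-only MERGE leaves existing rows untouched; adding a WHEN MATCHED THEN UPDATE clause would turn it into an upsert. Because table and column names are concatenated into the SQL text, they should come from trusted configuration, never from the loaded data.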

2 Comments

user8205502 Over a year ago

As you suggest, we can load the data into a staging table and then merge it using a stored procedure in Azure Data Factory. But can we do this with Azure Databricks instead?

David Browne - Microsoft Over a year ago

Sure. See my answer here for how to call a stored procedure or arbitrary SQL statement from Spark: stackoverflow.com/questions/63065607/…
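From a Databricks notebook, the procedure can be called over a plain JDBC connection opened on the driver. A minimal sketch, assuming the Microsoft SQL Server JDBC driver is on the cluster classpath and a hypothetical stored procedure dbo.usp_MergeSalesOrders that performs the MERGE from the staging table:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

final class MergeProcCall {

    // JDBC escape syntax for calling a stored procedure with no parameters.
    static String callSyntax(String procName) {
        return "{call " + procName + "}";
    }

    // Opens a connection, runs the procedure, and closes both resources.
    // URL and credentials are supplied by the caller (e.g. from a secret store).
    static void runProc(String jdbcUrl, String user, String password, String procName)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             CallableStatement cs = conn.prepareCall(callSyntax(procName))) {
            cs.execute();
        }
    }

    public static void main(String[] args) {
        // Hypothetical procedure name; only prints the call text here.
        System.out.println(callSyntax("dbo.usp_MergeSalesOrders"));
    }
}
```

The URL would take the driver's usual form, e.g. jdbc:sqlserver://<server>.database.windows.net:1433;databaseName=<db>;encrypt=true, with the credentials read from a secret store rather than hard-coded in the notebook.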
