
I'm loading data into an Azure SQL Database from Azure Databricks in the following scenario.

  1. I have a table in Azure SQL DB named mysalesorder.
  2. I have some files in ADLS that contain data similar to mySalesOrder.
  3. I am inserting the data into Azure SQL DB using an Azure Databricks notebook.

Now I can insert data from the ADLS files into Azure SQL DB using Azure Databricks JDBC.

But I want to know how I can prevent duplicate rows from the ADLS files from being inserted into the table.

1 Answer

prevent from duplicate row entries into the table from adls files.

Create a unique index or primary key on the target table. That will prevent duplicates, but won't prevent you from trying and failing to insert duplicates.
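As a rough illustration of the unique-index approach, here is a minimal Java sketch that only builds the T-SQL text. The class name UniqueIndexDdl, the index name, and the key column SalesOrderID are hypothetical, since the question does not say which columns define a duplicate. The optional flag adds the IGNORE_DUP_KEY setting mentioned in this answer.

```java
import java.util.List;

// Hypothetical helper: builds the T-SQL for a unique index on the target table.
// The table, index, and column names passed in are placeholders and must come
// from trusted input, because they are concatenated into the statement as-is.
final class UniqueIndexDdl {
    static String build(String table, String indexName,
                        List<String> keyColumns, boolean ignoreDupKey) {
        String cols = String.join(", ", keyColumns);
        String sql = "CREATE UNIQUE INDEX " + indexName
                + " ON " + table + " (" + cols + ")";
        if (ignoreDupKey) {
            // With IGNORE_DUP_KEY = ON, SQL Server skips rows that would
            // violate the index and raises a warning instead of an error.
            sql += " WITH (IGNORE_DUP_KEY = ON)";
        }
        return sql;
    }
}
```

For example, UniqueIndexDdl.build("dbo.mysalesorder", "UX_mysalesorder_id", List.of("SalesOrderID"), false) produces a plain unique index statement, which you would run once against the database rather than on every load.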

To avoid the failed inserts, either load the data into a staging table and MERGE it into the target table using a JDBC statement in Scala or Java (not the Spark JDBC connector), or turn IGNORE_DUP_KEY on for the index.
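The staging-table route could look roughly like the following Java sketch, which uses plain java.sql rather than the Spark connector. It assumes Spark has already written the ADLS data to a staging table, that the staging table holds no duplicate keys of its own, and that the SQL Server JDBC driver is on the classpath. The class name StagingMerge, the staging table name, and the column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for the "stage, then MERGE" pattern.
final class StagingMerge {
    // Builds a MERGE that inserts only staging rows whose key is not yet
    // present in the target. Identifiers are concatenated as-is, so they
    // must come from trusted input.
    static String buildMergeSql(String target, String staging,
                                List<String> keyColumns, List<String> allColumns) {
        String on = keyColumns.stream()
                .map(c -> "t." + c + " = s." + c)
                .collect(Collectors.joining(" AND "));
        String insertCols = String.join(", ", allColumns);
        String insertVals = allColumns.stream()
                .map(c -> "s." + c)
                .collect(Collectors.joining(", "));
        return "MERGE INTO " + target + " AS t"
                + " USING " + staging + " AS s"
                + " ON " + on
                + " WHEN NOT MATCHED BY TARGET THEN"
                + " INSERT (" + insertCols + ") VALUES (" + insertVals + ");";
    }

    // Executes the statement over a plain JDBC connection and returns the
    // number of affected rows reported by the driver.
    static int run(String jdbcUrl, String user, String password, String sql)
            throws SQLException {
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = conn.createStatement()) {
            return stmt.executeUpdate(sql);
        }
    }
}
```

A typical call would build the statement with buildMergeSql("dbo.mysalesorder", "dbo.mysalesorder_staging", List.of("SalesOrderID"), List.of("SalesOrderID", "Amount")) and pass the result to run together with your own JDBC URL and credentials. Because java.sql is a standard JVM API, the same calls translate directly to a Scala notebook cell.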


2 Comments

As you suggest, I can load the data into a staging table. We could then merge the data using a stored procedure in Azure Data Factory, but can we use Azure Databricks for this?
Sure. See my answer here for how to call a stored procedure or arbitrary SQL statement from Spark: stackoverflow.com/questions/63065607/…
