I want to load data from Azure Blob Storage into Azure SQL Database using a Databricks notebook. Could anyone help me with this?


1 Answer


I'm new to this, so I cannot comment, but why use Databricks for this? It would be much easier and cheaper to use Azure Data Factory.

https://learn.microsoft.com/en-us/azure/data-factory/tutorial-copy-data-dot-net

If you really need to use Databricks, you would need to either mount your Blob Storage account, or access it directly from your Databricks notebook or JAR, as described in the documentation (https://docs.azuredatabricks.net/spark/latest/data-sources/azure/azure-storage.html).
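For example, here is a minimal sketch of both options in a Python notebook cell. The storage account, container, secret scope, and key names are all placeholders; substitute your own:

```python
# Option 1: access Blob Storage directly with an account key
# ("mystorageaccount", "mycontainer", and the secret names are placeholders).
spark.conf.set(
    "fs.azure.account.key.mystorageaccount.blob.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="storage-key"))

# Option 2: mount the container under DBFS so it behaves like a regular path.
dbutils.fs.mount(
    source="wasbs://mycontainer@mystorageaccount.blob.core.windows.net",
    mount_point="/mnt/blob",
    extra_configs={
        "fs.azure.account.key.mystorageaccount.blob.core.windows.net":
            dbutils.secrets.get(scope="my-scope", key="storage-key")})
```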

You can then read the files into DataFrames, whatever format they are in, and use the SQL JDBC connector to open a connection for writing the data to SQL (https://docs.azuredatabricks.net/spark/latest/data-sources/sql-databases.html).
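As a sketch, assuming the source files are CSV under the mount point from above and using placeholder server, database, table, and credential names:

```python
# Read the source files into a DataFrame (format assumed to be CSV here).
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/mnt/blob/input/"))

# JDBC connection details for the Azure SQL Database (placeholder values).
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"
connection_properties = {
    "user": "sqladmin",
    "password": dbutils.secrets.get(scope="my-scope", key="sql-password"),
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}

# Write the DataFrame to a table; "overwrite" creates the table if it
# does not already exist, using types inferred from the DataFrame schema.
df.write.jdbc(url=jdbc_url, table="dbo.my_table", mode="overwrite",
              properties=connection_properties)
```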


3 Comments

One of the possible reasons (to be confirmed) to use Databricks over ADF is that ADF requires a specific table in SQL DB to be defined with a schema. I am importing data from an API in Databricks and there are 200 columns, and I don't want to specify a schema. I am hoping that in Databricks I can just create the table in SQL dynamically and infer the schema from the DataFrame. It will be used as a holding table for Power BI (something like learn.microsoft.com/en-us/azure/hdinsight/spark/…)
@Rodney, that makes sense, but I'd be curious to see whether you can actually get the data types to work with a dynamic schema. Type inference does not always work as you would hope, especially if your source data has a lot of null values or possibly bad data. If your only reason for using SQL is as a temp table, an alternative approach you could consider is to store the data in a Hive or Delta table in Databricks and query it directly from Power BI (see the sketch after these comments).
Yes, it's something of a temporary solution; I just had to get something into the DB quickly, ideally keeping the same types as my DataFrame. It did work, but ultimately I will use ADF, as it is a LOT faster and provides logging etc., not to mention cheaper. Just good to know the Spark connector is there for those edge cases...
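For reference, a minimal sketch of the Delta-table alternative suggested in the comments, assuming the DataFrame df and the table name are the ones used in the earlier example:

```python
# Persist the DataFrame as a managed Delta table instead of staging it in
# SQL; Power BI can then query it through the Databricks connector.
df.write.format("delta").mode("overwrite").saveAsTable("holding_table")
```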
