0

I am trying to construct sql query inside databricks and this query needs to be done for every year and hence trying to add the year and table name as a variable inside the query as below.

I am trying to create 2 Temp tables for each year first as below

df_Conc_2019 = df_Conc.filter(col("Year") == 2019)
df_Conc_2019.createOrReplaceTempView('Conc_2019')

df_Conc_2020 = df_Conc.filter(col("Year") == 2020)
df_Conc_2020.createOrReplaceTempView('Conc_2020')

Then I am trying to join these tables (one by one or as per year with the main table) as below. But $Conc_Year and $Year inside my query don't work. I am trying to use for loop to achieve this as I have more than 2 years to do the analysis. Thanks.

Conc_Year = 'Conc_2019', 'Conc_2020'
Year = 2019, 2020 

query = "select A.invoice_date, A.Program_Year, $Conc_Year.BusinessSegment, $Conc_Year.Dealer_Prov, $Conc_Year.product_id... WHERE A.ID = $Conc_Year.ID AND A.Program_Year = $Year"

df_final = spark.sql(query)
0

1 Answer 1

1

Use f"{variable}" for format string in Python. For example:

for Year in [2019, 2020]:
    Conc_Year = f"Conc_{Year}"
    query = f"""
        select A.invoice_date,
               A.Program_Year,
               {Conc_Year}.BusinessSegment,
               {Conc_Year}.Dealer_Prov, 
               {Conc_Year}.product_id
        from A, {Conc_Year}
        WHERE A.ID = {Conc_Year}.ID AND A.Program_Year = {Year}
    """
Sign up to request clarification or add additional context in comments.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.