
I am writing Spark code in Python. How do I pass a variable into a spark.sql query?

    q25 = 500
    Q1 = spark.sql("SELECT col1 from table where col2>500 limit $q25 , 1")

Currently, the above code does not work. How do we pass variables?

I have also tried:

    Q1 = spark.sql("SELECT col1 from table where col2>500 limit q25='{}' , 1".format(q25))

5 Answers


You need to remove the single quotes and the q25 from the string formatting, like this:

Q1 = spark.sql("SELECT col1 from table where col2>500 limit {}, 1".format(q25))

Update:

Based on your new queries:

spark.sql("SELECT col1 from table where col2>500 order by col1 desc limit {}, 1".format(q25))

Note that Spark SQL does not support OFFSET, so the "limit {}, 1" form of the query cannot work.
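Since OFFSET isn't available, one commonly suggested workaround (a sketch, not part of the original answer) is to emulate it with a row_number() window function; the table and column names below are taken from the question, and only the string building is shown:

```python
q25 = 500

# Number the rows with a window function, then filter on that number.
# Keeping rn = q25 + 1 skips the first q25 rows, like "LIMIT q25, 1" would.
query = """
SELECT col1 FROM (
    SELECT col1, row_number() OVER (ORDER BY col1 DESC) AS rn
    FROM table
    WHERE col2 > 500
) WHERE rn = {}
""".format(q25 + 1)

# Q1 = spark.sql(query)  # pass the finished string to Spark as usual
```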

If you need to pass multiple variables, you can try this way:

q25 = 500
var2 = 50
Q1 = spark.sql("SELECT col1 from table where col2>{0} limit {1}".format(var2,q25))
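The same formatting also works with named placeholders, which can be easier to read than positional {0}/{1} (a plain-Python sketch; only the string building is shown, and the spark.sql call is assumed to follow as in the answer above):

```python
q25 = 500
var2 = 50

# Named placeholders make it obvious which variable fills which slot
query = "SELECT col1 from table where col2>{threshold} limit {row_limit}".format(
    threshold=var2, row_limit=q25)

# Q1 = spark.sql(query)
```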

6 Comments

This is still giving me a mismatched input exception for ',': spark.sql(SELECT col1 from table where col2>500 order by col1 desc limit {}, 1".format(q25))
Before SELECT, you need double quotes.
I have used ". Looks like the query does not work with ", 1", i.e. OFFSET is not supported in Spark SQL. Any workarounds?
@Viv yes, you are right, Spark SQL does not support OFFSET. You can refer to this answer: stackoverflow.com/questions/42560815/…

Using the f-strings approach (PySpark):

table = 'my_schema.my_table'

df = spark.sql(f'select * from {table}')
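f-strings interpolate any in-scope variable, so the same approach extends naturally to several values (a plain-Python sketch; the variable names are illustrative, and the spark.sql call would follow as above):

```python
schema = "my_schema"
table_name = "my_table"
row_limit = 10

# Each {name} is replaced with the variable's current value
query = f"select col1 from {schema}.{table_name} where col2 > 500 limit {row_limit}"

# df = spark.sql(query)
```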

Comments


Another option if you're doing this sort of thing often or want to make your code easier to re-use is to use a map of configuration variables and the format option:

configs = {"q25":10,
           "TABLE_NAME":"my_table",
           "SCHEMA":"my_schema"}
Q1 = spark.sql("""SELECT col1 from {SCHEMA}.{TABLE_NAME} 
                  where col2>500 
                  limit {q25}
               """.format(**configs))

2 Comments

This looks great. However, it doesn't work; it says name 'configs' is not defined. What am I missing? @David Maddox
Found the mistake: if you're using a Jupyter notebook, you have to define the dictionary in the same cell as well.

A really easy solution is to store the query as a string (using the usual python formatting), and then pass it to the spark.sql() function:

q25 = 500

query = "SELECT col1 from table where col2>500 limit {}".format(q25)

Q1 = spark.sql(query)

Comments


All you need to do is add s (Scala's string interpolator) before the string. This allows variables to be used directly inside the string.

val q25 = 10
Q1 = spark.sql(s"SELECT col1 from table where col2>500 limit $q25")

2 Comments

The solution you have provided is for Python or some other language? It seems off-beat...
This appears to be the scala implementation.
