I am using Databricks Runtime 6.3 with PySpark. I have a DataFrame df_1. SalesVolume is an integer, but AveragePrice is a string.

When I run the code below, it executes and I get the correct output.

display(df_1.filter('SalesVolume>10000 and AveragePrice>70000'))

But the code below fails with the error "py4j.Py4JException: Method and([class java.lang.Integer]) does not exist":

display(df_1.filter(df_1['SalesVolume']>10000 & df_1['AveragePrice']>7000))

Why does the first one work but not the second one?

  • I believe you need to put each condition in parentheses if you're using multiple conditions. Commented Jan 26, 2020 at 18:57

1 Answer

You have to wrap each condition in parentheses:

display(df_1.filter((df_1['SalesVolume']>10000) & (df_1['AveragePrice']>7000)))

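filter accepts either a SQL expression string or a Column expression. The first call works because the whole string is handed to Spark's SQL parser, where "and" binds more loosely than ">". The second fails because it is a Python expression, and in Python "&" binds more tightly than ">". Without parentheses, Python evaluates 10000 & df_1['AveragePrice'] first, which asks the underlying Java column for an "and" method taking an Integer. That is exactly the error you see.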
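To see the grouping without a Spark cluster, here is a minimal sketch that asks Python's own parser how it reads the two expressions. The names a and b are placeholders standing in for the two columns; they are not part of the original code, and nothing here calls PySpark.

```python
import ast

# Placeholder names a and b stand in for df_1['SalesVolume'] and df_1['AveragePrice'].
# Without parentheses: & binds tighter than >, so Python groups "10000 & b" first
# and treats the whole thing as one chained comparison: a > (10000 & b) > 7000.
unparenthesized = ast.parse("a > 10000 & b > 7000", mode="eval").body

# With parentheses: two separate comparisons, joined by & at the top level.
parenthesized = ast.parse("(a > 10000) & (b > 7000)", mode="eval").body

# Top-level node of each expression.
print(type(unparenthesized).__name__)
print(type(parenthesized).__name__)

# The middle operand of the chained comparison is the "10000 & b" sub-expression.
middle = unparenthesized.comparators[0]
print(type(middle).__name__, type(middle.op).__name__)
```

The first expression parses as a comparison whose middle operand is a bitwise AND between an integer and the column, while the second parses as a bitwise AND of two comparisons, which is what filter needs.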
