I have dataframe with columns having some string values.
col1|col2
---------
aaa |bbb
ccc |ddd
aaa |ddd
eee |fff
I have to get number of allowed values ({aaa,ddd}) present in each columns.
cond = "`col1` = 'aaa' OR `col1` = 'ddd'"
dataframe.where(F.expr(cond)).count()
By this way we are getting required values. We are looping through all columns and perform this operation on each column.
This approach takes hours to process when number of columns increased to 2000.
Is there a better and faster approach for processing all columns parallely?