
I have a data frame in PySpark. In this data frame I have a column called id that is unique.

Now I want to find the maximum value of the column id in the data frame.

I have tried the following:

df['id'].max()

But I got the error below:

TypeError: 'Column' object is not callable

Please let me know how to find the maximum value of a column in a data frame.

In the answer by @Dadep, the link gives the correct answer.

  • How do you create your data frame? Do you use pandas? Commented May 11, 2017 at 20:23
  • This looks like a Spark dataframe to me. Perhaps you need to add a spark or pyspark tag, or both, to your question. Commented May 11, 2017 at 20:23
  • Please try to share a minimal reproducible example. The best I can say is: it should work if everything you said is true. Commented May 11, 2017 at 20:25
  • @Abdou yes, it is a Spark dataframe Commented May 11, 2017 at 22:14
  • @Dadep no, I don't use pandas Commented May 11, 2017 at 22:15

4 Answers


If you are using pandas, .max() will work:

>>> import pandas as pd
>>> df2 = pd.DataFrame({'A': [1, 5, 0], 'B': [3, 5, 6]})
>>> df2['A'].max()
5

Otherwise, if it's a Spark dataframe, see:

Best way to get the max value in a Spark dataframe column
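In sketch form, the linked approach boils down to something like the following in PySpark (a minimal sketch, assuming a DataFrame df with an id column):

from pyspark.sql import functions as F

# Aggregate the max of the id column; collect()[0][0] unwraps the single value
max_id = df.agg(F.max("id")).collect()[0][0]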


1 Comment

I had to run it with df2.A.max() to make it work ... if it's a help to anyone else

I'm coming from Scala, but I do believe that this is also applicable to Python.

val max = df.select(max("id")).first()

but you first have to import the following:

from pyspark.sql.functions import max
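With that import in place, the Python equivalent of the Scala line would be roughly this (a minimal sketch, assuming a DataFrame df; note that importing max this way shadows Python's built-in max, which is why some prefer import pyspark.sql.functions as F):

# first() returns a Row from the one-row aggregate result; index it for the scalar
max_row = df.select(max("id")).first()
max_value = max_row[0]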



The following can be used in PySpark:

df.select(max("id")).show()



You can use the aggregate max, as also mentioned in the PySpark documentation linked below:

Link : https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=agg

Code:

# agg returns a one-row DataFrame; collect()[0] gives that Row
row1 = df1.agg({"id": "max"}).collect()[0]
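To pull the scalar out of the returned Row, you can index it by the generated column name (a sketch; the dict form of agg names its output column max(id) by default):

# The dict-style agg names its output column "max(id)"
max_value = row1["max(id)"]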

