Pandas dataframe apply function to the values of several rows as a list

Question

My question is based on this one:

Apply pandas function on column only on certain rows

But I need a function that applies to the values of several rows in one column, as if those values were a list.

For example, if I select category c1 my function should apply like this: f([3,5])


| user   | category | val |
| ------ | -------- | --- |
| user 1 | c1       | 3   |
| user 1 | c2       | 4   |
| user 1 | c3       | 8   |
| user 2 | c1       | 5   |
| user 2 | c2       | 9   |
| user 2 | c3       | 10  |

You can groupby then apply. What function are you trying to apply?
– Umar.H, Commented May 8, 2020 at 18:05
We need a bit more information. Update with your desired end result. What rows are you trying to work with? What's the criteria?
– Rexovas, Commented May 8, 2020 at 18:14
Rows I want to work with: category = c1. The function applies to a list (the values in column val) and checks whether the list contains any duplicates and whether it has fewer than 10 elements.
– Zumplo, Commented May 8, 2020 at 19:12
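
The "groupby then apply" suggestion above can be sketched as follows. This is only a minimal illustration on the question's sample data; the function `f` is a hypothetical stand-in for the check described in the comments (no duplicates and fewer than 10 elements), not the asker's actual function.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['user 1', 'user 1', 'user 1', 'user 2', 'user 2', 'user 2'],
    'category': ['c1', 'c2', 'c3', 'c1', 'c2', 'c3'],
    'val': [3, 4, 8, 5, 9, 10],
})

def f(values):
    # Hypothetical list function: True when the list has no duplicates
    # and fewer than 10 elements
    return len(values) < 10 and len(set(values)) == len(values)

# Values of a single category as a plain Python list
c1_values = df.loc[df['category'] == 'c1', 'val'].tolist()
print(c1_values)     # [3, 5]
print(f(c1_values))  # True

# The same function applied to every category at once
per_category = df.groupby('category')['val'].apply(lambda s: f(s.tolist()))
print(per_category)
```

Selecting with `df.loc[...]` is enough when only one category matters; the `groupby` form is useful when the same list function has to run for each category.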

Marco Cerliani · Accepted Answer · 2020-05-08 23:28:50Z

1

I created a custom function which, given a dataframe and a category of interest, checks whether val contains any duplicates and whether it has fewer than 10 values. It returns True only when both conditions hold.

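import pandas as pd

df = pd.DataFrame({'user':['user 1','user 1','user 1','user 2','user 2','user 2'],
                   'category':['c1','c2','c3','c1','c2','c3'],
                   'val':[3,4,8,5,9,10]})

def custom_func(df, category):
    # Keep only the rows of the category of interest
    partial_df = df[df.category==category].copy()
    # True only when there are fewer than 10 values and at least one is duplicated
    return bool(len(partial_df.val)<10 and partial_df.val.duplicated().sum()>0)

# False here: c1 holds [3, 5], which has no duplicates
custom_func(df, 'c1')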


2 Comments

Zumplo Over a year ago

partial_df = df[df.category==category].copy() partial_df.val.duplicated().sum()>0:

Marco Cerliani Over a year ago

df[df.category==category].copy() filters the dataframe down to the category of interest, while partial_df.val.duplicated().sum()>0 checks whether there are duplicates in the column of interest.
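
To make the two expressions concrete, here is a small sketch of what each step produces. The data is hypothetical (the value 3 is repeated within c1 so that a duplicate actually shows up); it is not the dataframe from the question.

```python
import pandas as pd

# Hypothetical values: 3 appears twice in category c1
df = pd.DataFrame({
    'category': ['c1', 'c2', 'c1', 'c1'],
    'val': [3, 4, 5, 3],
})

mask = df.category == 'c1'           # boolean Series, one entry per row
partial_df = df[mask].copy()         # only the c1 rows, as an independent copy

dupes = partial_df.val.duplicated()  # True for each repeat of an earlier value
print(mask.tolist())                 # [True, False, True, True]
print(partial_df.val.tolist())       # [3, 5, 3]
print(dupes.tolist())                # [False, False, True]
print(dupes.sum() > 0)               # True: at least one duplicate
```

`duplicated()` marks only the second and later occurrences of a value, so the sum counts the repeats rather than every row involved in a duplicate.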

Zumplo · Accepted Answer · 2020-05-10 11:41:51Z

I think I managed to get it right based on @Marco Cerliani's answer.

This assumes a dataframe with URL and filter_keywords columns has already been created.

This is far from being elegant...

urls = ['url1','url2']

# df is assumed to exist already, with URL and filter_keywords columns

# True if the URL has fewer than 10 keywords
def size_check(df, URL):
    partial_df = df[df.URL==URL].copy()
    return len(partial_df.filter_keywords)<10

# True if there is a duplicate, False if there is no duplicate
def duplicate_check(df, URL):
    partial_df = df[df.URL==URL].copy()
    return bool(partial_df.filter_keywords.duplicated().sum()>0)

def total_check(df, URL):
    # Use the URL argument here, not the global loop variable
    if (not duplicate_check(df, URL)) and size_check(df, URL):
        print(URL+" ok")
    else:
        print(URL+" NOT ok")

for url in urls:
    # Pass the loop variable, not the string literal 'URL'
    total_check(df, url)
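
As a possible alternative to looping over the URLs, the same two checks could be computed for all URLs at once with `groupby`. This is only a sketch: the sample data and the `summary` and `ok` names are assumptions made for illustration, and it assumes filter_keywords contains no missing values.

```python
import pandas as pd

# Hypothetical data: url1 has a repeated keyword, url2 does not
df = pd.DataFrame({
    'URL': ['url1', 'url1', 'url1', 'url2', 'url2'],
    'filter_keywords': ['a', 'b', 'a', 'x', 'y'],
})

# One row per URL: number of keywords and number of distinct keywords
summary = df.groupby('URL')['filter_keywords'].agg(['size', 'nunique'])

# OK when there are fewer than 10 keywords and none is repeated
summary['ok'] = (summary['size'] < 10) & (summary['size'] == summary['nunique'])

for url, ok in summary['ok'].items():
    print(url + (' ok' if ok else ' NOT ok'))
# url1 NOT ok
# url2 ok
```

Comparing `size` with `nunique` replaces the `duplicated()` check: the two counts differ exactly when some keyword appears more than once for that URL.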
