
My question is based on this one:

Apply pandas function on column only on certain rows

But I need a function that is applied to the values of several rows in one column, as if those values were a list.

For example, if I select category c1 my function should apply like this: f([3,5])


|   user  |       category    | val  | 
| ------  | ------------------| -----|
| user 1  | c1                |   3  |  
| user 1  | c2                |   4  |
| user 1  | c3                |   8  | 
| user 2  | c1                |   5  |
| user 2  | c2                |   9  | 
| user 2  | c3                |   10 |
  • you can groupby then apply, what function are you trying to apply? Commented May 8, 2020 at 18:05
  • We need a bit more information. Update with your desired end result. What rows are you trying to work with? What's the criteria? Commented May 8, 2020 at 18:14
  • Rows I want to work with: category = c1. The function takes a list (the values in column val) and checks whether the list contains any duplicates and whether its length is less than 10. Commented May 8, 2020 at 19:12

2 Answers

1

I created a custom function which, given a dataframe and a category of interest, checks whether there is any duplicate in val and whether the size of val is less than 10.

import pandas as pd

df = pd.DataFrame({'user':['user 1','user 1','user 1','user 2','user 2','user 2'],
                   'category':['c1','c2','c3','c1','c2','c3'],
                   'val':[3,4,8,5,9,10]})

def custom_func(df, category):

    partial_df = df[df.category==category].copy()
    if len(partial_df.val)<10 and partial_df.val.duplicated().sum()>0:
        return True
    else:
        return False

custom_func(df, 'c1')
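
If the function really needs a plain list, as in the question's f([3,5]), the values can be pulled out with `.tolist()`, and `groupby` can run the same function over every category, as suggested in the comments. This is only a sketch: the name `f` and its body are assumptions based on the check described in the comments (fewer than 10 values, at least one duplicate), not the asker's actual function.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['user 1', 'user 1', 'user 1', 'user 2', 'user 2', 'user 2'],
    'category': ['c1', 'c2', 'c3', 'c1', 'c2', 'c3'],
    'val': [3, 4, 8, 5, 9, 10],
})

def f(values):
    # Hypothetical list-based check: fewer than 10 items
    # and at least one value that appears more than once.
    return len(values) < 10 and len(set(values)) < len(values)

# Values of `val` for category c1, as a plain Python list
c1_values = df.loc[df['category'] == 'c1', 'val'].tolist()
result_c1 = f(c1_values)

# The same function applied to every category in one pass
per_category = df.groupby('category')['val'].apply(lambda s: f(s.tolist()))
```

With the sample data, `c1_values` is `[3, 5]`, so `f` receives exactly the list from the question.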

2 Comments

partial_df = df[df.category==category].copy() partial_df.val.duplicated().sum()>0:
df[df.category==category].copy() filters on the category of interest, while partial_df.val.duplicated().sum()>0 checks whether there are duplicates in the column of interest.
0

I think I managed to get it right based on @Marco Cerliani's answer.

This assumes a dataframe df with URL and filter_keywords columns already exists.

This is far from being elegant...

urls = ['url1','url2']

def size_check(df, URL):
    partial_df = df[df.URL==URL].copy()
    if len(partial_df.filter_keywords)<10: 
        return True
    else:
        return False

# True if there is a duplicate, False otherwise
def duplicate_check(df, URL):
    partial_df = df[df.URL==URL].copy()
    if partial_df.filter_keywords.duplicated().sum()>0:
        return True
    else:
        return False

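def total_check(df, URL):
    # use the URL argument rather than the global loop variable
    if (not duplicate_check(df, URL)) and size_check(df, URL):
        print(URL+" ok")
    else:
        print(URL+" NOT ok")

for url in urls:
    # pass the current url, not the literal string 'URL'
    total_check(df, url)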
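
A more compact variant might compute both checks for all URLs at once with `groupby`. This is a sketch under assumptions: the sample dataframe below is made up for illustration, and only the column names `URL` and `filter_keywords` come from the code above. Note that `nunique` ignores missing values, so rows with NaN keywords would need separate handling.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the answer
df = pd.DataFrame({
    'URL': ['url1', 'url1', 'url2', 'url2', 'url2'],
    'filter_keywords': ['a', 'b', 'x', 'x', 'y'],
})

# One row per URL: number of keywords and number of distinct keywords
summary = df.groupby('URL')['filter_keywords'].agg(
    size='size',
    n_unique='nunique',
)

# A URL is ok when it has fewer than 10 keywords and none is repeated
summary['ok'] = (summary['size'] < 10) & (summary['n_unique'] == summary['size'])

for url, ok in summary['ok'].items():
    print(url + (" ok" if ok else " NOT ok"))
```

Here url1 passes both checks, while url2 fails because the keyword x appears twice.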
