Hope someone can help me.

Given one value per column, I succeeded in filtering a pandas DataFrame (as shown in the code below). However, depending on the analysis I am running, I would sometimes like to avoid specifying a value to filter on (for example, I would like to skip the filter on seg_device and filter the DataFrame by os only).

However, the code below is forcing me to always specify some value (e.g. desktop). If I leave seg_device blank, df_ch_seg will return no data, given the condition df_ch.device == seg_device.

Would someone have any advice on how to make my code more flexible? My dataset has 1 million rows and 16 columns. Below you see only 2 filters, but I have 15 in total (some are integer columns, some are string columns). Thank you!

Looking at the code below, I would like to change it slightly so that it works in several cases:

  • if I want to filter by one device (e.g. mobile)
  • if I want to filter by 2 devices (e.g. mobile, desktop)
  • if I don't want to filter by device (I would like my code to ignore the device filter)
# [...]

seg_device = input('Enter device (e.g. desktop, ...): ')
seg_os = input('Enter operating system (e.g. Mac/iOS, Windows, ...):  ')

# [...]

# Define new dataframe df_ch_seg, based on df_ch, segmented based on above input values 
df_ch_seg = df_ch[(df_ch.device == seg_device) & (df_ch.os == seg_os)]

2
  • Does this answer your question? Filter dataframe rows if value in column is in a set list of values Commented Apr 24, 2021 at 18:04
  • Hi @Deepak, unfortunately it doesn't answer my question. I can give you more info about the example. In my database, 3 "device" values are allowed (mobile, desktop, tablet). Looking at the code above, I would like to change it slightly so that it works in several cases: - if I want to filter by one device (e.g. mobile) - if I want to filter by 2 devices (e.g. mobile, desktop) - if I don't want to filter by device (I would like my code to ignore the device filter) Commented Apr 24, 2021 at 18:20

2 Answers

1

If I understand correctly, you just want to turn this into a function whose inputs are tuples of (column, filter_value).

import pandas as pd

def mask_constructor(df, filters):
    # Start from an all-True mask so an empty filter list keeps every row
    mask = pd.Series(True, index=df.index)
    for col, val in filters:
        # AND each (column, value) condition into the mask
        mask &= df[col] == val
    return mask

Then you could call it like so.

mask = mask_constructor(df_ch, [("device", "iPhone"), ("os", "iOS")])
df_ch[mask]
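
To cover the three cases from the question (one device, two devices, no device filter), the same idea can be extended with `Series.isin`. This is only a minimal sketch: the helper name `build_mask`, the dict layout (column name mapped to a list of accepted values), and the toy data standing in for `df_ch` are assumptions made for illustration, not part of the original code.

```python
import pandas as pd

def build_mask(df, allowed):
    """Return a boolean mask; `allowed` maps column name -> list of accepted values."""
    mask = pd.Series(True, index=df.index)
    for col, values in allowed.items():
        if not values:  # empty list or None: ignore this column entirely
            continue
        mask &= df[col].isin(values)
    return mask

# Toy data standing in for df_ch (hypothetical values)
df_ch = pd.DataFrame({
    "device": ["mobile", "desktop", "tablet", "mobile"],
    "os": ["iOS", "Windows", "iOS", "Android"],
})

one_device = df_ch[build_mask(df_ch, {"device": ["mobile"], "os": []})]
two_devices = df_ch[build_mask(df_ch, {"device": ["mobile", "desktop"]})]
os_only = df_ch[build_mask(df_ch, {"device": None, "os": ["iOS"]})]
```

Because each column contributes at most one `isin` condition, the same loop should scale to all 15 filters without writing a separate branch per column.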

2 Comments

Thank you! Unfortunately this does not solve the problem yet. It allows me to filter by 2 values using the list, but not to ignore the filtering. I edited the final part of my problem to provide more info with an example. Thank you again
Yes, but in this function you pass in the columns and filters you want. If you don’t want to use a specific column, then don’t pass it in.
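
One way to "not pass in" a column is to drop blank answers before the filters are built. The sketch below assumes the user separates several values with commas; `parse_choice` and `collect_filters` are hypothetical helper names, and the answers are hard-coded in place of the `input()` calls so the snippet runs unattended.

```python
def parse_choice(raw):
    """Split a comma-separated answer into a list of values; blank input gives []."""
    return [part.strip() for part in raw.split(",") if part.strip()]

def collect_filters(answers):
    """Keep only the columns for which the user actually typed something."""
    parsed = {col: parse_choice(raw) for col, raw in answers.items()}
    return {col: values for col, values in parsed.items() if values}

# Strings as they might come back from input() (assumed example answers)
answers = {"device": "mobile, desktop", "os": ""}
filters = collect_filters(answers)
```

Here `filters` contains only the `device` entry, so the `os` column is never compared. Values parsed this way are strings, so integer columns would need a conversion such as `int(...)` before comparison.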
0

Maybe this code can help. Here 'a' plays the role of seg_device and 'b' of seg_os, so start with a='' and b=''. If you don't specify (input) 'a', then 'a' is set to df['A'], so all values in that column are valid. The same can be done for 'b' and the other columns in your DataFrame. Hope this is clear.

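import pandas as pd

d = {'A': ['a', 'b', 'a', 'b', 'a'], 'B': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data=d)

a = ''  # a is not specified and remains empty
if a == '':
    a = df['A']  # compare the column with itself, so every non-null row passes
b = 1

# Boolean mask: rows where A matches a and B is greater than b
mask = (df['A'] == a) & (df['B'] > b)
df[mask]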
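
One caveat with comparing a column to itself: rows where the value is NaN compare as False, so they are silently dropped even though no filter was requested. A minimal sketch of the difference, using an all-True Series as the "no filter" condition instead (the NaN row and the names `self_compare`, `keep_all`, and `cond_a` are added here for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", np.nan, "b", "a"], "B": [1, 2, 3, 4, 5]})

a = ""  # not specified

# Sentinel trick: comparing the column with itself is False where the value is NaN
self_compare = df["A"] == df["A"]

# All-True mask: a no-op condition that also keeps NaN rows
keep_all = pd.Series(True, index=df.index)

# Use the no-op condition when 'a' is blank, the real comparison otherwise
cond_a = keep_all if a == "" else df["A"] == a
result = df[cond_a & (df["B"] > 1)]
```

With `keep_all`, the row whose `A` is missing survives the unspecified filter and is only removed if another condition excludes it.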
