I have a DataFrame named df2, shown below:

Name    Age Experience  Education
Archana 35  8           Bachelors
Sharad  39  12          Bachelors
Jitesh  30  2           Diploma
Sukanya 45  18          Bachelors
Shirish 40  15          Bachelors

I want to filter the data and add a column, promotion, set to 1 for rows that meet all of the following conditions:

  1. If education = Bachelors
  2. If experience > 10
  3. If age >30

Hence the expected df should be:

[image: expected DataFrame with the promotion column; not available]

I know that I can use np.where for this task, but then I have to convert all the columns to string type because the Education column is a string.

Is there a faster way, other than np.where, to achieve the same result without converting the columns?

I used

df2['prom'] = (df2['Age']>30)&(df2['experience']>10)&(df2['education' == 'Bachelors'])

But it gives me the following error:

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3360             try:
-> 3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

~\anaconda3\lib\site-packages\pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: False

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_6476/2030827498.py in <module>
      1 #df2['ELIGIBLE_FOR_DISCOUNT'] = np.where((df2['TENURE'] >= '60') & (df2['NO_OF_FAMILY_MEMBERS'] >= '4') & (df2['EMPLOYMENT_STATUS'] =='N'), 1, 0)
      2 
----> 3 df2['ELIGIBLE_FOR_DISCOUNT'] = (df2['TENURE']>60)&(df2['NO_OF_FAMILY_MEMBERS']>3)&(df2['EMPLOYMENT_STATUS' == 'N'])
      4 
      5 

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   3456             if self.columns.nlevels > 1:
   3457                 return self._getitem_multilevel(key)
-> 3458             indexer = self.columns.get_loc(key)
   3459             if is_integer(indexer):
   3460                 indexer = [indexer]

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   3361                 return self._engine.get_loc(casted_key)
   3362             except KeyError as err:
-> 3363                 raise KeyError(key) from err
   3364 
   3365         if is_scalar(key) and isna(key) and not self.hasnans:

KeyError: False

2 Comments

  • try: df['promotion'] = (df['Education'].eq('Bachelors') & df['Experience'].gt(10) & df['Age'].gt(30)).astype(int) (Commented Feb 27, 2022 at 6:54)
  • I recommend first checking each condition separately to see which one produces the error. The traceback mentions isna(key), so I suspected that NaNs are the cause. (Commented Feb 27, 2022 at 7:12)

3 Answers

1

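Use the following, with the closing bracket placed before the comparison and the column names capitalised as in the sample data:

df['prom'] = (df['Age'] > 30) & (df['Experience'] > 10) & (df['Education'] == 'Bachelors')

If the Age and Experience columns are not numeric, cast them first:

df['prom'] = (df['Age'].astype(int) > 30) & (df['Experience'].astype(int) > 10) & (df['Education'] == 'Bachelors')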
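
The KeyError: False in the question can be traced to where the comparison sits. Inside the brackets, 'education' == 'Bachelors' compares two string literals, evaluates to False, and pandas then looks for a column labelled False. A minimal sketch of this, assuming the sample frame from the question rebuilt by hand (the variable names `key`, `raised` and `mask` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Archana", "Sharad", "Jitesh", "Sukanya", "Shirish"],
    "Age": [35, 39, 30, 45, 40],
    "Experience": [8, 12, 2, 18, 15],
    "Education": ["Bachelors", "Bachelors", "Diploma", "Bachelors", "Bachelors"],
})

# Comparing two different string literals gives a plain Python bool.
key = "Education" == "Bachelors"  # False

# Indexing with that bool asks pandas for a column labelled False.
try:
    df[key]
    raised = False
except KeyError:
    raised = True

# Closing the bracket before the comparison yields a boolean Series instead.
mask = df["Education"] == "Bachelors"
```

Here `raised` ends up True, while `mask` is a per-row boolean Series that can be combined with the other conditions using &.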

5 Comments

I am getting the error: '>' not supported between instances of 'str' and 'int'
Did you use the second one? It should not produce that error because the columns are cast first. Maybe there are some NaNs in the data. Can you provide sample data?
Yes, I used the second one, and I am getting the error posted in the edited question. There are no NaN values in the dataset. There are values marked as "NONE", though, but no blank values or NaNs.
So that is the reason.
But why is a value written as the string "NONE" not treated the same as a blank value or NaN?
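
On that last question: the text "NONE" is an ordinary string, so pandas does not count it as missing, isna() returns False for it, and astype(int) cannot parse it. One possible way around this is pd.to_numeric with errors="coerce", which turns unparseable entries into NaN. The sketch below assumes the numeric columns hold digits stored as strings plus the occasional "NONE"; the sample values are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, with "NONE" as a placeholder.
df = pd.DataFrame({
    "Age": ["35", "39", "NONE", "45"],
    "Experience": ["8", "12", "2", "NONE"],
    "Education": ["Bachelors", "Bachelors", "Diploma", "Bachelors"],
})

# "NONE" is a regular string, so it is not counted as missing.
none_is_missing = df["Age"].isna().any()

# errors="coerce" converts anything unparseable to NaN instead of raising.
age = pd.to_numeric(df["Age"], errors="coerce")
experience = pd.to_numeric(df["Experience"], errors="coerce")

# Comparisons against NaN evaluate to False, so those rows get 0.
df["promotion"] = (
    (age > 30) & (experience > 10) & (df["Education"] == "Bachelors")
).astype(int)
```

Rows containing "NONE" in either numeric column fail the comparison and receive 0, without any explicit error handling.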
1

As suggested in one of the comments, use:

df['promotion'] = (df['Education'].eq('Bachelors') & df['Experience'].gt(10) & df['Age'].gt(30)).astype(int)
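
That expression can be checked against the sample data from the question. This is a sketch that rebuilds the frame by hand and assumes Age and Experience are already numeric; no conversion to strings is needed because each comparison is applied to its own column.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Archana", "Sharad", "Jitesh", "Sukanya", "Shirish"],
    "Age": [35, 39, 30, 45, 40],
    "Experience": [8, 12, 2, 18, 15],
    "Education": ["Bachelors", "Bachelors", "Diploma", "Bachelors", "Bachelors"],
})

# Each method returns a boolean Series; & combines them row by row.
df["promotion"] = (
    df["Education"].eq("Bachelors")
    & df["Experience"].gt(10)
    & df["Age"].gt(30)
).astype(int)

print(df)
```

With this data, Sharad, Sukanya and Shirish satisfy all three conditions and get 1; Archana (experience 8) and Jitesh (Diploma, age 30) get 0.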

1

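This falls back to 0 for any row whose values cannot be converted, such as "NONE".

def is_promotable(row):
    # Return 1 only when all three conditions hold; rows that fail conversion get 0.
    try:
        return 1 if int(row["Age"]) > 30 and int(row["Experience"]) > 10 and str(row["Education"]) == "Bachelors" else 0
    except (ValueError, TypeError):
        return 0

df["promotion"] = df.apply(is_promotable, axis=1)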
