I have a data frame where I want to add a new column with values based on the index.

This is my fake df:

import numpy as np
import pandas as pd

my = pd.DataFrame({'fruit': [
'Apple', 'Kiwi', 'Clementine', 'Kiwi', 'Banana', 'Clementine', 'Apple', 'Kiwi'],
'bites': [1, 2, 3, 1, 2, 3, 1, 2]})

I have found a similar question and tried the solution there but I get error messages. This is what I tried:

conds = [(my.index >= 0) & (my.index <= row_2),
         (my.index > row_2) & (my.index<=row_5),
         (my.index > row_5) & (my.index<=row_6),
         (my.index > row_6)]


names = ['Donna', 'Kelly', 'Andrea','Brenda']


my['names'] = np.select(conds, names)
  • What are row_2, row_5...? what's the error message that you got? Commented May 23, 2019 at 13:25
  • @QuangHoang I might have missed how I define the rows, the help I took was from this post. The error message is row_2 is not defined which makes me feel stupid since apparently that's the same question you're asking... Commented May 23, 2019 at 13:29

2 Answers

This works fine for me once the undefined row_* variables are replaced with numeric index values. I also added two alternatives: pd.cut with include_lowest=True, so that the index value 0 is matched, and selection by label with DataFrame.loc:

conds = [(my.index >= 0) & (my.index <= 2),
         (my.index > 2) & (my.index<=5),
         (my.index > 5) & (my.index<=6),
         (my.index > 6)]


names = ['Donna', 'Kelly', 'Andrea','Brenda']


my['names'] = np.select(conds, names)
my['names1'] = pd.cut(my.index, [0,2,5,6,np.inf], labels=names, include_lowest=True)

my.loc[:2, 'names2'] = 'Donna'
my.loc[3:5, 'names2'] = 'Kelly'
my.loc[6, 'names2'] = 'Andrea'
my.loc[7:, 'names2'] = 'Brenda'

print (my)
        fruit  bites   names  names1  names2
0       Apple      1   Donna   Donna   Donna
1        Kiwi      2   Donna   Donna   Donna
2  Clementine      3   Donna   Donna   Donna
3        Kiwi      1   Kelly   Kelly   Kelly
4      Banana      2   Kelly   Kelly   Kelly
5  Clementine      3   Kelly   Kelly   Kelly
6       Apple      1  Andrea  Andrea  Andrea
7        Kiwi      2  Brenda  Brenda  Brenda
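
For anyone who wants to reproduce this from scratch, here is a minimal self-contained sketch that rebuilds the sample frame and checks that np.select and pd.cut give the same labels. The `default='unknown'` argument is my own addition, not part of the original code: np.select falls back to 0 when no condition matches, and as far as I know recent NumPy versions may refuse to combine that integer with string choices, so a string default seems safer.

```python
import numpy as np
import pandas as pd

my = pd.DataFrame({
    'fruit': ['Apple', 'Kiwi', 'Clementine', 'Kiwi',
              'Banana', 'Clementine', 'Apple', 'Kiwi'],
    'bites': [1, 2, 3, 1, 2, 3, 1, 2],
})
names = ['Donna', 'Kelly', 'Andrea', 'Brenda']

# One boolean mask per name, based on the row index.
conds = [
    (my.index >= 0) & (my.index <= 2),
    (my.index > 2) & (my.index <= 5),
    (my.index > 5) & (my.index <= 6),
    my.index > 6,
]

# 'unknown' is a hypothetical fallback label; it is never used here
# because the four masks cover every row.
by_select = np.select(conds, names, default='unknown')

# Same grouping expressed as bin edges; include_lowest=True keeps index 0.
by_cut = pd.cut(my.index, [0, 2, 5, 6, np.inf],
                labels=names, include_lowest=True)

my['names'] = by_select
my['names1'] = by_cut
print(my)
```

Both columns should match the output shown above.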

1 Comment

Great! I feel kind of stupid, of course I should've figured that I had to use the row index number....

You can try pd.cut:

df['names'] = (pd.cut(df.index, 
                      [0, 2, 5, 6, np.inf], 
                      labels=names)
                 .fillna(names[0])
              )
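
A note on why the fillna call is there: by default pd.cut builds intervals that are open on the left, so the first bin is (0, 2] and index 0 falls outside it and comes back as NaN. The sketch below shows that, along with one possible alternative (my own suggestion, not from the code above) of starting the bins at -1 so that no filling is needed. It uses a bare RangeIndex of length 8 as a stand-in for df.index.

```python
import numpy as np
import pandas as pd

names = ['Donna', 'Kelly', 'Andrea', 'Brenda']
idx = pd.RangeIndex(8)  # stand-in for df.index on the 8-row sample frame

# Default bins are right-closed: (0, 2], (2, 5], (5, 6], (6, inf].
plain = pd.cut(idx, [0, 2, 5, 6, np.inf], labels=names)
missing = pd.isna(plain)  # only position 0 is unmatched

# Alternative: start at -1 so the first bin (-1, 2] also contains 0.
shifted = pd.cut(idx, [-1, 2, 5, 6, np.inf], labels=names)

print(missing.tolist())
print(list(shifted))
```

Passing include_lowest=True to pd.cut is another way to get the same result without fillna.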

1 Comment

Thanks! A lot less to write, which to me is better. :)
