0

I need to insert a value into a column based on the row index of a pandas DataFrame.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(11, 4)), columns=list('ABCD'))
df['ticker'] = 'na'
df

In the sample DataFrame above, the ticker column must be "$" for the first 25% of the records, "$$" for the next 25%, and so on.

I tried taking the length of the DataFrame, computing the 25%, 50%, and 75% row positions from it, and then visiting one row at a time to assign a value to "ticker" based on the row index.

total_row_count = len(df)
row_25 = int(total_row_count * .25)
row_50 = int(total_row_count * .5)
row_75 = int(total_row_count * .75)

if (row.index >= 0) and (row.index <= row_25):
    return "$"
elif (row.index > row_25) and (row.index <= row_50):
    return "$$"
elif (row.index > row_50) and (row.index <= row_75):
    return "$$$"
elif row.index > row_75:
    return "$$$$"

But I can't get the row index. Is there a different way to assign these values?

4 Answers

1

I like to use np.select for this kind of task, because I find the syntax intuitive and readable:

# Set up your conditions:
conds = [(df.index >= 0) & (df.index <= row_25),
         (df.index > row_25) & (df.index<=row_50),
         (df.index > row_50) & (df.index<=row_75),
         (df.index > row_75)]

# Set up your target values (in the same order as your conditions)
choices = ['$', '$$', '$$$', '$$$$']

# Assign df['ticker']
df['ticker'] = np.select(conds, choices)

returns this:

>>> df
     A   B   C   D ticker
0   92  97  25  79      $
1   76   4  26  94      $
2   49  65  19  91      $
3   76   3  83  45     $$
4   83  16   0  16     $$
5    1  56  97  44     $$
6   78  17  18  86    $$$
7   55  56  83  91    $$$
8   76  16  52  33    $$$
9   55  35  80  95   $$$$
10  90  29  41  87   $$$$
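
The snippet above relies on row_25, row_50 and row_75 from the question and on numpy being imported as np. A self-contained sketch of the same idea follows. The seed, the `pos` helper and the explicit `default="na"` are assumptions added for illustration; a string default keeps the choices and the fallback in one dtype, which newer NumPy releases may require.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (the seed is an assumption).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(11, 4)), columns=list("ABCD"))

# Thresholds computed the same way as in the question.
total_row_count = len(df)
row_25 = int(total_row_count * 0.25)
row_50 = int(total_row_count * 0.50)
row_75 = int(total_row_count * 0.75)

# Row positions 0..n-1, independent of whatever labels df.index holds.
pos = np.arange(total_row_count)

conds = [pos <= row_25,
         (pos > row_25) & (pos <= row_50),
         (pos > row_50) & (pos <= row_75),
         pos > row_75]
choices = ["$", "$$", "$$$", "$$$$"]

# Rows matching no condition would receive "na".
df["ticker"] = np.select(conds, choices, default="na")
print(df)
```

Using positions rather than df.index means the sketch should also behave the same on a frame whose index is not the default 0..n-1 range.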

4 Comments

The "$$$$" won't populate in the last 2 records. Any idea why?
Try df['ticker'] = np.select(conds, choices, default='test'). If the last 2 records are filled with the value 'test', none of the conditions were satisfied for those rows. Otherwise, I'm not sure...
Your solution worked. I'm not sure why it doesn't show in my df. When I saved it as a CSV, I was able to see '$$$$'. Thanks, sacul.
I tried this because I thought it would solve my problem, but I get an error saying that 'row_6' is not defined (what would have been row_25 in this example). Would you happen to know a way of troubleshooting this?
1

I think pd.cut can solve this problem:

df['ticker']=pd.cut(np.arange(len(df))/len(df), [-np.inf,0.25,0.5,0.75,1], labels=["$","$$",'$$$','$$$$'],right=True)
df
Out[35]: 
     A   B   C   D ticker
0   63  51  19  33      $
1   12  80  57   1      $
2   53  27  62  26      $
3   97  43  31  80     $$
4   91  22  92  11     $$
5   39  70  82  26     $$
6   32  62  17  75    $$$
7    5  59  79  72    $$$
8   75   4  47   4    $$$
9   43   5  45  66   $$$$
10  29   9  74  94   $$$$
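
The one-liner works on row positions (np.arange(len(df))), not on index labels. A minimal sketch of that point, assuming a hypothetical frame with a string index and a helper variable `frac`, neither of which appears in the answer:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-numeric index, to show positions are what matter.
df = pd.DataFrame({"A": range(11)}, index=list("abcdefghijk"))

# Fraction of the way through the frame for each row position: 0/11, 1/11, ...
frac = np.arange(len(df)) / len(df)

# Right-closed bins: (-inf, 0.25], (0.25, 0.5], (0.5, 0.75], (0.75, 1].
df["ticker"] = pd.cut(frac, bins=[-np.inf, 0.25, 0.5, 0.75, 1],
                      labels=["$", "$$", "$$$", "$$$$"], right=True)

# pd.cut yields categorical data; cast to plain strings if that is preferred.
df["ticker"] = df["ticker"].astype(str)
print(df)
```

The division must be true division for the fractions to spread across the bins; integer division would collapse every fraction to 0 and put all rows in the first bin.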

3 Comments

I'm not sure what I'm missing, but when I ran the code it returned "$" for all the rows in the ticker column.
@sow it works fine on my side. Would you mind pasting the code you are using here?
import pandas as pd; import numpy as np; df = pd.DataFrame(np.random.randint(0,100,size=(11, 4)), columns=list('ABCD')); df['ticker'] = pd.cut(np.arange(len(df))/len(df), [-np.inf,0.25,0.5,0.75,1], labels=["$","$$",'$$$','$$$$'], right=True); df
0

You can set up a few np.where statements to handle this. Try something like the following:

import numpy as np
...
df['ticker'] = np.where(df.index < row_25, "$", df['ticker'])
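# Combine the two bounds with &; a chained comparison raises ValueError on an Index
df['ticker'] = np.where((row_25 <= df.index) & (df.index < row_50), "$$", df['ticker'])
df['ticker'] = np.where((row_50 <= df.index) & (df.index < row_75), "$$$", df['ticker'])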
df['ticker'] = np.where(row_75 <= df.index, "$$$$", df['ticker'])
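
Two details are worth checking when using np.where for ranges. Python evaluates a chained comparison such as a <= idx < b as (a <= idx) and (idx < b), which asks for the truth value of a whole array and fails, so each bound needs its own mask. The sketch below, on an assumed 11-row frame with the thresholds from the question, demonstrates that and shows a nested np.where as one possible alternative to four separate assignments:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": range(11)})
n = len(df)
row_25, row_50, row_75 = int(n * 0.25), int(n * 0.5), int(n * 0.75)

# The chained form is expected to raise ValueError on an array-like.
try:
    row_25 <= df.index < row_50
    chained_ok = True
except ValueError:
    chained_ok = False

# Nested np.where: the first matching bound wins, the last value is the fallback.
idx = df.index
df["ticker"] = np.where(idx < row_25, "$",
               np.where(idx < row_50, "$$",
               np.where(idx < row_75, "$$$", "$$$$")))
print(chained_ok)
print(df)
```

With strict < bounds the first band holds the rows below row_25 only, so the band edges sit one row earlier than in the answers that use <=.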

0

This is one explicit solution using the .loc accessor:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,100,size=(11, 4)), columns=list('ABCD'))
n = len(df.index)

df['ticker'] = 'na'
df.loc[df.index <= n/4, 'ticker'] = '$'
df.loc[(n/4 < df.index) & (df.index <= n/2), 'ticker'] = '$$'
df.loc[(n/2 < df.index) & (df.index <= n*3/4), 'ticker'] = '$$$'
df.loc[df.index > n*3/4, 'ticker'] = '$$$$'

#      A   B   C   D ticker
# 0   47  64   7  46      $
# 1   53  55  75   3      $
# 2   93  95  28  47      $
# 3   35  88  16   7     $$
# 4   99  66  88  84     $$
# 5   75   2  72  90     $$
# 6    6  53  36  92    $$$
# 7   83  58  54  67    $$$
# 8   49  83  46  54    $$$
# 9   69   9  96  73   $$$$
# 10  84  42  11  83   $$$$
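
If the number of bands may change, the four hand-written masks can be replaced by integer arithmetic on the row position. This is a sketch of a different technique from the .loc approach above, and `label_by_position` is a hypothetical helper name, not part of pandas:

```python
import numpy as np
import pandas as pd

def label_by_position(n_rows, labels):
    """Return one label per row, splitting row positions into len(labels) bands."""
    pos = np.arange(n_rows)
    # Integer band number floor(pos * k / n_rows), always in the range 0..k-1.
    band = pos * len(labels) // n_rows
    return np.asarray(labels)[band]

df = pd.DataFrame({"A": range(11)})
df["ticker"] = label_by_position(len(df), ["$", "$$", "$$$", "$$$$"])
print(df)
```

For 11 rows this gives the same 3/3/3/2 split as the output above. When the row count is a multiple of the number of labels, the bands come out equal in size, whereas the <= comparisons place one extra row in the first band.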

4 Comments

The "$$$$" won't populate. Any idea what I'm missing?
That's strange. When I try print(df), I see the output as per my post.
Your solution worked. I'm not sure why it doesn't show in my df. When I saved it as a CSV, I was able to see '$$$$'. Thanks, @jpp.
@sow, no problem. Feel free to accept (tick on left) if it solved your problem.
