I am trying to add a new column to my DataFrame that contains the side (or radius) computed from the shape and area of each row.

My original dataset looks like this:

df:

    shape     color   area  
0   square    yellow  9409.0    
1   circle    yellow  4071.5    
2   triangle  blue    2028.0    
3   square    blue    3025.0
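To reproduce the examples in this thread, the sample frame above can be rebuilt directly. This is a minimal sketch that only copies the four rows shown in the table; the real dataset is presumably larger.

```python
import pandas as pd

# The four sample rows from the table above
df = pd.DataFrame({
    'shape': ['square', 'circle', 'triangle', 'square'],
    'color': ['yellow', 'yellow', 'blue', 'blue'],
    'area': [9409.0, 4071.5, 2028.0, 3025.0],
})
print(df)
```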

But when I coded it like this:

import numpy as np

df['side'] = 0
for x in df['shape']:
    if x == 'square':
        df['side'] = np.rint(np.sqrt(df['area'])).astype(int)
    elif x == 'triangle':
        df['side'] = np.rint(np.sqrt((4 * df['area'])/np.sqrt(3))).astype(int)
    elif x == 'circle':
        df['side'] = np.rint(np.sqrt(df['area']/np.pi)).astype(int)

I got:

    shape     color   area    side
0   square    yellow  9409.0  55
1   circle    yellow  4071.5  36    
2   triangle  blue    2028.0  25    
3   square    blue    3025.0  31    

It looks like the loop applies the elif x == 'circle' formula to every row of the side column.

  • Your assignments are assigning to all the rows, not the current row of the for loop. Commented Mar 25, 2022 at 22:25
  • So each time through the loop, you're updating all the sides, and the final values will be based on the last value of df['shape']. Commented Mar 25, 2022 at 22:27
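To illustrate the comments above, here is a self-contained sketch that first reproduces the pitfall on the four sample rows and then restricts each assignment to the matching rows with a boolean mask and df.loc. The mask-based fix is an alternative written for illustration, not taken from either answer below. With only these four rows the last shape is 'square', so the buggy loop leaves the square formula on every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'shape': ['square', 'circle', 'triangle', 'square'],
    'color': ['yellow', 'yellow', 'blue', 'blue'],
    'area': [9409.0, 4071.5, 2028.0, 3025.0],
})

# Pitfall: each right-hand side is a whole Series, so every pass through the
# loop overwrites the entire 'side' column; only the last shape's formula survives.
df['side'] = 0
for x in df['shape']:
    if x == 'square':
        df['side'] = np.rint(np.sqrt(df['area'])).astype(int)
    elif x == 'triangle':
        df['side'] = np.rint(np.sqrt((4 * df['area']) / np.sqrt(3))).astype(int)
    elif x == 'circle':
        df['side'] = np.rint(np.sqrt(df['area'] / np.pi)).astype(int)
last_shape_result = df['side'].tolist()

# Fix: select only the rows of one shape before assigning its formula.
df['side'] = 0
sq = df['shape'] == 'square'
tr = df['shape'] == 'triangle'
ci = df['shape'] == 'circle'
df.loc[sq, 'side'] = np.rint(np.sqrt(df.loc[sq, 'area'])).astype(int)
df.loc[tr, 'side'] = np.rint(np.sqrt(4 * df.loc[tr, 'area'] / np.sqrt(3))).astype(int)
df.loc[ci, 'side'] = np.rint(np.sqrt(df.loc[ci, 'area'] / np.pi)).astype(int)
print(df)
```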

2 Answers


This looks like a good use case for numpy.select, which picks a value for each row depending on which condition (here, which shape) it matches:

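import numpy as np

# One condition per shape; np.select takes the matching formula for each row
df['side'] = np.select([df['shape']=='square',
                        df['shape']=='circle',
                        df['shape']=='triangle'],
                       [np.rint(np.sqrt(df['area'])),                    # square: side = sqrt(A)
                        np.rint(np.sqrt(df['area']/np.pi)),              # circle: radius = sqrt(A/pi)
                        np.rint(np.sqrt((4 * df['area'])/np.sqrt(3)))],  # equilateral triangle
                       0).astype(int)  # default 0 for unknown shapes; a NaN default would not cast cleanly to int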

It can be written more concisely by mapping each shape to a multiplier k, so that side = sqrt(k * area), and then using pandas' vectorized operations:

# Multiplier k per shape, so that side (or radius) = sqrt(k * area)
mapping = {'square': 1, 'circle': 1 / np.pi, 'triangle': 4 / np.sqrt(3)}
df['side'] = df['shape'].map(mapping).mul(df['area']).pow(1/2).round(0).astype(int)
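The multipliers in mapping come from solving each area formula for the side length. This assumes the triangles are equilateral, which is what the 4 / np.sqrt(3) factor (and the triangle formula in the question) implies:

```latex
\begin{aligned}
\text{square: } A = s^2 &\;\Rightarrow\; s = \sqrt{1 \cdot A} \\
\text{circle: } A = \pi r^2 &\;\Rightarrow\; r = \sqrt{\tfrac{1}{\pi} \cdot A} \\
\text{equilateral triangle: } A = \tfrac{\sqrt{3}}{4} s^2 &\;\Rightarrow\; s = \sqrt{\tfrac{4}{\sqrt{3}} \cdot A}
\end{aligned}
```

For example, the triangle row gives sqrt(4 * 2028 / sqrt(3)) ≈ sqrt(4683.4) ≈ 68.4, which rounds to 68.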

Output:

      shape   color    area  side
0    square  yellow  9409.0    97
1    circle  yellow  4071.5    36
2  triangle    blue  2028.0    68
3    square    blue  3025.0    55
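To check that the np.select version and the mapping version agree on the sample rows, both can be run side by side. This is a self-contained sketch that rebuilds the four rows from the question; via_select and via_map are just illustrative variable names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'shape': ['square', 'circle', 'triangle', 'square'],
    'color': ['yellow', 'yellow', 'blue', 'blue'],
    'area': [9409.0, 4071.5, 2028.0, 3025.0],
})

# Version 1: np.select picks one formula per row based on the shape
via_select = np.select(
    [df['shape'] == 'square', df['shape'] == 'circle', df['shape'] == 'triangle'],
    [np.rint(np.sqrt(df['area'])),
     np.rint(np.sqrt(df['area'] / np.pi)),
     np.rint(np.sqrt(4 * df['area'] / np.sqrt(3)))],
    0,
).astype(int)

# Version 2: map each shape to a multiplier k, then side = round(sqrt(k * area))
mapping = {'square': 1, 'circle': 1 / np.pi, 'triangle': 4 / np.sqrt(3)}
via_map = df['shape'].map(mapping).mul(df['area']).pow(1 / 2).round(0).astype(int)

# Both approaches should produce the same integers row by row
assert (via_select == via_map.to_numpy()).all()
df['side'] = via_map
print(df)
```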

I see you were assigning to whole columns. Instead, you can iterate over the rows with the DataFrame's iterrows() method and set the value for each row as you go:

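import numpy as np

df['side'] = 0  # create the column first so each cell assignment fills an int column
for i, row in df.iterrows():
    # df.at[i, 'side'] sets only the cell in row i, column 'side'
    if row['shape'] == 'square':
        df.at[i, 'side'] = np.rint(np.sqrt(row['area'])).astype(int)
    elif row['shape'] == 'triangle':
        df.at[i, 'side'] = np.rint(np.sqrt((4 * row['area'])/np.sqrt(3))).astype(int)
    elif row['shape'] == 'circle':
        df.at[i, 'side'] = np.rint(np.sqrt(row['area']/np.pi)).astype(int)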

Note that each assignment targets a single cell: column 'side' of the row at index i.
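The same per-row logic can also be written as a plain function passed to DataFrame.apply with axis=1, which avoids the manual df.at bookkeeping. Like iterrows, it still runs Python code once per row, so the vectorized answer above is typically faster on large frames. A sketch, where side_from_row is a hypothetical helper name chosen for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'shape': ['square', 'circle', 'triangle', 'square'],
    'color': ['yellow', 'yellow', 'blue', 'blue'],
    'area': [9409.0, 4071.5, 2028.0, 3025.0],
})

def side_from_row(row):
    """Hypothetical helper: rounded side (or radius) for one row, None for unknown shapes."""
    if row['shape'] == 'square':
        return int(np.rint(np.sqrt(row['area'])))
    if row['shape'] == 'triangle':
        return int(np.rint(np.sqrt(4 * row['area'] / np.sqrt(3))))
    if row['shape'] == 'circle':
        return int(np.rint(np.sqrt(row['area'] / np.pi)))
    return None

# axis=1 passes each row (as a Series) to the function
df['side'] = df.apply(side_from_row, axis=1)
print(df)
```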

Also, the suggestion by @enke above works just fine.
