pandas apply function to multiple columns and multiple rows

Question

I have a dataframe with consecutive pixel coordinates in rows, in columns 'xpos' and 'ypos', and I want to calculate the angle in degrees of each path between consecutive pixels. Currently I have the solution presented below, which works fine and is fast enough for the size of my file, but iterating through all the rows does not seem to be the pandas way to do it. I know how to apply a function to different columns, and how to apply functions to different rows of a column, but I can't figure out how to combine both.

Here's my code:

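import math

import numpy as np
import pandas as pd

df = pd.read_csv('fixations_out.csv')

# compute the saccade angle
temp_list = []
for count, row in df.iterrows():
    x1 = row['xpos']
    y1 = row['ypos']
    try:
        # .loc replaces .ix, which was removed in pandas 1.0;
        # on the first row, label count-1 does not exist and raises KeyError
        x2 = df['xpos'].loc[count-1]
        y2 = df['ypos'].loc[count-1]
        a = abs(180/math.pi * math.atan((y2-y1)/(x2-x1)))
        temp_list.append(a)
    except KeyError:
        temp_list.append(np.nan)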

and then I insert temp_list into df as a new column.

EDIT: after implementing the tip from the comment I have:

df['diff_x'] = df['xpos'].shift() - df['xpos']
df['diff_y'] = df['ypos'].shift() - df['ypos']

def calc_angle(x):
    try:
        a = abs(180/math.pi * math.atan((x.diff_y)/(x.diff_x)))
        return a
    except ZeroDivisionError:
        return 0

df['angle_degrees'] = df.apply(calc_angle, axis=1)

I compared the execution time of the three solutions on my df (about 6k rows). The iteration is almost 9 times slower than apply, and about 1500 times slower than doing it without apply:

execution time of the solution with iteration, including insertion of the new column back into df: 1.51 s

execution time of the solution without iteration, with apply: 0.17 s

execution time of the accepted answer by EdChum using diff(), without iteration and without apply: 0.001 s

Suggestion: do not use iteration or apply; always try to use a vectorized calculation ;) It is not only faster, but also more readable.

As a start you can calculate the difference as df['xpos'].shift() - df['xpos'] rather than doing this row-wise; then you can calculate the angle using your function on the whole column.
– EdChum, Commented Jun 13, 2014 at 9:40
I've updated my answer; I now get less than 1 ms, which is about three orders of magnitude quicker.
– EdChum, Commented Jun 13, 2014 at 10:24

EdChum · Accepted Answer · 2014-06-13 10:24:22Z

You can do this with the following method. I compared the pandas way to your way and it is over 1000 times faster, and that is without adding the list back as a new column! This was done on a 10000-row dataframe.

In [108]:

%%timeit
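import math
import numpy as np
# dy/dx as in the question's loop; the parentheses matter because / binds tighter than -
df['angle'] = np.abs(180/math.pi * np.arctan((df['ypos'].shift() - df['ypos'])/(df['xpos'].shift() - df['xpos'])))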

1000 loops, best of 3: 1.27 ms per loop

In [100]:

%%timeit
temp_list=[]
for count, row in df.iterrows():
    x1 = row['xpos']
    y1 = row['ypos']
    try:
        x2 = df['xpos'].loc[count-1]  # .ix was removed in pandas 1.0
        y2 = df['ypos'].loc[count-1]
        a = abs(180/math.pi * math.atan((y2-y1)/(x2-x1)))
        temp_list.append(a)
    except KeyError:
        temp_list.append(np.nan)

1 loops, best of 3: 1.29 s per loop

Also, avoid apply where possible, as it operates row-wise. If you can find a vectorised method that works on the entire series or dataframe, always prefer it.
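To make the comparison concrete, here is a minimal sketch on a small made-up frame. The coordinate values and the helper name `angle_rowwise` are illustrative assumptions, not taken from the question. It computes the angle once with a row-wise apply and once on whole columns, then checks that both give the same numbers.

```python
import math

import numpy as np
import pandas as pd

# Made-up pixel coordinates, for illustration only.
df = pd.DataFrame({"xpos": [0.0, 1.0, 3.0, 6.0], "ypos": [0.0, 1.0, 2.0, 2.0]})

# Differences to the previous row; the first row has no predecessor, so it is NaN.
dx = df["xpos"].shift() - df["xpos"]
dy = df["ypos"].shift() - df["ypos"]


def angle_rowwise(row):
    # Hypothetical helper: called once per row at Python level.
    return abs(180 / math.pi * math.atan(row["dy"] / row["dx"]))


rowwise = pd.DataFrame({"dx": dx, "dy": dy}).apply(angle_rowwise, axis=1)

# Vectorised: a single NumPy call over the whole column.
vectorised = np.abs(180 / math.pi * np.arctan(dy / dx))

# Both approaches agree, including the NaN in the first row.
same = np.allclose(rowwise, vectorised, equal_nan=True)
print(same)
```

The results match; the difference is only in how many Python-level function calls are made, which is where the timing gap comes from.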

UPDATE

Seeing as you are just subtracting the previous row, there is a built-in method for this, diff, which results in even faster code:

In [117]:

%%timeit
import math
import numpy as np
# dy/dx, matching the question's formula
df['angle'] = np.abs(180/math.pi * np.arctan(df['ypos'].diff(1)/df['xpos'].diff(1)))

1000 loops, best of 3: 1.01 ms per loop
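As a quick sanity check on swapping shift for diff, the sketch below (with made-up values) shows that `diff(1)` is the current row minus the previous one, which is the shift-based difference with the sign flipped. Since the angle uses a ratio of two differences and then an absolute value, the sign flip does not change the result.

```python
import pandas as pd

# Made-up coordinates, for illustration only.
s = pd.Series([0.0, 1.0, 3.0, 6.0], name="xpos")

current_minus_prev = s - s.shift()  # what diff(1) computes
prev_minus_current = s.shift() - s  # the shift-based form, opposite sign

# Series.equals treats NaNs in the same position as equal.
diff_matches = s.diff(1).equals(current_minus_prev)
sign_flipped = s.diff(1).equals(-prev_minus_current)
print(diff_matches, sign_flipped)
```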

Another update

There is also a built-in method for series and dataframe division, div. This shaves off a little more time and I achieve sub-1 ms time:

In [9]:

%%timeit
import math
import numpy as np
# dy/dx, matching the question's formula
df['angle'] = np.abs(180/math.pi * np.arctan(df['ypos'].diff(1).div(df['xpos'].diff(1))))

1000 loops, best of 3: 951 µs per loop
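One thing worth checking with the vectorised form is what happens when a difference is zero. The sketch below uses a made-up path (an assumption for illustration) with a vertical step and a repeated point. Floating-point division in pandas does not raise on zero: a vertical step gives inf, which arctan maps to 90 degrees, and a repeated point gives 0/0, which is NaN.

```python
import math

import numpy as np
import pandas as pd

# Made-up path: a diagonal step, a vertical step, then a repeated point.
df = pd.DataFrame({"xpos": [0.0, 1.0, 1.0, 1.0], "ypos": [0.0, 1.0, 3.0, 3.0]})

dx = df["xpos"].diff()
dy = df["ypos"].diff()

ratio = dy.div(dx)  # vertical step -> inf, repeated point -> NaN (0/0)
angle = np.abs(180 / math.pi * np.arctan(ratio))
print(angle.tolist())
```

Note that this differs from the apply version in the question, which was written to return 0 when the division fails. Whether a repeated point should count as 0 degrees or as missing is a choice to make for your data; `fillna(0)` would also fill the first row, which has no predecessor.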

@joris, yes, for consistency, but it made little difference (1.27 ms versus 1.29 ms). I'll update the answer though, thanks.
