I am trying to do a df.apply on date objects, but it is far too slow.
My %prun output gives:
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1999   14.563    0.007   14.563    0.007 {pandas.tslib.array_to_timedelta64}
    13998    0.103    0.000   15.221    0.001 series.py:126(__init__)
     9999    0.093    0.000    0.093    0.000 {method 'reduce' of 'numpy.ufunc' objects}
   272012    0.093    0.000    0.125    0.000 {isinstance}
     5997    0.089    0.000    0.196    0.000 common.py:199(_isnull_ndarraylike)
So it takes about 14 seconds for a 2000-element array. My actual array has more than 100,000 elements, which translates to a run time of 15 minutes or more.
Why does pandas call pandas.tslib.array_to_timedelta64 at all? It is clearly the bottleneck, and I don't understand why the call is necessary: both operands of the subtraction have the same data type, because I explicitly converted them beforehand with pd.to_datetime(). That conversion time is not included in this measurement. You can understand my frustration with this performance.
The actual code looks like this:

import numpy as np
import pandas as pd

df = pd.DataFrame(bet_endtimes)

def testing():
    close_indices = df.apply(lambda x: np.argmin(np.abs(currentdata['date'] - x[0])), axis=1)
    print close_indices

%prun testing()
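The per-row cost likely comes from df.apply building a fresh Series of timedeltas for every row, which is what triggers array_to_timedelta64 thousands of times. A sketch of a vectorized alternative, assuming currentdata['date'] and bet_endtimes are already datetime64 (the sample dates below are made up for illustration), is to drop to NumPy and broadcast the subtraction once:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for currentdata['date'] and bet_endtimes
dates = pd.to_datetime(["2013-01-01", "2013-01-03", "2013-01-07"]).values
bet_endtimes = pd.to_datetime(["2013-01-02", "2013-01-06"]).values

# Broadcast to a 2-D array of absolute time differences:
# rows = end times, columns = candidate dates.
diffs = np.abs(dates[np.newaxis, :] - bet_endtimes[:, np.newaxis])

# Index of the nearest date for each end time, computed entirely in NumPy.
close_indices = diffs.argmin(axis=1)
print(close_indices)  # -> [0 2]
```

This does one vectorized pass instead of one Series construction per row; for very large inputs the (n_endtimes, n_dates) intermediate can be avoided with np.searchsorted on sorted dates, at the cost of a little extra bookkeeping.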
If you want to know why a Python program takes time, gprof-style profiler output is not the best tool; tottime just looks like "self time". What you need to know is not self time but inclusive time, not as an absolute time but as a percent, and not just of functions but of the sites where they are called. A better approach is to take random stack samples. The number of samples does not need to be large: if your program takes 15 seconds when it should take less than one second, the odds are 14:1 that a single stack sample will show you why it's taking that time.
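As a minimal illustration of the stack-sampling idea (a sketch using the standard library on Unix; the busy() function below is a made-up stand-in for the slow apply loop), an alarm signal can interrupt the running program and record the call chain it is in at that moment:

```python
import signal
import time
import traceback

samples = []

def sample_stack(signum, frame):
    # Record one stack sample: the full call chain at the moment of interruption.
    samples.append(traceback.format_stack(frame))

# Arrange for a single sample to be taken after 50 ms of wall-clock time.
signal.signal(signal.SIGALRM, sample_stack)
signal.setitimer(signal.ITIMER_REAL, 0.05)

def busy():
    # Busy loop standing in for the slow df.apply call.
    end = time.monotonic() + 0.2
    while time.monotonic() < end:
        pass

busy()

# The innermost frame of the sample shows where the time was being spent.
print(len(samples), "sample(s); innermost frame:")
print(samples[0][-1])
```

If the program spends most of its time in one place, even a single sample like this will usually land there, which is the point being made above.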