@chris-b1
Contributor

@chris-b1 chris-b1 commented Oct 8, 2015

This is a WIP, but it's far enough along that I thought I'd share it and see whether the approach is reasonable.

This releases the GIL on most vectorized field accessors (e.g. dt.year) and on conversion to and from Period. There may be other places it could be done. It would obviously be nice for parsing, but I'm not sure that's possible.

@jreback
Contributor

jreback commented Oct 8, 2015

ohh nice!

can u share some timings?

Contributor

move nogil outside the loop

@chris-b1
Contributor Author

chris-b1 commented Oct 8, 2015

Here are some timings: a pretty nice speedup, roughly 3.3x with 4 threads in the example below. The single-threaded case looks about flat.

In [1]: from pandas.util.testing import test_parallel
In [2]: dti = pd.date_range('1900-1-1', periods=100000)

In [3]: def f():
   ...:     for i in range(4):
   ...:         dti.year

In [4]: @test_parallel(4)
   ...: def g():
   ...:     dti.year

In [8]: %timeit f()
10 loops, best of 3: 25.8 ms per loop

In [9]: %timeit g()
100 loops, best of 3: 7.71 ms per loop
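
For readers unfamiliar with `test_parallel`, the sketch below shows roughly what a decorator of that kind does: it starts several threads that each call the wrapped function, then waits for all of them. This is an illustration only, not the pandas implementation. The name `run_in_threads` and the `record` function are hypothetical, and only the standard library is used.

```python
import threading
from functools import wraps


def run_in_threads(num_threads):
    """Hypothetical stand-in for a test_parallel-style decorator."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # One thread per requested worker, all calling the same function.
            threads = [
                threading.Thread(target=func, args=args, kwargs=kwargs)
                for _ in range(num_threads)
            ]
            for t in threads:
                t.start()
            # Block until every worker has finished.
            for t in threads:
                t.join()
        return wrapper
    return decorator


calls = []


@run_in_threads(4)
def record():
    # Stands in for the real workload (e.g. dti.year in the timings above).
    calls.append(threading.get_ident())


record()
print(len(calls))  # 4: the body ran once per thread
```

With a workload that holds the GIL, the four threads would run one after another, so `g()` would take about as long as `f()`. The speedup in the timings appears only because the accessor releases the GIL while it loops.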
pandas/tslib.pyx Outdated
Contributor

prob makes sense to define this as a c-function and make it nogil (the days_per_month......)
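
As a rough illustration of why a helper like this is a good candidate for a nogil C function: the lookup needs only integers and a constant table, so nothing in it has to touch a Python object. The sketch below is plain Python that shows the logic only. The names `DAYS_PER_MONTH`, `is_leap_year`, and `days_in_month` are hypothetical and are not taken from `tslib.pyx`.

```python
# Hypothetical lookup table: row 0 is a common year, row 1 is a leap year.
DAYS_PER_MONTH = (
    (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31),
    (31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31),
)


def is_leap_year(year):
    # Gregorian rule: divisible by 4, except centuries not divisible by 400.
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_month(year, month):
    # month is 1-based; only integer arithmetic and table indexing are used.
    return DAYS_PER_MONTH[int(is_leap_year(year))][month - 1]


print(days_in_month(2016, 2))  # 29
print(days_in_month(1900, 2))  # 28
```

In Cython, the same logic could presumably be written as a `cdef` function marked `nogil`, so that callers inside a `with nogil:` block can use it without reacquiring the GIL.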

@jreback jreback added the Datetime (Datetime data dtype) and Performance (Memory or execution speed performance) labels Oct 8, 2015
Contributor

If you declared field as char[:] instead would you be able to nogil the whole thing until raise?

Contributor Author

hmm, tried that out, but cython doesn't seem to take a view of strings like that? http://stackoverflow.com/questions/28203670/how-to-use-cython-typed-memoryviews-to-accept-strings-from-python
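
The underlying limitation can be seen from plain Python, without Cython. In Python 3, `bytes` exposes the buffer protocol (read-only) and `str` does not, so a string has to be encoded before anything can take a view of it. The snippet below is a small sketch of that behaviour. It does not show how Cython handles `char[:]`, which may add its own restrictions (for example, requiring a writable buffer).

```python
field_bytes = b"year"

# bytes supports the buffer protocol, so a view can be taken...
view = memoryview(field_bytes)
first_byte = view[0]          # an int, equal to ord("y")
is_readonly = view.readonly   # bytes buffers are read-only

# ...but str does not, so this raises TypeError.
try:
    memoryview("year")
    str_supported = True
except TypeError:
    str_supported = False

# Encoding the string first yields a bytes object that can be viewed.
encoded_view = memoryview("year".encode("ascii"))

print(first_byte == ord("y"), is_readonly, str_supported)
```

One possible workaround, under these assumptions, is to encode the field name to bytes (or map it to an integer code) before entering the nogil section, so that only C-level values are used inside it.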

@jreback jreback added this to the 0.17.1 milestone Oct 16, 2015
@jreback
Contributor

jreback commented Oct 16, 2015

@chris-b1 looks good. can you add a whatsnew note (perf) and squash?

@chris-b1 chris-b1 changed the title (WIP) PERF: Release GIL on some datetime ops PERF: Release GIL on some datetime ops Oct 16, 2015
@chris-b1
Contributor Author

@jreback - updated

jreback added a commit that referenced this pull request Oct 17, 2015
PERF: Release GIL on some datetime ops
@jreback jreback merged commit 7e5b223 into pandas-dev:master Oct 17, 2015
@jreback
Contributor

jreback commented Oct 17, 2015

thanks!

@jreback
Contributor

jreback commented Oct 20, 2015

@chris-b1 can you add these (clean then make again to see them)

warning: pandas/src/period.pyx:144:24: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:145:23: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:147:55: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:148:19: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:169:24: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:170:19: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:172:15: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:172:53: Use boundscheck(False) for faster access
building 'pandas._period' extension
@chris-b1 chris-b1 deleted the tslib-gil branch October 21, 2015 22:42