  • Does `cdef double[::1] output = np.empty(n, dtype=np.float64)` improve the performance? It looks like `cdef np.ndarray[dtype=double] output = np.empty(n, dtype=np.float64)` causes strided memory access afterwards, which often prevents SIMD vectorization. (I looked that up in the HTML generated with the `-a` flag, but have no gcc available right now.) Commented Aug 27, 2018 at 22:10
  • 1
    @max9111 If SIMD-vectorization is the reason for the speed-up, than one should probably use continuous memory view as you suggested. In this case it didn't change much (see my edit). Maybe this is missed optimization from gcc? Commented Aug 28, 2018 at 4:09
  • 1
    Roughly equivalent in godbolt - godbolt.org/z/h_qNbH - does seem like clang does a lot 'more' - some of that is just loop unrolling, but its overall vectorization strategy is different too. Commented Aug 28, 2018 at 14:35
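The contiguity distinction the first comment raises can be sketched in plain NumPy (the Cython declarations appear only as comments, and whether the compiler actually vectorizes the resulting loop depends on gcc/clang, as the godbolt comparison shows):

```python
import numpy as np

# In Cython, `cdef double[::1] output` accepts only a C-contiguous 1-D
# buffer, so generated loops can assume unit stride (SIMD-friendly),
# while `cdef np.ndarray[dtype=double] output` permits arbitrary strides.

a = np.empty(10, dtype=np.float64)
print(a.flags['C_CONTIGUOUS'])  # True: would satisfy double[::1]

# A strided view (every second element) is no longer contiguous and
# would be rejected at runtime by a double[::1] memoryview argument.
b = np.empty(20, dtype=np.float64)[::2]
print(b.flags['C_CONTIGUOUS'])  # False
```

This is why `np.empty(n, dtype=np.float64)` works with both declarations: freshly allocated arrays are C-contiguous, and only the declared type tells the Cython compiler it may rely on that.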