  • Does `cdef double[::1] output = np.empty(n, dtype=np.float64)` improve the performance? It looks like `cdef np.ndarray[dtype=double] output = np.empty(n, dtype=np.float64)` causes strided memory access afterwards, which often prevents SIMD vectorization. (I looked that up in the HTML generated with the `-a` flag, but have no gcc available right now.) Commented Aug 27, 2018 at 22:10
  • 1
    @max9111 If SIMD-vectorization is the reason for the speed-up, than one should probably use continuous memory view as you suggested. In this case it didn't change much (see my edit). Maybe this is missed optimization from gcc? Commented Aug 28, 2018 at 4:09
  • 1
    Roughly equivalent in godbolt - godbolt.org/z/h_qNbH - does seem like clang does a lot 'more' - some of that is just loop unrolling, but its overall vectorization strategy is different too. Commented Aug 28, 2018 at 14:35
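The contiguity distinction the first comment raises can be sketched in plain NumPy (the Cython declarations appear only as comments, and whether the compiler actually vectorizes the resulting loop depends on gcc/clang, as the godbolt comparison shows):

```python
import numpy as np

# In Cython, `cdef double[::1] output` accepts only a C-contiguous 1-D
# buffer, so generated loops can assume unit stride (SIMD-friendly),
# while `cdef np.ndarray[dtype=double] output` permits arbitrary strides.

a = np.empty(10, dtype=np.float64)
print(a.flags['C_CONTIGUOUS'])  # True: would satisfy double[::1]

# A strided view (every second element) is no longer contiguous and
# would be rejected at runtime by a double[::1] memoryview argument.
b = np.empty(20, dtype=np.float64)[::2]
print(b.flags['C_CONTIGUOUS'])  # False
```

This is why `np.empty(n, dtype=np.float64)` works with both declarations: freshly allocated arrays are C-contiguous, and only the declared type tells the Cython compiler it may rely on that.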