AccelerateHS / accelerate Public
Performance of highly skewed multidimensional reductions #140
Comments
I have run into this problem as well. Is there a temporary workaround?

For now, we are using this function, adapted from David Darais in the above thread, as a replacement for

I have not yet had the time to improve the code generation for this type of problem. If you have a good non-synthetic test case you don't mind sharing that I can use to optimise this, please do send it our way.

Actually, what is wrong with the

@RasmusWL had interesting work on this at FHPC'17 in the context of Futhark; we should steal his ideas!

@tmcdonell go ahead! See my thesis and the paper for details ;)


Performance of multidimensional reductions is not good when the array is highly skewed. An example is a `fold` where the number of columns (the innermost dimension) is very small. See also this thread: https://groups.google.com/forum/#!topic/accelerate-haskell/KAFYUz4Sjsk
Multidimensional reduction uses one thread block per reduction, so an `(Z :. m :. n)` sized matrix uses `m` thread blocks. If `n` is very small, then many threads in the block sit idle. We could change this to a warp-per-reduction style, which is actually the strategy segmented fold uses. This will likely have a negative impact if `m` is small and `n` is large.

It would be possible to generate both variants and choose dynamically which to execute. That implies compiling four kernels per reduction (because of fusion; initial vs. recursive step).
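
To make the idle-thread argument concrete, here is a rough cost-model sketch in plain Haskell. It is not Accelerate's code generator and does not reflect how the backend actually picks kernels. The names `Strategy`, `utilisation` and `chooseStrategy`, the block size of 256 and the threshold on `n` are all assumptions made for illustration. Only the warp size of 32 is a property of NVIDIA hardware.

```haskell
-- Toy model of thread utilisation for a reduction over (Z :. m :. n).
-- Everything here is illustrative; none of it is Accelerate's API.

-- | Threads per warp on NVIDIA GPUs.
warpSize :: Int
warpSize = 32

-- | Assumed thread block size (hypothetical; a real backend chooses
-- this per kernel and per device).
blockSize :: Int
blockSize = 256

-- | The two ways of assigning threads to one inner-dimension reduction.
data Strategy = BlockPerReduction | WarpPerReduction
  deriving (Eq, Show)

-- | Number of threads that cooperate on a single reduction.
threadsPerReduction :: Strategy -> Int
threadsPerReduction BlockPerReduction = blockSize
threadsPerReduction WarpPerReduction  = warpSize

-- | Fraction of the assigned threads that have at least one element
-- to load, given an inner dimension of @n@ elements.
utilisation :: Strategy -> Int -> Double
utilisation s n
  | n <= 0    = 0
  | otherwise = fromIntegral (min n t) / fromIntegral t
  where
    t = threadsPerReduction s

-- | Pick a strategy at run time from the inner extent @n@.
-- The threshold (one warp) is an assumption, not a tuned value.
chooseStrategy :: Int -> Strategy
chooseStrategy n
  | n <= warpSize = WarpPerReduction
  | otherwise     = BlockPerReduction
```

Under these assumptions, an inner dimension of `n = 4` keeps 4 of 256 threads busy (about 1.6%) with one block per reduction, versus 4 of 32 threads (12.5%) with one warp per reduction. A real heuristic would probably also need to consider `m`, since a small `m` with a large `n` is the case where the warp-per-reduction variant is expected to lose.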