What's the OpenCL idiom for elementwise array-lookup / gather operation with vectorized types?

Question

Consider the following OpenCL code in which each element in a vector-type variable gets its value via array lookup:

float* tbl = get_data();
int4 offsets = get_offsets();
float4 my_elements = { 
    tbl[offsets.x],
    tbl[offsets.y],
    tbl[offsets.z],
    tbl[offsets.w] 
};

... and never mind the specific types and vector size I used: I used float values, int indices and size-4 vectors, but it might have been, say, ushort values, long indices and a size-8 vector.

I don't like the repetitiveness of that code. What is a better OpenCL idiom for doing this kind of "elementwise lookup" without the excessive repetition?

Is there any pattern to offsets? Or is it more or less random? Also, what do you mean by another pair of types? The index would need to be some type of integer, at least.
– Edward Murphy, Commented Sep 16 at 16:55
@EdwardMurphy: 1. The question concerns the situation where nothing is known about the offsets or the code which generates them. 2. Indeed, the index will always be integral; I just meant that the specific use of float, int and 4 is not material here; it's just an example.
– einpoklum, Commented Sep 17 at 6:44

Edward Murphy · Accepted Answer · 2025-09-17 19:54:56Z

-1

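You could use float4 my_elements = vload4(0, tbl + offsets.x), assuming that offsets = {n, n+1, n+2, n+3}. (vload4(k, p) reads from p + 4*k, so vload4(offsets.x, tbl) would load tbl[4n] to tbl[4n+3] instead.) If the offsets are more random, perhaps this would work: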

#define N 8 // hypothetical element count: OpenCL C has no variable-length arrays
float* tbl = get_data();
int* offsets = get_offsets(); // sizeof(offsets) would be the pointer's size, not the array length
float my_elements[N];
for (int i = 0; i < N; i++)
{
    my_elements[i] = tbl[offsets[i]];
}

or

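float* tbl = get_data();
int4 offsets = get_offsets();
float4 my_elements;
// Note: subscripting vector components with a runtime index (v[i]) is not the
// standard .x/.s0 component syntax; Clang-based compilers accept it, others may not.
for (int i = 0; i < 4; i++)
{
    my_elements[i] = tbl[offsets[i]];
}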

answered Sep 17 at 15:31 by Edward Murphy, edited Sep 17 at 19:54


4 Comments

einpoklum Sep 17 at 19:46

This doesn't answer the question. The result is a float4, not a float array. And the offsets are not consecutive.

Edward Murphy Sep 17 at 19:50

Why is it a vector if its length is arbitrary? If it needs to be a vector, perhaps this could work? [See above edit] I don't think there is a better way to do this, as GPUs really don't like non-consecutive memory access.

Edward Murphy Sep 17 at 20:03

See link. In short, GPUs perform memory access in sequential groups across a warp, so if your memory access is random, the GPU will need to do an individual access for every value, potentially slowing access by up to 32x.

Edward Murphy Sep 17 at 20:16

See link. Also, there is an error in my comment above: the slowdown is about 2x.
