Consider the following OpenCL code in which each element in a vector-type variable gets its value via array lookup:

float* tbl = get_data();
int4 offsets = get_offsets();
float4 my_elements = { 
    tbl[offsets.x],
    tbl[offsets.y],
    tbl[offsets.z],
    tbl[offsets.w] 
};

The specific types and vector size are not important here: I used float, int and size-4 vectors, but it could just as well have been, say, ushort values, long indices and a size-8 vector.

I don't like the repetitiveness of that code. What is a better OpenCL idiom for doing this kind of "elementwise lookup" without the excessive repetition?

  • Is there any pattern to offsets? Or is it more or less random? Also, what do you mean by another pair of types? The index would need to be some type of integer, at least. Commented Sep 16 at 16:55
  • @EdwardMurphy: 1. The question regards the situation where nothing is known about the offsets and the code which generates them. 2. Indeed, the index will always be integral, I just meant that the use of float, int and 4 specifically is not material here, it's just an example. Commented Sep 17 at 6:44

1 Answer

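You could use float4 my_elements = vload4(0, tbl + offsets.x);, assuming that the offsets are consecutive, i.e. offsets = {n, n+1, n+2, n+3}. (Note that vload4(offsets.x, tbl) would read from tbl + 4 * offsets.x instead, because the first argument of vload4 counts whole vectors, not elements.) If the offsets are arbitrary, perhaps this would work: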

float* tbl = get_data();
int* offsets = get_offsets();
// sizeof(offsets) would give the size of the pointer, not the number of
// offsets, so the count must be known separately; N is a placeholder for it.
#define N 4
float my_elements[N];
for (int i = 0; i < N; i++)
{
    my_elements[i] = tbl[offsets[i]];
}

or

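float* tbl = get_data();
int4 offsets = get_offsets();
// The OpenCL C specification does not define [] on vector types,
// so stage the indices and the results in small arrays.
int idx[4];
float vals[4];
vstore4(offsets, 0, idx);
for (int i = 0; i < 4; i++)
{
    vals[i] = tbl[idx[i]];
}
float4 my_elements = vload4(0, vals);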
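
If the goal is mainly to avoid writing the lookup out at every call site, one option is to hide the repetition behind a helper. The following is a minimal sketch, not an established OpenCL idiom: GATHER4 and gather4f are hypothetical names, and the two typedefs are host-side stand-ins for the built-in int4 and float4 so that the snippet also compiles as plain C. Inside a kernel you would drop the typedefs and add whatever address-space qualifier tbl needs.

```c
/* Host-side stand-ins for OpenCL's built-in vector types (an assumption made
   so this compiles as plain C); delete these two typedefs inside a kernel. */
typedef struct { int x, y, z, w; } int4;
typedef struct { float x, y, z, w; } float4;

/* Hypothetical helper macro, not part of OpenCL: expands to a brace
   initializer that looks up one table element per vector component.
   Note that idx is evaluated four times. */
#define GATHER4(tbl, idx) \
    { (tbl)[(idx).x], (tbl)[(idx).y], (tbl)[(idx).z], (tbl)[(idx).w] }

/* Hypothetical wrapper for the float/int case from the question.
   Taking idx by value means the caller's expression is evaluated once. */
float4 gather4f(const float *tbl, int4 idx)
{
    float4 result = GATHER4(tbl, idx);
    return result;
}
```

With this, the example from the question becomes float4 my_elements = gather4f(tbl, offsets);. The macro itself does not depend on the element or index type, but a size-8 vector would need its own variant that spells out s0 through s7, and because the macro evaluates idx four times it should be given a variable rather than a call such as get_offsets().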

4 Comments

This doesn't answer the question. The result is a float4, not a float array. And the offsets are not consecutive.
Why is it a vector, if its length is arbitrary? If it needs to be a vector, perhaps this could work? [See above edit] I don't think there is a better way to do this, as GPUs really don't like non-consecutive memory access.
See link. In short, GPUs do memory access in sequential groups across warps, so if your memory access is random, the GPU will need to do an individual access for every value, potentially slowing access by up to 32x.
See link. Also, there is an error in the above comment. Slowdown is about 2x.
