Iterative image processing in CUDA

Question

I have written a CUDA kernel to process an image. However, depending on the output of the processed image, I may have to call the kernel again to re-tune the image. For example, consider an image with 9 pixels:

1 2 3
4 5 6
7 8 9

Suppose that, depending on its neighboring values, the value 9 changes to 10. Since the value has changed, I have to re-process the new image, with the same kernel.

1 2 3
4 5 6
7 8 10

I have already written the algorithm to process the image in a single iteration. The way I'm planning to implement the iterations in CUDA is the following:

__global__ void process_image_GPU(unsigned int *d_input, unsigned int *d_output, int dataH, int dataW, unsigned int *val) {

     __shared__ unsigned int sh_map[TOTAL_WIDTH][TOTAL_WIDTH];
     bool changed = false;
     // Do processing; set changed = true if this thread modifies a pixel of the image
     if (changed) { atomicAdd(val, 1); }
}

int main(int argc, char *argv[]) {
    // Allocate d_input, d_output and call cudaMemcpy
    unsigned int x = 0, *val;
    cudaMalloc((void **)&val, sizeof(unsigned int));
    do {
        // Reset the change counter before every pass
        cudaMemset(val, 0, sizeof(unsigned int));
        process_image_GPU<<<dimGrid, dimBlock>>>(d_input, d_output, rows, cols, val);
        // Blocking copy: waits for the kernel to finish, then reads the counter
        cudaMemcpy(&x, val, sizeof(unsigned int), cudaMemcpyDeviceToHost);
        // The output of this pass becomes the input of the next one
        unsigned int *tmp = d_input; d_input = d_output; d_output = tmp;
    } while (x != 0);   // call the kernel again while any pixel changed
    // d_input now holds the final image
    cudaFree(val);
    return 0;
}

Is this the only way to do it? Is there a more efficient way to implement it?

Thanks a lot for your time.

This problem looks conceptually similar to fast marching, fast sweeping and fast iterative approaches, where information is propagated across the computational domain by updating neighboring pixels. – Vitality, Commented Oct 12, 2014 at 11:15
Ideas introduced in that field for setting up iterations and stopping rules may be useful here. – Vitality, Commented Oct 12, 2014 at 11:19
Thank you. As suggested in one of the answers/comments (which got deleted, I am not sure why), I removed the atomicAdd and replaced it with *val=1, because I only care whether there is any change at all. – Andrew Mathews, Commented Oct 13, 2014 at 1:20
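A minimal sketch of that flag pattern, assuming float pixels and a hypothetical update rule (each interior pixel becomes the mean of its four neighbors) standing in for the real processing. Every thread that detects a change stores the same value 1, so a plain store is enough to answer "did anything change?" without atomic traffic; the host still has to reset the flag to 0 before each launch. The names `process_pass` and `run_pass` are made up for illustration.

```cuda
#include <cuda_runtime.h>

// Hypothetical update rule for illustration only: each interior pixel becomes
// the mean of its 4 neighbors; border pixels are copied unchanged.
__global__ void process_pass(const float *in, float *out, int W, int H,
                             unsigned int *changed)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;

    int idx = y * W + x;
    float v = in[idx];
    if (x > 0 && x < W - 1 && y > 0 && y < H - 1)
        v = 0.25f * (in[idx - 1] + in[idx + 1] + in[idx - W] + in[idx + W]);
    out[idx] = v;

    // All writers store the same value, so no atomicAdd is needed
    // when we only care whether *any* pixel changed.
    if (v != in[idx]) *changed = 1u;
}

// Runs one pass from d_in into d_out; returns true if any pixel changed.
bool run_pass(const float *d_in, float *d_out, int W, int H,
              unsigned int *d_changed)
{
    cudaMemset(d_changed, 0, sizeof(unsigned int));   // reset before every pass
    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    process_pass<<<grid, block>>>(d_in, d_out, W, H, d_changed);

    unsigned int h_changed = 0;
    // A blocking cudaMemcpy on the default stream waits for the kernel
    cudaMemcpy(&h_changed, d_changed, sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    return h_changed != 0;
}
```

With this in place, the host loop reduces to calling `run_pass` repeatedly, swapping the two image buffers between calls, until it returns false.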
Could you be more specific about the kind of processing you are performing and, in particular, about how a perturbation of one pixel propagates to its neighboring pixels? Knowing that, one could perhaps suggest a different approach from the one you have devised. – Vitality, Commented Oct 13, 2014 at 4:37

Glorfindel · Accepted Answer · 2022-05-15 11:01:32Z

I'll hazard an answer, despite the very limited information you have provided. I hope it helps.

From what you have said, you have already set up an updating rule for your pixels based on the values of the adjacent pixels. Let $x^{(k)}_{i,j}$ be the value of pixel $(i,j)$ at iteration $k$, and let

$$x^{(k+1)}_{i,j} = f\left(x^{(k)}_{i-1,j},\ x^{(k)}_{i,j},\ x^{(k)}_{i+1,j},\ x^{(k)}_{i,j-1},\ x^{(k)}_{i,j+1}\right)$$

I'm assuming the typical stencil-based updating rule, but of course other rules would be possible.
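As an illustration of such a rule, here is a sketch of one Jacobi-style step: every pixel reads iteration k and writes iteration k+1, so two buffers are needed. The old values are staged through a shared-memory tile with a one-pixel halo, similar to the `sh_map[TOTAL_WIDTH][TOTAL_WIDTH]` array in the question. The function `f` (a weighted average of the pixel and its four neighbors) is a hypothetical placeholder, the tile size of 16 is an arbitrary choice, and pixels are assumed to be floats.

```cuda
#include <cuda_runtime.h>

#define TILE 16
#define TOTAL_WIDTH (TILE + 2)   // tile plus a one-pixel halo on each side

// Hypothetical update rule; argument order follows
// f(x_{i-1,j}, x_{i,j}, x_{i+1,j}, x_{i,j-1}, x_{i,j+1}) with i = row, j = column.
__device__ float f(float up, float centre, float down, float left, float right)
{
    return 0.5f * centre + 0.125f * (up + down + left + right);
}

// Reads img(x, y), or 0 outside the image (such halo cells are never used).
__device__ float load(const float *img, int x, int y, int W, int H)
{
    return (x >= 0 && x < W && y >= 0 && y < H) ? img[y * W + x] : 0.0f;
}

// One Jacobi step x_old -> x_new; launch with blockDim = (TILE, TILE).
// Image border pixels are copied unchanged.
__global__ void jacobi_step(const float *x_old, float *x_new, int W, int H)
{
    __shared__ float sh[TOTAL_WIDTH][TOTAL_WIDTH];
    int gx = blockIdx.x * TILE + threadIdx.x;   // column j
    int gy = blockIdx.y * TILE + threadIdx.y;   // row i
    int lx = threadIdx.x + 1, ly = threadIdx.y + 1;

    // Every thread loads its own pixel; edge threads also load the halo.
    sh[ly][lx] = load(x_old, gx, gy, W, H);
    if (threadIdx.x == 0)        sh[ly][0]        = load(x_old, gx - 1, gy, W, H);
    if (threadIdx.x == TILE - 1) sh[ly][TILE + 1] = load(x_old, gx + 1, gy, W, H);
    if (threadIdx.y == 0)        sh[0][lx]        = load(x_old, gx, gy - 1, W, H);
    if (threadIdx.y == TILE - 1) sh[TILE + 1][lx] = load(x_old, gx, gy + 1, W, H);
    __syncthreads();   // every thread reaches this before any early exit

    if (gx >= W || gy >= H) return;
    float c = sh[ly][lx];
    bool interior = gx > 0 && gx < W - 1 && gy > 0 && gy < H - 1;
    x_new[gy * W + gx] = interior
        ? f(sh[ly - 1][lx], c, sh[ly + 1][lx], sh[ly][lx - 1], sh[ly][lx + 1])
        : c;
}

// Launches one step over a W x H image.
void jacobi_launch(const float *d_old, float *d_new, int W, int H)
{
    dim3 block(TILE, TILE);
    dim3 grid((W + TILE - 1) / TILE, (H + TILE - 1) / TILE);
    jacobi_step<<<grid, block>>>(d_old, d_new, W, H);
}
```

Calling `jacobi_launch` repeatedly while swapping `d_old` and `d_new` gives the iteration $x^{(k)} \to x^{(k+1)}$; the shared tile means each old value is fetched from global memory roughly once per block instead of up to five times.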

At this point, you have to set up a stopping rule, namely, a rule that indicates whether your algorithm has reached convergence. For example, you could evaluate the norm of the difference between the two images at steps k+1 and k, and stop when $\|x^{(k+1)} - x^{(k)}\|$ falls below a chosen tolerance.
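A sketch of such a stopping test, assuming float images stored contiguously on the device. It computes the Euclidean norm of the difference between two images with a shared-memory reduction inside each block and a single float `atomicAdd` per block. The names `sq_diff_sum` and `diff_norm` and the block size of 256 are illustrative choices; the maximum single-pixel change would work just as well as a criterion.

```cuda
#include <cuda_runtime.h>
#include <math.h>

#define BLOCK 256   // must be a power of two for the tree reduction below

// Adds sum((a[i] - b[i])^2) to *acc: each block reduces its partial sums in
// shared memory, then contributes one value with atomicAdd.
__global__ void sq_diff_sum(const float *a, const float *b, int n, float *acc)
{
    __shared__ float s[BLOCK];
    float sum = 0.0f;
    // Grid-stride loop, so a capped grid still covers all n elements
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    s[threadIdx.x] = sum;
    __syncthreads();
    for (int stride = BLOCK / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) s[threadIdx.x] += s[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0) atomicAdd(acc, s[0]);
}

// Returns ||a - b||_2 for two device arrays of n floats.
float diff_norm(const float *d_a, const float *d_b, int n, float *d_acc)
{
    if (n <= 0) return 0.0f;
    cudaMemset(d_acc, 0, sizeof(float));   // all-zero bits == 0.0f
    int blocks = (n + BLOCK - 1) / BLOCK;
    if (blocks > 1024) blocks = 1024;
    sq_diff_sum<<<blocks, BLOCK>>>(d_a, d_b, n, d_acc);
    float h = 0.0f;
    cudaMemcpy(&h, d_acc, sizeof(float), cudaMemcpyDeviceToHost);
    return sqrtf(h);
}
```

Because the order of the float `atomicAdd` calls is not fixed, the last bits of the result can vary from run to run, which is harmless for a tolerance test.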

Once the problem is formulated in this way, I would say that you have the following two possibilities:

Rouy-Tourin-like scheme: all the computational pixels are updated in a brute-force way "simultaneously" until convergence is reached;
Fast sweeping method: the computational grid is swept (selective update) along a prefixed number of directions until convergence is reached;
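For the Rouy-Tourin-like (brute-force) option, the host side can stay small: keep both images on the device, swap the two buffer pointers after every pass instead of copying, and read the convergence flag back only every `check_every` passes, since each read blocks the host until the GPU has caught up. This is a sketch under assumptions: float pixels, a placeholder rule (each interior pixel becomes the mean of its four neighbors, i.e. a plain Jacobi iteration), a change tolerance `tol`, and hypothetical names `brute_force_pass` and `iterate_until_converged`.

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Placeholder rule: every interior pixel becomes the mean of its 4 neighbors
// (borders fixed). Sets *changed if any pixel moved by more than tol.
__global__ void brute_force_pass(const float *in, float *out, int W, int H,
                                 float tol, unsigned int *changed)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;
    int idx = y * W + x;
    float v = in[idx];
    if (x > 0 && x < W - 1 && y > 0 && y < H - 1)
        v = 0.25f * (in[idx - 1] + in[idx + 1] + in[idx - W] + in[idx + W]);
    out[idx] = v;
    if (fabsf(v - in[idx]) > tol) *changed = 1u;
}

// Runs passes until a whole batch of check_every passes changes nothing, or
// max_iter passes have run. On return *d_cur points at the latest image.
int iterate_until_converged(float **d_cur, float **d_next, int W, int H,
                            float tol, int max_iter, int check_every,
                            unsigned int *d_changed)
{
    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    int iter = 0;
    while (iter < max_iter) {
        cudaMemset(d_changed, 0, sizeof(unsigned int));
        for (int k = 0; k < check_every && iter < max_iter; ++k, ++iter) {
            brute_force_pass<<<grid, block>>>(*d_cur, *d_next, W, H, tol,
                                              d_changed);
            float *tmp = *d_cur; *d_cur = *d_next; *d_next = tmp;  // ping-pong
        }
        unsigned int h_changed = 0;
        // Blocks until all passes of this batch have finished
        cudaMemcpy(&h_changed, d_changed, sizeof(unsigned int),
                   cudaMemcpyDeviceToHost);
        if (h_changed == 0) break;   // no pass in this batch changed anything
    }
    return iter;
}
```

A larger `check_every` trades a few wasted passes after convergence for fewer synchronizations; `check_every = 1` reproduces the "check after every launch" loop from the question.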

Depending on the kind of problem you are dealing with, you may also have an additional possibility:

Fast iterative method: only the pixels in an active list (those whose neighbors have recently changed) are selectively updated, and the iteration stops when the list is empty.
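To give an idea of selective updating in this image setting, here is a simplified sketch in the spirit of the fast iterative method. It is not the published algorithm: instead of a compacted active list it keeps a dense 0/1 mask, and only masked pixels are recomputed. A pixel that changes by more than `tol` activates itself and its four neighbors for the next pass, because with a five-point stencil those are the pixels whose inputs changed. The rule (mean of the four neighbors), float pixels and all names are assumptions.

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <utility>

// Recomputes only pixels with active_in set (placeholder rule: mean of the
// 4 neighbors, borders fixed); inactive pixels are copied. A pixel that moves
// by more than tol activates itself and its 4 neighbors for the next pass.
__global__ void active_pass(const float *in, float *out,
                            const unsigned char *active_in,
                            unsigned char *active_out,
                            int W, int H, float tol, unsigned int *any_active)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;
    int idx = y * W + x;
    float v = in[idx];
    if (active_in[idx] && x > 0 && x < W - 1 && y > 0 && y < H - 1)
        v = 0.25f * (in[idx - 1] + in[idx + 1] + in[idx - W] + in[idx + W]);
    out[idx] = v;
    if (fabsf(v - in[idx]) > tol) {
        // Plain stores of the same value 1, so concurrent writers are harmless
        active_out[idx] = 1;
        if (x > 0)     active_out[idx - 1] = 1;
        if (x < W - 1) active_out[idx + 1] = 1;
        if (y > 0)     active_out[idx - W] = 1;
        if (y < H - 1) active_out[idx + W] = 1;
        *any_active = 1u;
    }
}

// Runs passes until the active set is empty or max_iter is reached.
// On return *d_img holds the latest image and *d_act the latest mask.
int iterate_active(float **d_img, float **d_tmp,
                   unsigned char **d_act, unsigned char **d_act_next,
                   int W, int H, float tol, int max_iter, unsigned int *d_any)
{
    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    size_t mask_bytes = (size_t)W * H;
    int iter = 0;
    while (iter < max_iter) {
        cudaMemset(*d_act_next, 0, mask_bytes);
        cudaMemset(d_any, 0, sizeof(unsigned int));
        active_pass<<<grid, block>>>(*d_img, *d_tmp, *d_act, *d_act_next,
                                     W, H, tol, d_any);
        std::swap(*d_img, *d_tmp);        // ping-pong the image buffers
        std::swap(*d_act, *d_act_next);   // and the masks
        ++iter;
        unsigned int h_any = 0;
        cudaMemcpy(&h_any, d_any, sizeof(unsigned int), cudaMemcpyDeviceToHost);
        if (h_any == 0) break;            // active set is empty: converged
    }
    return iter;
}
```

This version still launches one thread per pixel, so it mainly saves arithmetic on inactive pixels; a real active list (for example compacted with a prefix sum) is what would reduce the launch size as well.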

All the above methods have been compared, for the solution of the eikonal equation, here.

Of course, you will need to show convergence of the above computational schemes for the particular problem of your interest.

Sorry for the delayed reply. Yes, I am using a typical stencil-based rule: if the value of a pixel is greater than some threshold of the average of its neighboring values (weighted by a convolution filter), I replace it with a new value (the average of its neighbors). I am using a brute-force method. I am new to parallel programming and am going through the methods mentioned above. I will try to incorporate them one by one and post my progress here. Thanks a lot.
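A sketch of the rule described in this comment, under one possible reading of it: a pixel is replaced by the plain mean of its 8 neighbors when its value exceeds `threshold` times the weighted mean of those neighbors, with the weights taken from a 3x3 convolution filter (centre weight ignored). That interpretation, the float pixels, and the names `Filter3x3`, `threshold_pass` and `run_threshold_pass` are assumptions; the change flag uses the plain-store pattern mentioned in the comments on the question.

```cuda
#include <cuda_runtime.h>

struct Filter3x3 { float w[9]; };   // row-major 3x3 weights; w[4] (centre) unused

// One brute-force pass of the thresholded neighbor-average rule.
// Border pixels are copied unchanged.
__global__ void threshold_pass(const float *in, float *out, int W, int H,
                               Filter3x3 filt, float threshold,
                               unsigned int *changed)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;

    int idx = y * W + x;
    float c = in[idx], v = c;
    if (x > 0 && x < W - 1 && y > 0 && y < H - 1) {
        float wacc = 0.0f, wsum = 0.0f, plain = 0.0f;
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                if (dx == 0 && dy == 0) continue;   // neighbors only
                float n = in[idx + dy * W + dx];
                float w = filt.w[(dy + 1) * 3 + (dx + 1)];
                wacc += w * n;
                wsum += w;
                plain += n;
            }
        float weighted_mean = wacc / wsum;   // assumes the weights do not sum to 0
        if (c > threshold * weighted_mean) v = plain / 8.0f;
    }
    out[idx] = v;
    if (v != c) *changed = 1u;   // plain store: yes/no flag only
}

// Runs one pass from d_in into d_out; returns true if any pixel was replaced.
bool run_threshold_pass(const float *d_in, float *d_out, int W, int H,
                        Filter3x3 filt, float threshold,
                        unsigned int *d_changed)
{
    cudaMemset(d_changed, 0, sizeof(unsigned int));
    dim3 block(16, 16);
    dim3 grid((W + block.x - 1) / block.x, (H + block.y - 1) / block.y);
    threshold_pass<<<grid, block>>>(d_in, d_out, W, H, filt, threshold,
                                    d_changed);
    unsigned int h = 0;
    cudaMemcpy(&h, d_changed, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    return h != 0;
}
```

Passing the small filter struct by value keeps the sketch short; for larger filters, `__constant__` memory would be the more usual place for the weights.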
