So kernel1 modifies image1, and then kernel2 modifies that already-modified image1 before it goes to kernel3?

– VAndrei, commented 2014-10-14 11:27:06 +00:00
Yes. kernel1 modifies image1, the result is passed to kernel2, and kernel2's result is then passed to kernel3.

– Andrew Mathews, commented 2014-10-14 11:34:25 +00:00
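
Work issued into a single CUDA stream runs in issue order, so one way to keep this kernel1, then kernel2, then kernel3 dependency for each image while letting different images overlap is to launch all three kernels for an image into the same stream and spread the images across several streams. Kernel launches are asynchronous with respect to the host, so a single host thread can issue all of this. The sketch below only models that issue order on the CPU; Launch, build_schedule and the round-robin rule (image i goes to stream i % num_streams) are hypothetical, and in real CUDA code each entry would be an asynchronous launch such as kernel1<<<grid, block, 0, streams[s]>>>(...).

```cpp
#include <cassert>
#include <vector>

// One entry in the simulated sequence of launches issued by a single host thread.
struct Launch {
    int stream;  // index of the CUDA stream the launch would go into
    int image;   // which image the launch works on
    int kernel;  // 1, 2 or 3 for kernel1, kernel2, kernel3
};

// Image i is assigned to stream i % num_streams (a hypothetical round-robin rule).
// kernel1..kernel3 for one image share a stream, so they keep their order,
// while images placed in different streams are free to overlap.
std::vector<Launch> build_schedule(int num_images, int num_streams) {
    assert(num_streams > 0);
    std::vector<Launch> schedule;
    for (int image = 0; image < num_images; ++image) {
        const int stream = image % num_streams;
        for (int kernel = 1; kernel <= 3; ++kernel) {
            schedule.push_back({stream, image, kernel});
        }
    }
    return schedule;
}
```

For example, build_schedule(4, 2) yields 12 launches: images 0 and 2 go to stream 0, images 1 and 3 go to stream 1, and each image gets kernel1, kernel2, kernel3 in that order.
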
You could use a CPU parallel threading model such as OpenMP and create one CUDA stream per OpenMP thread. Put a while loop in each OpenMP thread, and have each loop independently pull the next image to be processed from a shared queue. I'd be very surprised if you got much performance improvement this way, though, unless your kernels are trivially small.

– Robert Crovella, commented 2014-10-16 02:42:01 +00:00
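
A CPU-only sketch of the structure described in the comment above: several host threads each run a loop that pulls the next image index from a shared, mutex-protected queue and applies kernel1, kernel2 and kernel3 to it in order. It uses std::thread in place of OpenMP so it builds without extra compiler flags, and kernel1 to kernel3 are hypothetical placeholders doing trivial arithmetic. In a CUDA version, each worker would instead own a stream created with cudaStreamCreate and launch its kernels into that stream.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the three CUDA kernels; each works in place on one image.
void kernel1(std::vector<float>& img) { for (float& p : img) p += 1.0f; }
void kernel2(std::vector<float>& img) { for (float& p : img) p *= 2.0f; }
void kernel3(std::vector<float>& img) { for (float& p : img) p -= 3.0f; }

// Shared queue of image indices; each index is handed out exactly once.
class WorkQueue {
public:
    explicit WorkQueue(int count) { for (int i = 0; i < count; ++i) q_.push(i); }
    // Stores the next index in `index`; returns false when no work is left.
    bool pop(int& index) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        index = q_.front();
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::queue<int> q_;
};

// Body of one host thread (an OpenMP thread in the comment above).
// A CUDA version would create this thread's own cudaStream_t here.
void worker(WorkQueue& queue, std::vector<std::vector<float>>& images) {
    int index;
    while (queue.pop(index)) {
        std::vector<float>& img = images[index];
        kernel1(img);  // kernel2 sees kernel1's result,
        kernel2(img);  // and kernel3 sees kernel2's result
        kernel3(img);
    }
}

// Starts num_threads workers and waits until every image has been processed.
void process_all(std::vector<std::vector<float>>& images, int num_threads) {
    assert(num_threads > 0);
    WorkQueue queue(static_cast<int>(images.size()));
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t)
        threads.emplace_back(worker, std::ref(queue), std::ref(images));
    for (std::thread& th : threads) th.join();
}
```

Because the queue hands out each index once, no two threads ever touch the same image, so the images themselves need no locking.
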
Sorry, I was stuck fine-tuning the algorithm itself, which had nothing to do with CUDA, hence the delay in replying. Why do you say, "I'd be very surprised if you get much performance improvement this way, unless your kernels are trivially small"? What is the reason? Each of my images is either 230x230 or 16384x7 pixels, so processing multiple images in parallel should give me a speedup, right? Also, is there no way to do this without using OpenMP?

– Andrew Mathews, commented 2014-10-22 05:45:28 +00:00
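
One rough way to gauge how much of the GPU a single image already occupies is to count thread blocks. Assuming, purely for illustration, one thread per pixel and 256 threads per block (neither is stated in the question), a 230x230 image has 52,900 pixels and needs 207 blocks, and a 16384x7 image has 114,688 pixels and needs 448 blocks. Comparing such counts with how many blocks the device can keep resident at once (a device-specific limit) hints at how much room is left for another image's kernels to run alongside. The helper below is a hypothetical sketch of that arithmetic.

```cpp
#include <cassert>

// Thread blocks needed to cover a width x height image with one thread per pixel.
// Both the one-thread-per-pixel mapping and threads_per_block are assumptions.
long blocks_needed(long width, long height, long threads_per_block) {
    assert(threads_per_block > 0);
    const long pixels = width * height;
    return (pixels + threads_per_block - 1) / threads_per_block;  // ceiling division
}
```
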