I am using OpenCL to run a procedure on several GPUs and CPUs simultaneously to get high performance. The Intel OpenCL implementation always reports that the kernel was not vectorized, so it runs across multiple cores but does not use SIMD instructions. My question is: if I rewrite the code so that the OpenCL kernel can exploit SIMD instructions, will that also improve GPU performance?
1 Answer
Yes, but be aware that this is not necessary for good performance on AMD GCN-based APUs/GPUs or on Nvidia Fermi or newer GPUs; they execute scalar operations with high utilization. CPUs and Intel's GPUs, however, can benefit greatly from SIMD instructions, which is what the vector operations boil down to.
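To make "rewriting for SIMD" concrete, here is a minimal sketch in plain C (not OpenCL C itself) that models the two kernel styles. The function names and the host loops standing in for the NDRange launch are hypothetical; in a real kernel the index would come from `get_global_id(0)`, and the four-element version would typically be written with the built-in `float4` type.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar style: one work-item handles one element.
   In an OpenCL kernel, gid would come from get_global_id(0). */
static void saxpy_scalar_item(size_t gid, float a, const float *x, float *y)
{
    y[gid] = a * x[gid] + y[gid];
}

/* Explicit vector style: one work-item handles four contiguous elements.
   In OpenCL C this would be a single float4 expression; the lane loop
   below only models those four lanes in plain C. */
static void saxpy_vec4_item(size_t gid, float a, const float *x, float *y)
{
    size_t base = 4 * gid;
    for (size_t lane = 0; lane < 4; ++lane)
        y[base + lane] = a * x[base + lane] + y[base + lane];
}

/* Hypothetical host-side stand-in for launching n scalar work-items. */
void saxpy_scalar(size_t n, float a, const float *x, float *y)
{
    for (size_t gid = 0; gid < n; ++gid)
        saxpy_scalar_item(gid, a, x, y);
}

/* Hypothetical host-side stand-in for launching n / 4 vector work-items.
   Assumption: n is a multiple of 4 (no remainder handling in this sketch). */
void saxpy_vec4(size_t n, float a, const float *x, float *y)
{
    assert(n % 4 == 0);
    for (size_t gid = 0; gid < n / 4; ++gid)
        saxpy_vec4_item(gid, a, x, y);
}
```

Both versions compute the same result. The only difference is how much work each work-item carries, which matters for implementations that map a vector type onto SIMD lanes and matters much less on hardware that already runs scalar work-items with high utilization.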
2 Comments
mmostajab
So, what I understood from your answer is that I should vectorize the code, then profile and see which version is better. Is that right? Or will the vectorized code definitely run faster on the GPU anyway?
Jason Newton
Well, profiling is never a bad idea, but if you look into the hardware architecture you are programming for, you will easily find out whether it is a good idea or a waste of time. With this quick check you probably don't need to write any code to know which to do.