This question is related to LLVM/clang. I already know how to compile opencl-kernel-file.cl using the OpenCL API (clBuildProgram() and clGetProgramBuildInfo()).
My question is:
How can I compile an OpenCL kernel file (.cl) to LLVM IR with OpenCL 1.2 or higher?
In other words, how can I compile an OpenCL kernel file (.cl) to LLVM IR without libclc?
I have tried various methods to generate LLVM-IR from an OpenCL kernel.
First, I followed the Clang user manual (https://clang.llvm.org/docs/UsersManual.html#opencl-features), but that did not work.
Secondly, I found a way to use libclc:
clang++ -emit-llvm -c -target nvptx64-nvidia-nvcl -Dcl_clang_storage_class_specifiers -include /usr/local/include/clc/clc.h -fpack-struct=64 -o "$@".bc "$@"
llvm-link "$@".bc /usr/local/lib/clc/nvptx64--nvidiacl.bc -o "$@".linked.bc
llc -mcpu=sm_52 -march=nvptx64 "$@".linked.bc -o "$@".nvptx.s
This method worked fine, but because libclc is built against the OpenCL 1.1 specification, it cannot be used with OpenCL 1.2 or later code, such as code that uses printf.
In addition, libclc implements the OpenCL built-in functions as ordinary functions. In the assembly (PTX) of the resulting OpenCL binary, each built-in shows up as a function call instead of being emitted inline. I am concerned that this will hurt GPU performance.
So I am looking for an alternative to compiling with libclc. As a last resort, I'm considering using libclc with LLVM's NVPTX and AMDGPU backends, but if another way already exists, I would rather use it. (I expect that Clang has an OpenCL front end that I have not found yet.)
My program's scenario is:
- There is an OpenCL kernel source file (.cl).
- Compile the file to LLVM IR.
- Apply IR-level processing to the IR.
- Compile the IR to a binary using llc
  - for each GPU target (nvptx, amdgcn, ...)
- Run the host program (.c or .cpp, linked against the OpenCL library) with the binary via clCreateProgramWithBinary().
Currently, when I compile the kernel source file to LLVM IR, I have to include the libclc header (the -include option in the first command above) so that the built-in functions compile, and I have to link the libclc library before compiling the IR to a binary.
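To make the current libclc-based flow concrete, here is a minimal sketch that prints the three commands for a given kernel file. The function name print_pipeline and the variables CLC_INC and CLC_LIB are hypothetical names introduced for illustration. The paths, the flags and the sm_52 default are the ones from the commands above, and the target triple nvptx64-nvidia-nvcl is an assumption about what was intended there. The script only echoes the commands, so it can be inspected before anything is run.

```shell
#!/bin/sh
# Sketch of the libclc-based pipeline: .cl -> bitcode -> linked bitcode -> PTX.
# Assumed install locations of the libclc header and NVPTX bitcode library.
CLC_INC=/usr/local/include/clc/clc.h
CLC_LIB=/usr/local/lib/clc/nvptx64--nvidiacl.bc

# print_pipeline SRC [MCPU]
# Prints one command per line. Assumes SRC contains no whitespace.
print_pipeline() {
    src=$1
    mcpu=${2:-sm_52}   # sm_52 matches the GTX960 mentioned in the question

    # Step 1: compile the kernel to LLVM bitcode with the libclc header.
    echo "clang++ -emit-llvm -c -target nvptx64-nvidia-nvcl -Dcl_clang_storage_class_specifiers -include $CLC_INC -fpack-struct=64 -o $src.bc $src"

    # Step 2: link the libclc built-in library into the kernel bitcode.
    # (Any IR-level processing would operate on the linked bitcode.)
    echo "llvm-link $src.bc $CLC_LIB -o $src.linked.bc"

    # Step 3: lower the linked bitcode to PTX for the chosen GPU.
    echo "llc -mcpu=$mcpu -march=nvptx64 $src.linked.bc -o $src.nvptx.s"
}
```

For example, `print_pipeline kernel.cl` lists the commands, and piping that output to `sh` would execute them, provided clang, llvm-link and llc are on the PATH.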
My environment is as follows:
- GTX960
  - NVIDIA's binary appears in nvptx format
  - I'm using sm_52 nvptx for my GPU.
- Ubuntu Linux 16.04 LTS
- LLVM/Clang 5.0.0
  - If there is another way, I am willing to change the LLVM version.
Thanks in advance!