
CUDA C++: Using a template function which calls a template kernel

I have a class with a template member function that launches a template kernel. I'm doing my development in Nsight on a Linux box. In doing this, I encounter the following pair of conflicting requirements:

1 – When implementing a template function, the definition must appear in the *.h (or *.cu.h) file, since code is not generated until the template is instantiated.

2 – Kernel launch code must appear in a *.cu file, since the host compiler is not able to recognize the <<< and >>> tokens when they appear in a header file.

I think there is probably a way to get around the second one with a little compiler voodoo.

When I set up the system where the template member function is in the *.cu.h file, I get the following compiler errors:

error: expected primary-expression before ‘<‘ token

error: expected primary-expression before ‘>’ token

This suggests the compiler is tokenizing <<< as << followed by < (and >>> as >> followed by >) rather than recognizing the CUDA kernel launch syntax.

A general outline of the pertinent parts of the code is below:

In MyClass.cu.h:

#include "MyKernels.cu.h"

class MyClass{
    template <typename T> void myFunction(T* param1, int param2);
};

template <typename T> void MyClass::myFunction(T* param1, int param2){
    int blocks = 16;
    int blockSize = 512;
    myKernel<<<blocks, blockSize>>>(param1, param2);
}

In MyKernels.cu.h:

#ifndef MYKERNELS_H_
#define MYKERNELS_H_

template <typename T>
extern __global__ void myKernel(T* param1, int param2);
#endif

In MyKernels.cu:

#include "MyKernels.cu.h"

template<typename T>
__global__ void myKernel(T* param1, int param2){
    //Do stuff
}

Edit 7/31/2015: To make the structure of what I am trying to accomplish a little more clear, I have written a small demonstrative project. It is posted publicly on github at the following URL:

https://github.com/nvparrish/CudaTemplateProblem


Answer

The wrapper function declaration needs to be in the header file. The function definition does not.

Here is what I had in mind:

$ cat MyClass.cuh
template <typename T> void kernel_wrapper(T*, int);
class MyClass{
  public:
    template <typename T> void myFunction(T* param1, int param2);
};

template <typename T> void MyClass::myFunction(T* param1, int param2){
    kernel_wrapper(param1, param2);
}
$ cat MyKernels.cu
#include "MyClass.cuh"
#define nTPB 256

template <typename T>
__global__ void myKernel(T* param1, int param2){

  int i = threadIdx.x+blockDim.x*blockIdx.x;
  if (i < param2){
    param1[i] += (T)param2;
  }
}

template <typename T>
void kernel_wrapper(T* param1, int param2){
  myKernel<<<(param2+nTPB-1)/nTPB,nTPB>>>(param1, param2);
  cudaDeviceSynchronize();
}

template void MyClass::myFunction(float *, int);
template void MyClass::myFunction(int *, int);

$ cat mymain.cpp
#include "MyClass.cuh"

int main(){

  MyClass A;
  float *fdata;
  int *idata, size;
  A.myFunction(fdata, size);
  A.myFunction(idata, size);
}

$ nvcc -c MyKernels.cu
$ g++ -o test mymain.cpp MyKernels.o -L/usr/local/cuda/lib64 -lcudart
$

Note the forced template instantiation. This is necessary if you want template instantiation to occur in one compilation unit (the .cu file, where the kernel definition belongs) while the instantiated functions are called from another compilation unit (a .cpp file, which does not understand CUDA syntax). Without those two `template void MyClass::myFunction(...)` lines, the linker would find no code for the types used in mymain.cpp and report undefined references.
