I have a class which has a template function. This function calls a template kernel. I’m doing my development in Nsight on a Linux box. In doing this, I encounter the following pair of conflicting requirements:
1 – When implementing a template function, the definition must appear in the *.h (or *.cu.h) file since the code is not generated until the template is needed.
2 – Kernel code must appear in the *.cu file, since the compiler does not recognize the <<< and >>> tokens when they appear in a header file.
I think there is probably a way to get around the second one with a little compiler voodoo.
When I set things up so that the template member function is in the *.cu.h file, I get the following compiler errors:
error: expected primary-expression before ‘<’ token
error: expected primary-expression before ‘>’ token
This suggests that it is parsing separate << and >> tokens rather than recognizing the <<< and >>> tokens.
A general outline of the structure of the pertinent parts of the code is below:
In MyClass.cu.h:
#include "MyKernels.cu.h" class MyClass{ template <typename T> void myFunction(T* param1, int param2); }; template <typename T> void myFunction(T* param1, int param2){ blocks = 16; blockSize = 512; myKernel<<<blocks, bockSize>>>(d_param1, param2); }
In MyKernels.cu.h:
#ifndef MYKERNELS_H_
#define MYKERNELS_H_

template <typename T>
extern __global__ void myKernel(T* param1, int param2);

#endif
In MyKernels.cu:
#include "MyKernels.cu.h" template<typename T> __global__ void myKernel(T* param1, int param2){ //Do stuff }
Edit 7/31/2015: To make the structure of what I am trying to accomplish a little clearer, I have written a small demonstrative project. It is posted publicly on GitHub at the following URL:
https://github.com/nvparrish/CudaTemplateProblem
Answer
The wrapper function declaration needs to be in the header file. The function definition does not.
Here is what I had in mind:
$ cat MyClass.cuh
template <typename T> void kernel_wrapper(T*, int);

class MyClass{
  public:
    template <typename T>
    void myFunction(T* param1, int param2);
};

template <typename T>
void MyClass::myFunction(T* param1, int param2){
    kernel_wrapper(param1, param2);
}
$ cat MyKernels.cu
#include "MyClass.cuh"
#define nTPB 256

template <typename T>
__global__ void myKernel(T* param1, int param2){
    int i = threadIdx.x+blockDim.x*blockIdx.x;
    if (i < param2){
        param1[i] += (T)param2;
    }
}

template <typename T>
void kernel_wrapper(T* param1, int param2){
    myKernel<<<(param2+nTPB-1)/nTPB,nTPB>>>(param1, param2);
    cudaDeviceSynchronize();
}

template void MyClass::myFunction(float *, int);
template void MyClass::myFunction(int *, int);
$ cat mymain.cpp
#include "MyClass.cuh"

int main(){
    MyClass A;
    float *fdata;
    int *idata, size;
    A.myFunction(fdata, size);
    A.myFunction(idata, size);
}
$ nvcc -c MyKernels.cu
$ g++ -o test mymain.cpp MyKernels.o -L/usr/local/cuda/lib64 -lcudart
$
Note the forced (explicit) template instantiations. These are necessary if you want a template instantiation to be generated in one compilation unit (the .cu file, where kernel definitions belong) so that it is usable from another compilation unit (a .cpp file, which does not understand CUDA syntax).
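As a concrete illustration, here is a minimal sketch of what supporting one more element type would look like under this layout; the double instantiation and the ddata pointer are hypothetical additions for illustration, not part of the original project:

// Added to MyKernels.cu, next to the existing float and int instantiations:
template void MyClass::myFunction(double *, int);

// Host-only code (e.g. mymain.cpp) can then call the new instantiation:
//   double *ddata;
//   int n;
//   MyClass A;
//   A.myFunction(ddata, n);
//
// Without the explicit instantiation above, mymain.cpp would still compile,
// but linking would fail with an undefined reference to
// kernel_wrapper<double>(double*, int), because that specialization is never
// generated inside MyKernels.cu.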