Tag: nvcc

__ldg causes slower execution time in certain situations

I posted this issue yesterday, but it wasn't well received. I have a solid repro now, so please bear with me. Here are the system specs: Tesla K20m with driver 331.67, CUDA 6.0, on a Linux machine. I have an application that is heavy on global memory reads, so I tried to optimize it by using the __ldg instruction on every…