CUDA Study Notes

This post collects a few scattered notes on CUDA C.

The fast way to query device properties

Some textbooks and articles still use cudaGetDeviceProperties() to query device properties. For more advanced developers, NVIDIA provides another function:

cudaDeviceGetAttribute();

How it works

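cudaGetDeviceProperties() fills in every property of the device, even though in many cases we only need one or two of them. cudaDeviceGetAttribute() returns only the single attribute the caller asks for. As a result, the cost per call can differ by several orders of magnitude: roughly nanoseconds versus milliseconds, according to the NVIDIA blog post listed in the references.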
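The difference described above can be measured with a small host-side timing loop. The following is a minimal sketch, not a rigorous benchmark: it assumes device 0 exists, the iteration count is arbitrary, and the numbers will vary with the GPU, the driver and the CUDA version (some versions may cache results, which narrows the gap).

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int device = 0;        // assumption: device 0 is present
    const int iterations = 100;  // arbitrary repeat count used for averaging

    // Warm up: the first runtime call pays for context creation.
    cudaFree(0);

    // Time the "fetch everything" call.
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, device);
    }
    auto t1 = std::chrono::steady_clock::now();

    // Time the single-attribute call.
    int smCount = 0;
    for (int i = 0; i < iterations; ++i) {
        cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, device);
    }
    auto t2 = std::chrono::steady_clock::now();

    using MicroSec = std::chrono::duration<double, std::micro>;
    double propsUs = MicroSec(t1 - t0).count() / iterations;
    double attrUs  = MicroSec(t2 - t1).count() / iterations;

    std::printf("cudaGetDeviceProperties: %.3f us per call\n", propsUs);
    std::printf("cudaDeviceGetAttribute:  %.3f us per call\n", attrUs);
    std::printf("multiprocessors reported: %d\n", smCount);
    return 0;
}
```

Compile with nvcc and run it on the target machine; only the ratio between the two numbers is meaningful.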

Usage

__host__ __device__ cudaError_t cudaDeviceGetAttribute(int* value, cudaDeviceAttr attr, int device)

Parameters:

value

- Returned device attribute value

attr

- Device attribute to query

device

- Device number to query
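Because the function returns a cudaError_t, a bad device argument can be detected rather than silently producing garbage. The sketch below deliberately queries a device index that is one past the last valid one, then repeats the query on device 0. The choice of cudaDevAttrWarpSize is only for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaError_t err = cudaGetDeviceCount(&deviceCount);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    int value = 0;

    // Valid indices are 0 .. deviceCount-1, so deviceCount itself is out of range.
    err = cudaDeviceGetAttribute(&value, cudaDevAttrWarpSize, deviceCount);
    if (err != cudaSuccess) {
        std::printf("query on device %d rejected: %s\n", deviceCount, cudaGetErrorString(err));
    }

    // The same query on a valid device, if there is one.
    if (deviceCount > 0) {
        err = cudaDeviceGetAttribute(&value, cudaDevAttrWarpSize, 0);
        if (err == cudaSuccess) {
            std::printf("warp size on device 0: %d\n", value);
        } else {
            std::fprintf(stderr, "query on device 0 failed: %s\n", cudaGetErrorString(err));
        }
    }
    return 0;
}
```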

cudaDeviceAttr

CUDA device attributes. This is the type of the second parameter.

For example, to find out the maximum number of threads a block can have:

int deviceId;
cudaGetDevice(&deviceId);  // query the currently active device
int threadsPerBlock;
cudaDeviceGetAttribute(&threadsPerBlock, cudaDevAttrMaxThreadsPerBlock, deviceId);
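One common use of this value is to clamp a launch configuration so that the block size never exceeds what the device allows. The sketch below is one possible way to do that. The kernel `fill`, the array size and the deliberately oversized `requested` block size are hypothetical and exist only for illustration. Note that a particular kernel may support fewer threads per block than the device limit if it uses many registers; that is not an issue for a kernel this small.

```cuda
#include <algorithm>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: writes the same value into every element.
__global__ void fill(float* data, int n, float v) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = v;
}

int main() {
    int deviceId = 0;
    if (cudaGetDevice(&deviceId) != cudaSuccess) return 1;

    int maxThreadsPerBlock = 0;
    if (cudaDeviceGetAttribute(&maxThreadsPerBlock,
                               cudaDevAttrMaxThreadsPerBlock,
                               deviceId) != cudaSuccess) return 1;

    const int n = 1 << 20;       // illustrative problem size
    const int requested = 2048;  // intentionally larger than typical device limits

    // Clamp the block size to the device limit, then cover n with enough blocks.
    int block = std::min(requested, maxThreadsPerBlock);
    int grid = (n + block - 1) / block;

    float* d = nullptr;
    if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return 1;

    fill<<<grid, block>>>(d, n, 1.0f);
    cudaError_t err = cudaDeviceSynchronize();  // surfaces launch and execution errors

    std::printf("launched %d blocks of %d threads: %s\n",
                grid, block, cudaGetErrorString(err));

    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```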

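cudaDeviceAttr has 115 possible values at the time of writing (the exact count depends on the CUDA version). The first twenty are listed below.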

cudaDevAttrMaxThreadsPerBlock = 1

Maximum number of threads per block

cudaDevAttrMaxBlockDimX = 2

Maximum block dimension X

cudaDevAttrMaxBlockDimY = 3

Maximum block dimension Y

cudaDevAttrMaxBlockDimZ = 4

Maximum block dimension Z

cudaDevAttrMaxGridDimX = 5

Maximum grid dimension X

cudaDevAttrMaxGridDimY = 6

Maximum grid dimension Y

cudaDevAttrMaxGridDimZ = 7

Maximum grid dimension Z

cudaDevAttrMaxSharedMemoryPerBlock = 8

Maximum shared memory available per block in bytes

cudaDevAttrTotalConstantMemory = 9

Memory available on device for __constant__ variables in a CUDA C kernel in bytes

cudaDevAttrWarpSize = 10

Warp size in threads

cudaDevAttrMaxPitch = 11

Maximum pitch in bytes allowed by memory copies

cudaDevAttrMaxRegistersPerBlock = 12

Maximum number of 32-bit registers available per block

cudaDevAttrClockRate = 13

Peak clock frequency in kilohertz

cudaDevAttrTextureAlignment = 14

Alignment requirement for textures

cudaDevAttrGpuOverlap = 15

Device can possibly copy memory and execute a kernel concurrently

cudaDevAttrMultiProcessorCount = 16

Number of multiprocessors on device

cudaDevAttrKernelExecTimeout = 17

Specifies whether there is a run time limit on kernels

cudaDevAttrIntegrated = 18

Device is integrated with host memory

cudaDevAttrCanMapHostMemory = 19

Device can map host memory into CUDA address space

cudaDevAttrComputeMode = 20

Compute mode 
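To see a few of the attributes listed above on a real device, they can be queried in a loop. This is a minimal sketch: the `NamedAttr` struct and the choice of which attributes to print are illustrative, and the enumerator names are the ones from the list above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative helper type: pairs a printable label with an attribute enumerator.
struct NamedAttr {
    const char* name;
    cudaDeviceAttr attr;
};

int main() {
    const NamedAttr attrs[] = {
        {"MaxThreadsPerBlock",              cudaDevAttrMaxThreadsPerBlock},
        {"MaxBlockDimX",                    cudaDevAttrMaxBlockDimX},
        {"MaxGridDimX",                     cudaDevAttrMaxGridDimX},
        {"MaxSharedMemoryPerBlock (bytes)", cudaDevAttrMaxSharedMemoryPerBlock},
        {"WarpSize",                        cudaDevAttrWarpSize},
        {"MultiProcessorCount",             cudaDevAttrMultiProcessorCount},
    };

    int device = 0;
    if (cudaGetDevice(&device) != cudaSuccess) return 1;

    for (const NamedAttr& a : attrs) {
        int value = 0;
        cudaError_t err = cudaDeviceGetAttribute(&value, a.attr, device);
        if (err == cudaSuccess) {
            std::printf("%-34s %d\n", a.name, value);
        } else {
            std::printf("%-34s error: %s\n", a.name, cudaGetErrorString(err));
        }
    }
    return 0;
}
```

Each query is an independent call, so fetching a handful of attributes this way still avoids populating the full cudaDeviceProp structure.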

References

https://developer.nvidia.com/blog/cuda-pro-tip-the-fast-way-to-query-device-properties/

https://docs.nvidia.com/cuda/cuda-runtime-api/
