GPUDirect RDMA

To read and write remote GPU memory, we need to allow direct access to it. In earlier days, the usual solution was to stage the data through host memory: keep a mapping between GPU memory and host memory, copy the GPU memory to the host buffer, then perform regular RDMA on that buffer.

Nowadays, with GPUs and NICs that support it, the NIC can access GPU memory directly, without the intermediate copy through host memory. This technique is called GPUDirect RDMA.

Figure: GPUDirect RDMA within the Linux Device Driver Model

Implementation details

Kernel requirements

To map GPU memory directly into an RDMA memory region, a specialized kernel driver is required. NVIDIA provides the nvidia-peermem kernel module to facilitate this.
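Before attempting to register GPU memory, it can be useful to confirm that the module is actually loaded. The following is a minimal sketch in Python, assuming a Linux host where loaded modules are listed in /proc/modules and where the module appears under the name nvidia_peermem (the kernel reports module names with underscores in place of hyphens). The helper names module_loaded and nvidia_peermem_loaded are illustrative, not part of any NVIDIA tooling.

```python
from pathlib import Path


def module_loaded(name: str, modules_text: str) -> bool:
    """Return True if `name` is listed in text formatted like /proc/modules.

    Each line of /proc/modules starts with the module name; the kernel
    reports hyphens in module names as underscores.
    """
    wanted = name.replace("-", "_")
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == wanted:
            return True
    return False


def nvidia_peermem_loaded(proc_modules: str = "/proc/modules") -> bool:
    """Check the running kernel for nvidia-peermem (assumes a Linux host)."""
    try:
        text = Path(proc_modules).read_text()
    except OSError:
        # The file is missing or unreadable, e.g. on a non-Linux system.
        return False
    return module_loaded("nvidia-peermem", text)
```

If the check returns False, loading the module (for example with modprobe nvidia-peermem, as root) is the usual next step.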

Memory details

Note that only GPU memory in the CUDA virtual address (VA) space can be used with GPUDirect RDMA. Furthermore, the GPU memory must be pinned for the duration of a data transfer and unpinned afterward. The most straightforward approach is to pin before every transfer and unpin right after it, but this is not recommended because both operations are costly. It is better to implement a cache that hands out already-pinned memory whenever it is required and unpins it lazily.
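One possible shape for such a cache is sketched below in Python as a small LRU structure keyed by (address, length). It is an illustration only: pin_fn and unpin_fn are hypothetical callbacks standing in for whatever pin/unpin (memory registration) calls the transport actually uses, and PinCache is not an existing library class. The sketch also leaves out things a real cache would likely need, such as reference counting for regions with transfers in flight and invalidation when the underlying GPU buffer is freed by someone else.

```python
from collections import OrderedDict
from typing import Any, Callable, Tuple

Key = Tuple[int, int]  # (device address, length in bytes)


class PinCache:
    """LRU cache of pinned regions; unpinning is deferred until eviction."""

    def __init__(
        self,
        pin_fn: Callable[[int, int], Any],   # hypothetical: pins and returns a handle
        unpin_fn: Callable[[Any], None],     # hypothetical: releases a handle
        capacity: int,
    ) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._pin_fn = pin_fn
        self._unpin_fn = unpin_fn
        self._capacity = capacity
        self._entries: "OrderedDict[Key, Any]" = OrderedDict()

    def acquire(self, addr: int, length: int) -> Any:
        """Return a handle for the region, pinning only on a cache miss."""
        key = (addr, length)
        if key in self._entries:
            # Hit: reuse the existing pin and mark it most recently used.
            self._entries.move_to_end(key)
            return self._entries[key]
        # Miss: lazily unpin least recently used regions to make room.
        while len(self._entries) >= self._capacity:
            _, old_handle = self._entries.popitem(last=False)
            self._unpin_fn(old_handle)
        handle = self._pin_fn(addr, length)
        self._entries[key] = handle
        return handle

    def invalidate(self, addr: int, length: int) -> None:
        """Unpin one region now, e.g. just before its buffer is freed."""
        key = (addr, length)
        if key in self._entries:
            self._unpin_fn(self._entries.pop(key))

    def clear(self) -> None:
        """Unpin everything that is still cached, oldest first."""
        while self._entries:
            _, handle = self._entries.popitem(last=False)
            self._unpin_fn(handle)
```

With this layout, repeated transfers from the same buffer pay the pinning cost once; the unpin cost is paid only when the region is evicted, invalidated, or the cache is cleared. The capacity bound matters because pinned memory is a limited resource.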