A High-Performance CUDA Library for Distributed Dense Linear Algebra
Description
NVIDIA cuBLASMp is a high-performance, multi-process, GPU-accelerated library
for distributed basic dense linear algebra.
cuBLASMp is compatible with the 2D block-cyclic data layout and provides
PBLAS-like C APIs.