Compiling for a GPU
Running on a GPU can accelerate a program, but the code must be written for the GPU and built with a GPU-aware compiler. Several options are available for building GPU-enabled programs.
OpenACC
OpenACC is a directive-based programming standard for offloading code to accelerators such as GPUs.
Available NVIDIA CUDA Compilers
| Module | Version | Module Load Command |
| --- | --- | --- |
| cuda | 11.4.2 | module load cuda/11.4.2 |
| cuda | 11.8.0 | module load cuda/11.8.0 |
| cuda | 12.2.2 | module load cuda/12.2.2 |
| cuda | 12.4.1 | module load cuda/12.4.1 |
| cuda | 12.8.0 | module load cuda/12.8.0 |
| Module | Version | Module Load Command |
| --- | --- | --- |
| nvhpc | 24.5 | module load nvhpc/24.5 |
| nvhpc | 25.3 | module load nvhpc/25.3 |
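As a rough sketch of how one of the load commands above might be used in a session or job script, the snippet below loads a CUDA module and reports which `nvcc` is active. The function name `show_nvcc` is hypothetical, and `cuda/12.4.1` is just one of the listed versions. The guards are there so the snippet still runs on a machine without the module system.

```shell
# Load a CUDA module if the `module` command exists, then report nvcc.
# show_nvcc is a hypothetical helper name; cuda/12.4.1 is one listed version.
show_nvcc() {
    if command -v module >/dev/null 2>&1; then
        module load cuda/12.4.1 || echo "could not load cuda/12.4.1" >&2
    fi
    if command -v nvcc >/dev/null 2>&1; then
        nvcc --version
    else
        echo "nvcc not found: load a cuda module first"
    fi
}

show_nvcc
```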
GPU architecture
According to the CUDA documentation, “in the CUDA naming scheme, GPUs are named sm_xy, where x denotes the GPU generation number, and y the version in that generation.” The documentation contains details about the architecture and the corresponding xy value. The compute capability is x.y.
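The naming rule can be illustrated with a small shell helper that turns a compute capability x.y into the matching sm_xy name. This is only a sketch: `cc_to_sm` is a hypothetical name, not a CUDA tool, and it assumes the input is written in the form x.y.

```shell
# Convert a compute capability such as "8.6" to the sm_xy name "sm_86".
# cc_to_sm is a hypothetical helper name, not part of the CUDA toolkit.
cc_to_sm() {
    cc="$1"
    # Assumption: the input has the form x.y; anything else is rejected.
    case "$cc" in
        [0-9]*.[0-9]*) ;;
        *) echo "expected x.y, got: $cc" >&2; return 1 ;;
    esac
    major="${cc%%.*}"   # text before the first dot, e.g. "8"
    minor="${cc#*.}"    # text after the first dot, e.g. "6"
    printf 'sm_%s%s\n' "$major" "$minor"
}

cc_to_sm 7.0   # prints sm_70
cc_to_sm 8.6   # prints sm_86
```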
Please use the following values when compiling CUDA code on the HPC system.
| Type | GPU | Architecture | Compute Capability | CUDA Version |
| --- | --- | --- | --- | --- |
| Datacenter | V100 | Volta | 7.0 | 9+ |
| Datacenter | A100 | Ampere | 8.0 | 11+ |
| Datacenter | A40 | Ampere | 8.6 | 11+ |
| Datacenter | H200 | Hopper | 9.0 | 11.8+ |
| RTX | A6000 | Ampere | 8.6 | 11+ |
| GeForce | RTX2080Ti | Turing | 7.5 | 10+ |
| GeForce | RTX3090 | Ampere | 8.6 | 11+ |
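For scripting, the table above can be written as a small lookup. The following is a minimal sketch: `gpu_compute_cap` is a hypothetical helper name, and the values are copied directly from the table.

```shell
# Look up the compute capability for a GPU model listed in the table above.
# gpu_compute_cap is a hypothetical helper; values are copied from the table.
gpu_compute_cap() {
    case "$1" in
        V100)              echo 7.0 ;;
        RTX2080Ti)         echo 7.5 ;;
        A100)              echo 8.0 ;;
        A40|A6000|RTX3090) echo 8.6 ;;
        H200)              echo 9.0 ;;
        *) echo "unknown GPU model: $1" >&2; return 1 ;;
    esac
}

gpu_compute_cap A100      # prints 8.0
gpu_compute_cap RTX3090   # prints 8.6
```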
For example, to target only the V100 and A100, pass these flags to nvcc:
-gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80
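When more architectures are needed, the same flag pattern can be generated rather than typed by hand. The sketch below assumes each argument is the two-digit xy value from the GPU table (70 for the V100, 80 for the A100). `gencode_flags` is a hypothetical helper name, and `saxpy.cu` in the usage comment is a placeholder file name.

```shell
# Build one -gencode pair per architecture number (70 for sm_70, 80 for sm_80).
# gencode_flags is a hypothetical helper name.
gencode_flags() {
    flags=""
    for arch in "$@"; do
        # Append a space-separated pair; no leading space on the first one.
        flags="${flags:+$flags }-gencode arch=compute_${arch},code=sm_${arch}"
    done
    printf '%s\n' "$flags"
}

gencode_flags 70 80
# prints: -gencode arch=compute_70,code=sm_70 -gencode arch=compute_80,code=sm_80

# Hypothetical usage; saxpy.cu is a placeholder and nvcc must be on PATH:
# nvcc $(gencode_flags 70 80) -o saxpy saxpy.cu
```

Each pair embeds code for one GPU generation, so listing only the architectures you plan to run on keeps the binary smaller and the build shorter.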