PyTorch is a deep learning framework that puts Python first. It provides Tensors and Dynamic neural networks in Python with strong GPU acceleration.
Software Category: data
For detailed information, visit the PyTorch website.
The current installation of PyTorch incorporates the most popular packages. To find the available versions and learn how to load them, run:
module spider pytorch
The output of the command shows the available PyTorch module versions.
For detailed information about a particular PyTorch module, including how to load the module, run the module spider command with the module’s full version label. For example:
module spider pytorch/1.10.0
| Module | Version | Module Load Command |
|---|---|---|
| pytorch | 1.10.0 | module load singularity/3.7.1 pytorch/1.10.0 |
| pytorch | 1.12.0 | module load singularity/3.7.1 pytorch/1.12.0 |
| pytorch | 1.8.1 | module load singularity/3.7.1 pytorch/1.8.1 |
Versions 1.6 and older are not compatible with the A100 GPU. Deprecated containers are hosted in
/share/resources/containers/singularity/archive. You may continue to use them on other GPUs by excluding the A100 via the Slurm option
Version 1.8.1 is not compatible with the K80 GPU. You may use it on other GPUs by excluding all K80s via the Slurm option
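The two compatibility notes above can be expressed as a small check to run before submitting a job. This is a minimal sketch that encodes only the rules stated on this page; the function names are illustrative and not part of any Rivanna tooling, and a result of False means only that no rule listed here applies.

```python
def parse_version(version):
    """Turn a version string such as '1.8.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def known_incompatible(version, gpu):
    """Return True if this page lists the version/GPU pair as incompatible.

    Only the two rules stated in the text are encoded here (an assumption
    of this sketch): versions 1.6 and older on the A100, and 1.8.1 on the K80.
    """
    parsed = parse_version(version)
    gpu = gpu.upper()
    # Versions 1.6 and older are not compatible with the A100.
    if gpu == "A100" and parsed[:2] <= (1, 6):
        return True
    # Version 1.8.1 is not compatible with the K80.
    if gpu == "K80" and parsed == (1, 8, 1):
        return True
    return False


# Check every module version from the table against both GPU types.
for version in ("1.8.1", "1.10.0", "1.12.0"):
    for gpu in ("A100", "K80"):
        print(version, gpu, known_incompatible(version, gpu))
```

If the check returns True, exclude the affected GPUs with the Slurm option described above or pick a different module version.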
PyTorch Jupyter Notebooks
Jupyter Notebooks can be used for interactive code development and execution of Python scripts and several other codes. PyTorch Jupyter kernels are backed by containers in the corresponding modules.
Accessing the JupyterLab Portal
- Open a web browser and go to: https://rivanna-portal.hpc.virginia.edu.
- Use your “Netbadge” credentials to log in.
- On the top right of the menu bar of the Open OnDemand dashboard, click on
- In the drop-down box, click on
Requesting access to a GPU node
To start a JupyterLab session, fill out the resource request webform. To request access to a GPU, verify the correct selection for the following parameters:
- Under Rivanna Partition, choose “GPU”.
- Under Optional GPU Type, choose “NVIDIA K80”, “NVIDIA P100”, “NVIDIA V100”, “NVIDIA RTX2080” or leave it as “default”.
Click Launch to start the session.
Editing and Running the Notebook
Once the JupyterLab instance has started, you can edit and run your notebook as described here.
PyTorch Slurm jobs
The following is a Slurm script template. The commented numbers correspond to the items in the ensuing notes.
#!/bin/bash
#SBATCH -A mygroup
#SBATCH -p gpu # 1
#SBATCH --gres=gpu:1 # 1
#SBATCH -c 1
#SBATCH -t 00:01:00
#SBATCH -J pytorchtest
#SBATCH -o pytorchtest-%A.out
#SBATCH -e pytorchtest-%A.err
module load singularity pytorch/1.8.1 # 2
singularity run --nv $CONTAINERDIR/pytorch-1.8.1.sif pytorch_example.py # 3
1. The Slurm script needs to include the #SBATCH -p gpu and #SBATCH --gres=gpu directives in order to request access to a GPU node and its GPU device. Please visit the Jobs Using a GPU section for details.
2. To use the pytorch container, load the singularity and pytorch modules. You may choose a different version (see module spider above). Do not load the cudnn modules since these libraries are included with pytorch.
3. The --nv flag sets up the container’s environment to use a GPU when running a GPU-enabled application. The run command executes the default command defined in the container, which in this case is python. What follows after the *.sif is passed as arguments. In summary, the singularity command can be translated as: “Use the python interpreter inside the pytorch container to execute pytorch_example.py with GPU enabled.”
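The contents of pytorch_example.py are not given on this page. As a hypothetical stand-in, the sketch below reports whether PyTorch can see a GPU, which is a quick way to confirm that the --nv flag and the GPU request took effect. The function name gpu_report is an assumption made for illustration; the torch calls used (torch.cuda.is_available, torch.cuda.get_device_name) are part of the public PyTorch API.

```python
import importlib.util


def gpu_report():
    """Describe whether PyTorch is importable and whether it sees a GPU."""
    # Outside the container torch may not be installed at all.
    if importlib.util.find_spec("torch") is None:
        return "torch is not importable (is this running inside the container?)"

    import torch

    # Without --nv, or on a node without a GPU, CUDA is reported as unavailable.
    if not torch.cuda.is_available():
        return f"torch {torch.__version__}: no CUDA device visible"

    # Device 0 is the first GPU allocated to the job.
    name = torch.cuda.get_device_name(0)
    return f"torch {torch.__version__}: using {name}"


if __name__ == "__main__":
    print(gpu_report())
```

Saved as pytorch_example.py, this can be run with the singularity command from the template above.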
PyTorch Interactive Jobs (ijob)
Start an ijob. Note the addition of -p gpu and --gres=gpu to request access to a GPU node and its GPU device.
ijob -A mygroup -p gpu --gres=gpu -c 1
module load singularity pytorch/1.8.1
singularity run --nv $CONTAINERDIR/pytorch-1.8.1.sif pytorch_example.py
Interaction with the Host File System
The following user directories are overlaid onto each container by default on Rivanna:
Due to the overlay, these directories are by default the same inside and outside the container with the same read, write, and execute permissions. This means that file modifications in these directories (e.g., in /home) via processes running inside the container are persistent even after the container instance exits. The /project directories refer to leased storage locations that may not be available to all users.
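Because of this persistence, a script running inside the container can write its results directly to an overlaid directory and find them on the host afterwards. The following is a minimal sketch using only the Python standard library; the function name save_metrics and the file name metrics.json are illustrative choices, not conventions of the pytorch container.

```python
import json
from pathlib import Path


def save_metrics(metrics, out_dir):
    """Write a dict of metrics as JSON under out_dir and return the file path."""
    # Expand '~' so a path under the user's home directory can be given.
    out_dir = Path(out_dir).expanduser()
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / "metrics.json"
    path.write_text(json.dumps(metrics, indent=2))
    return path
```

For example, calling save_metrics({"loss": 0.12}, "~/pytorch_runs") from inside the container would leave the file in a (hypothetical) pytorch_runs directory under /home, where it remains after the container instance exits.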