Overview

R is a programming language often used for data analytics, statistical programming, and graphical visualization.

Loading the R module

On Rivanna, R is available through our module system.  To load R, simply type:

module load goolf R

Notice that we included goolf in the load command. There are two reasons why including goolf is important:

  1. R was built with the gcc compiler, an interface to OpenMPI, and other utilities. Due to its hierarchical layout, the module system must be told which build of R is needed.

  2. R has many computationally-intensive packages that are built with C, C++, or Fortran. By including goolf, we ensure that the same environment used for building R is loaded for any package installs.

The load command will load a default version of R, unless another version is specified. For example, you could type:

module load goolf R/4.0.0

To see the available versions of R, type:

module spider R

Module  Version  Module Load Command
R       4.0.0    module load gcc/8.3.0 cuda/10.2.89 R/4.0.0
R       .3.4.0   module load gcc/7.1.0 R/.3.4.0
R       3.2.1    module load gcc/7.1.0 R/3.2.1
R       3.4.4    module load gcc/7.1.0 openmpi/3.1.4 R/3.4.4
R       3.5.3    module load gcc/7.1.0 openmpi/3.1.4 R/3.5.3
R       3.6.3    module load gcc/7.1.0 openmpi/3.1.4 R/3.6.3
R       4.0.0    module load gcc/7.1.0 openmpi/3.1.4 R/4.0.0
R       3.6.3    module load intel/18.0 intelmpi/18.0 R/3.6.3
R       4.0.0    module load intel/18.0 intelmpi/18.0 R/4.0.0

Loading the RStudio module

RStudio is a development environment for R. It is also supported through its own module, but you must load a version of R first. For example, to load and run RStudio, you could type the following:

module load goolf R
module load rstudio
rstudio &

RStudio is also available through our web-based portal to Rivanna. For instructions on how to access it, see the page "RStudio Server access via OpenOnDemand".

Installing packages

Because of the number and variety of packages available for R, Research Computing does not maintain R packages beyond the most basic ones. If you need a package, you can install it in your own account, using a local library. For example, to install BiocManager, you can type:

module load goolf R
R
   .
   .
   .
> install.packages('BiocManager')

If the R interpreter prompts you about creating a local library, type yes. If it asks you to select a CRAN mirror, scroll down the list it provides and select one of the US sites.

Alternatively, you can launch RStudio and install packages as you would on your laptop.
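
Package installation can also be scripted so that it runs without the interactive prompts, for example from a setup script. The following is a minimal sketch rather than a Research Computing recommendation: the helper name install_r_package and the mirror URL https://cloud.r-project.org are assumptions, and it presumes that R defines R_LIBS_USER at startup, as default installations do.

```shell
#!/bin/bash
# Hypothetical helper: install one CRAN package into the personal library
# without interactive prompts. The default mirror URL is an assumption.
install_r_package() {
    local pkg="$1"
    local repo="${2:-https://cloud.r-project.org}"

    if [ -z "$pkg" ]; then
        echo "usage: install_r_package <package> [repo-url]" >&2
        return 2
    fi
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found; run 'module load goolf R' first" >&2
        return 1
    fi

    # Create the personal library first, because a non-interactive session
    # cannot answer the "create a local library?" prompt. The package name
    # and mirror are passed as trailing arguments and read via commandArgs().
    Rscript \
        -e 'args <- commandArgs(trailingOnly = TRUE)' \
        -e 'lib <- path.expand(Sys.getenv("R_LIBS_USER"))' \
        -e 'dir.create(lib, recursive = TRUE, showWarnings = FALSE)' \
        -e 'install.packages(args[1], lib = lib, repos = args[2])' \
        "$pkg" "$repo"
}

# Example call (after loading the R module):
# install_r_package BiocManager
```

Passing repos explicitly avoids the CRAN mirror menu, and creating the library directory beforehand replaces the "yes" answer described above.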

Submitting a Single-Core Job to the Cluster

After you have developed your R program, you can submit it to the compute nodes by using a SLURM job script similar to the following: 

#!/bin/bash
#SBATCH -n 1
#SBATCH -t 01:00:00
#SBATCH -o results.out
#SBATCH -p standard
#SBATCH -A mygroup

module load goolf R
Rscript myRprog.R

Save this script in a file named, for example, job.slurm. To run your job, submit the script by typing:

sbatch job.slurm
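
After submission, the job can be followed from the command line. The sketch below assumes the standard SLURM client tools sbatch and squeue are on your path; the helper name submit_and_show is hypothetical. It relies on the --parsable option, which makes sbatch print only the job ID (followed by a semicolon and the cluster name on some systems).

```shell
#!/bin/bash
# Hypothetical helper: submit a job script and show its queue entry.
submit_and_show() {
    local script="${1:-job.slurm}"

    if [ ! -f "$script" ]; then
        echo "job script not found: $script" >&2
        return 2
    fi
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found; run this on a cluster login node" >&2
        return 1
    fi

    # --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
    local out jobid
    out="$(sbatch --parsable "$script")" || return 1
    jobid="${out%%;*}"
    echo "Submitted job $jobid"

    # Show the job's current state (PD = pending, R = running).
    squeue -j "$jobid"
}

# Example call:
# submit_and_show job.slurm
```

When the job finishes, its output is written to results.out, as set by the -o option in the job script.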

Submitting Multi-Core Jobs to the Cluster

R programs can be written to use multiple cores on a node. You will need to ensure that both SLURM and your R code know how many cores they will be using. In the SLURM script, we recommend using --cpus-per-task to specify the number of cores. For example:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=10      #Requests 10 cores
#SBATCH -t 00:30:00
#SBATCH -o results.out
#SBATCH -p standard
#SBATCH -A mygroup

module load goolf R
Rscript myRprog.R ${SLURM_CPUS_PER_TASK}

For the R code, the number of cores can be passed in with a command-line argument, as shown in the above example with ${SLURM_CPUS_PER_TASK}. The code will need to be designed to read in the command-line argument and establish the number of available cores. For example:

cmdArgs <- commandArgs(trailingOnly=TRUE)
numCores <- as.integer(cmdArgs[1])
options(mc.cores=numCores)

Or, if you do not want to use command-line arguments, you can use the function Sys.getenv() in the R code. For example:

numCores <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK"))
options(mc.cores=numCores)

Do not use the detectCores() function, which is often shown in tutorial examples. It reports the total number of cores on the node, not the number of cores that SLURM has allocated to your job.
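
SLURM sets SLURM_CPUS_PER_TASK only when the job requests --cpus-per-task, so a job script that omits that option would pass an empty argument to R. One defensive variant, sketched here with a hypothetical helper name (resolve_cores) and an assumed fallback of one core, validates the value in the shell before handing it to Rscript.

```shell
#!/bin/bash
# Hypothetical helper: print the core count to hand to R.
# Falls back to 1 (an assumed default) when SLURM_CPUS_PER_TASK is
# unset, empty, or not a positive integer.
resolve_cores() {
    local n="${SLURM_CPUS_PER_TASK:-1}"
    case "$n" in
        ''|*[!0-9]*|0*) n=1 ;;   # reject empty, non-numeric, or zero-leading values
    esac
    echo "$n"
}

NUM_CORES="$(resolve_cores)"
echo "Using ${NUM_CORES} core(s)"

# In the job script, the last line would then become:
# Rscript myRprog.R "${NUM_CORES}"
```

With this in place, the R code shown earlier can keep reading the count from its first command-line argument.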

Submitting MPI Jobs to the Cluster

R programs can be distributed across multiple nodes with MPI (message passing interface) and the appropriate MPI packages.  To run a parallel R job that uses MPI, the SLURM script would be similar to the following:

#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=10
#SBATCH -t 00:30:00
#SBATCH -o results.out
#SBATCH -p parallel
#SBATCH -A mygroup

module load goolf R

srun Rscript myRprog.R

The items to notice in this script are:

i) the number of nodes;

ii) the number of tasks per node;

iii) the parallel partition; and

iv) the srun before the command to run the R code.
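
Together, the node count and the tasks per node determine how many copies of the R program srun starts: 2 × 10 = 20 for the script above. The following sketch of that arithmetic could be added to a job script as a sanity check. It assumes the SLURM variables SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE (the latter is set only when --ntasks-per-node is given); the helper name total_tasks is hypothetical, and its defaults simply mirror the example values.

```shell
#!/bin/bash
# Hypothetical helper: number of R processes the MPI job is expected to run.
# The defaults (2 nodes, 10 tasks per node) mirror the example script and
# apply only when the SLURM variables are not set.
total_tasks() {
    local nodes="${SLURM_JOB_NUM_NODES:-2}"
    local per_node="${SLURM_NTASKS_PER_NODE:-10}"
    echo $(( nodes * per_node ))
}

echo "Expected MPI tasks: $(total_tasks)"
```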

If you have questions about running your R code on Rivanna or would like a consultation to optimize or parallelize your code, contact hpc-support@virginia.edu.