The UVA research community has access to a wide range of bioinformatics software, either installed directly or available through the [bioconda](/userinfo/rivanna/software/bioconda) Python modules. See the [full list of bioinformatics software modules](/userinfo/rivanna/software/bioinformatics#full-list-of-bioinformatics-software-modules) for a comprehensive list of currently installed software.

Popular Bioinformatics Software

Below are some popular tools and useful links for their documentation and usage:

| Tool | Version | Description | Useful Links |
| --- | --- | --- | --- |
| BEDTools | 2.26.0 | BEDTools utilities allow one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF. | Homepage, Tutorial |
| BLAST+ | 2.7.1 | BLAST+ is a suite of command-line tools that offers applications for BLAST search, BLAST database creation/examination, and sequence filtering. | Web BLAST, Manual |
| BWA | 0.7.17 | BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. | Homepage, Manual |
| Bowtie2 | 2.2.9 | Bowtie2 is a memory-efficient tool for aligning short sequences to long reference genomes. | Homepage, Manual |
| FastQC | 0.11.5 | FastQC is a Java application that generates a comprehensive quality control report for raw sequencing data. | Homepage, Documentation |
| GATK | 4.0.0.0 | The Genome Analysis Toolkit provides tools for variant discovery. In addition to SNP and INDEL identification in germline DNA and RNAseq data, GATK tools cover somatic short variant calling as well as copy number and structural variation. | User Guide |
| Picard | 2.1.1 | Picard is a set of command-line tools for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF. | Homepage, Documentation |
| SAMTools | 1.7 | SAMTools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format. | Homepage, Manual |
| SPAdes | 3.10.1 | SPAdes provides pipelines for assembling genomes from Illumina and IonTorrent reads, as well as hybrid assemblies using PacBio, Oxford Nanopore and Sanger reads. It supports paired-end reads, mate-pairs and unpaired reads. | Homepage, Manual |
| STAR | 2.5.3a | Spliced Transcripts Alignment to a Reference (STAR) is an RNA-seq aligner based on an algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays, followed by a seed clustering and stitching procedure. | Homepage |
| vsearch | 2.7.1 | VSEARCH (short for Vectorized Search) is a toolkit for nucleotide sequence analyses. It supports clustering, chimera detection, database searching, merging of paired-end reads, and other sequence manipulation tasks. | Homepage |

Bioinformatics Modules

To get an up-to-date list of the installed bioinformatics applications, log on to Rivanna and run the following command in a terminal window:

module keyword bio

If you know which package you wish to use, you can look for it with

module spider <software>

For example,

module spider bcftools

This returns

----------------------------------------------------------------------------
  bcftools:
----------------------------------------------------------------------------
    Description:
      SAMtools is a suite of programs for interacting with high-throughput
      sequencing data. BCFtools - Reading/writing BCF2/VCF/gVCF files and
      calling/filtering/summarising SNP and short indel sequence variants

     Versions:
        bcftools/1.3.1
        bcftools/1.9

----------------------------------------------------------------------------
  For detailed information about a specific "bcftools" module (including how to
load the modules) use the module's full name.
  For example:

     $ module spider bcftools/1.9
----------------------------------------------------------------------------

Available versions may change, but the format should be the same.

To obtain more information about a specific module version, including a list of any prerequisite modules that must be loaded first, run the module spider command with the version specified; for example:

module spider bcftools/1.3.1

Using a Specific Software Module

To use a specific software package, run the module load command. The module load command by itself does not execute any programs; it only prepares the environment, i.e., it sets up the variables needed to run specific applications and to find libraries provided by the module.

After loading a module, you are ready to run the application(s) provided by the module. For example:

module load bcftools/1.3.1
bcftools --version

Output:

bcftools 1.3.1
Using htslib 1.3.1
Copyright (C) 2016 Genome Research Ltd.
License GPLv3+: GNU GPL version 3 or later
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

You will need to include the appropriate module load commands in your SLURM script.
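As a minimal sketch, the commands below write a single-core job script that loads the bcftools module from the example above and then runs it. The file name `bcftools_version.slurm` and the ten-minute time limit are assumptions for illustration. Site-specific directives such as the partition and allocation are left out and would need to be added for a real submission.

```shell
# Write a small single-core job script to a file (file name is hypothetical).
cat > bcftools_version.slurm <<'EOF'
#!/bin/bash
#SBATCH -N 1                # request single node
#SBATCH --ntasks=1          # single task for a single-core program
#SBATCH --time=00:10:00     # assumed time limit; adjust to your job

module purge                # start from a clean module environment
module load bcftools/1.3.1  # prepares the environment only
bcftools --version          # run the application provided by the module
EOF

# Check the script for shell syntax errors without running it.
bash -n bcftools_version.slurm && echo "syntax OK"
```

The resulting file could then be submitted with `sbatch bcftools_version.slurm`.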

General Considerations for SLURM Jobs

Most bioinformatics software packages are designed to run on a single compute node, with varying support for multi-threading and the use of multiple CPU cores. Many can run on only one core; in that case, please request only a single task.

Some software is multi-threaded. The number of threads is usually passed to the program through a command-line option. In this case the SLURM job script should contain the following two SBATCH directives:

#SBATCH -N 1                    # request single node
#SBATCH --cpus-per-task=<X>     # request multiple cpu cores

Replace <X> with the actual number of CPU cores to be requested. For many bioinformatics packages, requesting more than 8 CPU cores does not provide any significant performance gain. This limitation is due to code design rather than a Rivanna constraint.

Please be certain that the number of cores you request matches the number you communicate to the software. To be certain, you can often use the environment variable SLURM_CPUS_PER_TASK. For example,

biofoo -n ${SLURM_CPUS_PER_TASK}

You should only deviate from this general resource request format if you are absolutely certain that the software package supports execution on more than one compute node.
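A minimal sketch of keeping the two numbers in sync: read the thread count from `SLURM_CPUS_PER_TASK` once and reuse that variable wherever the program needs it. `biofoo` and its `-n` option are the placeholders from the example above, and the value 4 is arbitrary. The fallback to 1 outside of a SLURM job is an added assumption so that the snippet can be tried interactively.

```shell
#!/bin/bash
#SBATCH -N 1                    # request single node
#SBATCH --cpus-per-task=4       # request 4 cpu cores (arbitrary example value)

# Use the core count SLURM granted; fall back to 1 when the variable is unset.
THREADS="${SLURM_CPUS_PER_TASK:-1}"

# Build the command line once so the thread count has a single source of truth.
# "biofoo -n" is a placeholder; substitute your program and its thread option.
CMD="biofoo -n ${THREADS}"

# Printed rather than executed because biofoo is not a real program.
echo "Command: ${CMD}"
```

Real tools take the thread count through their own options, for example `bwa mem -t`, `bowtie2 -p`, `samtools sort -@`, or `STAR --runThreadN`. Check each tool's manual for the option that applies to the version you load.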

Reference Genomes on Rivanna

Research Computing provides a set of ready-to-use reference sequences and annotations for commonly analyzed organisms in a convenient, accessible location on Rivanna:

/project/genomes/

The majority of files have been downloaded from Illumina’s genomes repository (iGenomes), which contains assembly builds and corresponding annotations from Ensembl, NCBI and UCSC. Each genome directory contains index files of the whole genome for use with aligners like BWA and Bowtie2. In addition, STAR2 index files have been generated for each of the Homo sapiens (human) and Mus musculus (mouse) genomic builds.


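File types per build: whole genome (FASTA) and index files (BWA, Bowtie2, and STAR2 where provided).

| Organism | Source | Build |
| --- | --- | --- |
| Arabidopsis thaliana | Ensembl | TAIR9, TAIR10 |
| Arabidopsis thaliana | NCBI | build9.1, TAIR10 |
| Chlorocebus sabeus | NCBI | chlSab2 |
| Danio rerio | Ensembl | GRCz10 |
| Danio rerio | UCSC | danRer10 |
| Drosophila melanogaster | Ensembl | BDGP6 |
| Drosophila melanogaster | NCBI | build5.3, build5.41 |
| Drosophila melanogaster | UCSC | dm6 |
| Escherichia coli strain K12, DH10B | Ensembl | EB1 |
| Escherichia coli strain K12, DH10B | NCBI | 2008-03-17 |
| Escherichia coli strain K12, MG1655 | NCBI | 2001-10-15 |
| Homo sapiens | Ensembl | GRCh37 |
| Homo sapiens | NCBI | GRCh38 |
| Homo sapiens | UCSC | hg19, hg38 |
| Mus musculus | NCBI | GRCm38 |
| Mus musculus | UCSC | mm9, mm10 |
| Pan troglodytes | Ensembl | CHIMP2.1, CHIMP2.1.4 |
| Pan troglodytes | NCBI | build3.1 |
| Pan troglodytes | UCSC | panTro3, panTro4 |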
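
Once an organism, source, and build have been chosen, the matching files can be located under `/project/genomes/`. The sketch below assumes the directory layout of the upstream iGenomes archives (`<Organism>/<Source>/<Build>/Sequence/WholeGenomeFasta/genome.fa`, with underscores in organism names). That layout and the helper name `genome_fasta` are assumptions rather than something documented here, so confirm the real paths with `ls /project/genomes/` before relying on them.

```shell
# Root of the shared reference genomes on Rivanna.
GENOMES_ROOT="/project/genomes"

# Hypothetical helper: print the expected whole-genome FASTA path for a build,
# assuming the iGenomes layout is preserved under GENOMES_ROOT.
genome_fasta() {
    organism="$1"   # e.g. Homo_sapiens
    source_db="$2"  # e.g. UCSC
    build="$3"      # e.g. hg38
    echo "${GENOMES_ROOT}/${organism}/${source_db}/${build}/Sequence/WholeGenomeFasta/genome.fa"
}

REF="$(genome_fasta Homo_sapiens UCSC hg38)"
echo "Reference: ${REF}"

# Warn instead of failing if the assumed path does not exist on this system.
if [ ! -f "${REF}" ]; then
    echo "Not found; list ${GENOMES_ROOT} to see the actual layout" >&2
fi
```

Keeping the path in one variable such as `REF` makes it easy to pass the same reference to every step of a job script.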