Overview

Many commonly used bioinformatics software packages on Rivanna are available as individual modules or as Python packages bundled in the bioconda modules.

Please see our HowTo for more information about using this software on Rivanna.

Software Availability

If a particular package is not available, you have several options. If it is sufficiently widely used, Research Computing staff will install it as a new module. If we determine that it is too specialized, you can install it yourself. Please install software in permanent storage, such as your home directory. If you have difficulty, we can help you install the package.

Please see below for a full listing of available bioinformatics software. If you do not find it there, please check the bioconda package before requesting that we install it.
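The lookup order described above (individual modules first, then the bioconda package, then a request or a self-install) can be sketched in Python. This is only an illustration: the function name, the return labels, and the sample sets are hypothetical, and the sketch does not query Rivanna itself.

```python
from typing import Set


def where_to_find(package: str, module_names: Set[str], bioconda_packages: Set[str]) -> str:
    """Return where to look for a package, following the order in the text.

    All names here are hypothetical; the two sets must be supplied by the caller.
    """
    name = package.lower()
    if name in module_names:
        # Installed as an individual module.
        return "module"
    if name in bioconda_packages:
        # Bundled in the bioconda environment.
        return "bioconda"
    # Not found in either place: request a new module or install it yourself.
    return "request-or-self-install"


# Illustrative data only: two names from the module listing and a made-up
# placeholder standing in for a package found in bioconda.
modules = {"bwa", "gatk"}
conda_packages = {"example-conda-tool"}

print(where_to_find("BWA", modules, conda_packages))
print(where_to_find("example-conda-tool", modules, conda_packages))
print(where_to_find("some-new-tool", modules, conda_packages))
```

The comparison is case-insensitive on the assumption that module names are lowercase, as they are in the listing on this page.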

Reference Genomes

Research Computing makes some standard reference genomes available. For a listing and information about how to copy them, please see our HowTo.

Full List of Bioinformatics Software Modules

Below is a list of software installed as separate modules. Other packages that are based on Python are available in the bioconda environment. Please see the bioconda page for a listing of those packages.

Module Category Description
afni bio AFNI (Analysis of Functional NeuroImages) is a leading software suite of C, Python, and R programs and shell scripts primarily developed for the analysis and display of anatomical and functional MRI (FMRI) data. It is freely available (both in source code and in precompiled binaries) for research purposes. The software is made to run on virtually any Unix system with X11 and Motif displays. Binary packages are provided for macOS and Linux systems including Fedora and Ubuntu (including Ubuntu under the Windows Subsystem for Linux).
amber bio A suite of biomolecular simulation programs. It began in the late 1970s and is maintained by an active development community.
angsd bio Program for analysing NGS data.
ascmeme bio ASC+MEME is a fast motif discovery tool that is 10,000 times faster than MEME while preserving the same accuracy.
augustus bio AUGUSTUS is a program to find genes and their structures in one or more genomes.
bamtools bio BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.
bart bio BART (Binding Analysis for Regulation of Transcription) is a bioinformatics tool for predicting functional transcription factors (TFs) that bind at genomic cis-regulatory regions to regulate gene expression in the human or mouse genomes, given a query gene set or a ChIP-seq dataset as input.
bbmap bio BBMap includes a short read aligner, and other bioinformatic tools.
bcftools bio BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart, BCF.
bcl2fastq2 bio bcl2fastq Conversion Software both demultiplexes data and converts BCL files generated by Illumina sequencing systems to standard FASTQ file formats for downstream analysis.
bedops bio BEDOPS is an open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data of arbitrary scale.
bedtools bio The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM.
bicseq2-norm bio BICseq2 is an algorithm developed for the normalization of high-throughput sequencing (HTS) data and the detection of copy number variations (CNVs) in the genome. BICseq2 can be used for detecting CNVs with or without a control genome. BICseq2-norm is for normalizing potential biases in the sequencing data.
bicseq2-seg bio BICseq2 is an algorithm developed for the normalization of high-throughput sequencing (HTS) data and the detection of copy number variations (CNVs) in the genome. BICseq2 can be used for detecting CNVs with or without a control genome. BICseq2-seg is for detecting CNVs based on the normalized data given by BICseq2-norm.
bioconda bio Bioconda is a channel for the conda package manager specializing in bioinformatics software.
bioperl bio Bioperl is the product of a community effort to produce Perl code which is useful in biology. Examples include Sequence objects, Alignment objects and database searching objects.
biopython bio Biopython is a set of freely available tools for biological computation written in Python by an international team of developers. It is a distributed collaborative effort to develop Python libraries and applications which address the needs of current and future work in bioinformatics.
bismark bio A tool to map bisulfite converted sequence reads and determine cytosine methylation states
blast bio Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
bowtie2 bio Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
bsmap bio BSMAP is a short reads mapping program for bisulfite sequencing in DNA methylation study. Bisulfite treatment coupled with next generation sequencing could estimate the methylation ratio of every single Cytosine location in the genome by mapping high throughput bisulfite reads to the reference sequences.
bwa bio Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome.
caviar bio caviar is a statistical framework that quantifies the probability of each variant being causal while allowing for an arbitrary number of causal variants.
cd-hit bio CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.
cellprofiler bio CellProfiler is an image processing package to generate morphometric measurements.
cellranger bio A set of analysis pipelines that perform sample demultiplexing, barcode processing, and single cell 3' gene counting.
cellranger-dna bio Cell Ranger DNA is a set of analysis pipelines that process Chromium single cell DNA sequencing output to align reads, identify copy number variation (CNV), and compare heterogeneity among cells.
circos bio Circos is a software package for visualizing data and information. It visualizes data in a circular layout - this makes Circos ideal for exploring relationships between objects or positions.
clearcut bio Clearcut is the reference implementation for the Relaxed Neighbor Joining (RNJ) algorithm by J. Evans, L. Sheneman, and J. Foster from the Initiative for Bioinformatics and Evolutionary Studies (IBEST) at the University of Idaho.
cp-analyst bio CellProfiler Analyst (CPA) allows interactive exploration and analysis of data, particularly from high-throughput, image-based experiments. Included is a supervised machine learning system which can be trained to recognize complicated and subtle phenotypes, for automatic scoring of millions of cells. CellProfiler is an image processing package to generate morphometric measurements.
cutadapt bio Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
danpos bio Danpos is a toolkit for Dynamic Analysis of Nucleosome and Protein Occupancy by Sequencing, version 2
decontaminer bio decontaMiner is a tool for detecting contaminating organisms in human unmapped sequences.
deeptools bio deepTools contains useful modules to process the mapped reads data for multiple quality checks, creating normalized coverage files in standard bedGraph and bigWig file formats, that allow comparison between different files (for example, treatment and control). Finally, using such normalized and standardized files, deepTools can create many publication-ready visualizations to identify enrichments and for functional annotations of the genome.
diamond bio DIAMOND is a sequence aligner for protein and translated DNA searches and functions as a drop-in replacement for the NCBI BLAST software tools. It is suitable for protein-protein search as well as DNA-protein search on short reads and longer sequences including contigs and assemblies, providing a speedup of BLAST ranging up to x20,000.
eigensoft bio The EIGENSOFT package combines functionality from our population genetics methods (Patterson et al. 2006) and our EIGENSTRAT stratification correction method (Price et al. 2006). The EIGENSTRAT method uses principal components analysis to explicitly model ancestry differences between cases and controls along continuous axes of variation; the resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The EIGENSOFT package has a built-in plotting script and supports multiple file formats and quantitative phenotypes.
emboss bio EMBOSS is 'The European Molecular Biology Open Software Suite'. EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community.
epic bio epic is a software package for finding medium to diffusely enriched domains in chip-seq data. It is a fast, parallel and memory-efficient implementation of the popular SICER algorithm.
exonerate bio Exonerate is a generic tool for pairwise sequence comparison. It allows you to align sequences using many alignment models, using either exhaustive dynamic programming or a variety of heuristics.
fasta bio The FASTA programs find regions of local or global (new) similarity between protein or DNA sequences, either by searching Protein or DNA databases, or by identifying local duplications within a sequence.
fastenloc bio fastENLOC: fast enrichment estimation aided colocalization analysis enables integrative genetic association analysis of molecular QTL data and GWAS data.
fastqc bio FastQC is a Java application which takes a FastQ file and runs a series of tests on it to generate a comprehensive QC report.
fastqtl bio FastQTL is a QTL mapper
fastx-toolkit bio The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
freebayes bio FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
freesurfer bio FreeSurfer is a set of tools for analysis and visualization of structural and functional brain imaging data. FreeSurfer contains a fully automatic structural imaging stream for processing cross sectional and longitudinal data.
fsa bio FSA (Fast Statistical Alignment) is a probabilistic multiple sequence alignment algorithm which uses a distance-based approach to aligning homologous protein, RNA or DNA sequences.
fsl bio FSL is a comprehensive library of analysis tools for FMRI, MRI and DTI brain imaging data.
gatk bio The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
gd bio GD.pm - Interface to Gd Graphics Library
gemma bio Genome-wide Efficient Mixed Model Association
genometools bio The GenomeTools genome analysis system is a free collection of bioinformatics tools (in the realm of genome informatics) combined into a single binary named gt. It is based on a C library named “libgenometools” which consists of several modules.
gffcompare bio The program gffcompare can be used to compare, merge, annotate, and estimate accuracy of one or more GFF files (the 'query' files), when compared with a reference annotation (also provided as GFF).
gmap-gsnap bio GMAP: A Genomic Mapping and Alignment Program for mRNA and EST Sequences GSNAP: Genomic Short-read Nucleotide Alignment Program
hic-pro bio HiC-Pro is an optimized and flexible pipeline for Hi-C data processing.
hisat2 bio HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome).
htslib bio A C library for reading/writing high-throughput sequencing data. This package includes the utilities bgzip and tabix
idr bio The IDR (Irreproducible Discovery Rate) framework is a unified approach to measure the reproducibility of findings identified from replicate experiments and provide highly stable thresholds based on reproducibility. The IDR method compares a pair of ranked lists of identifications (such as ChIP-seq peaks).
impute2 bio IMPUTE version 2 (also known as IMPUTE2) is a genotype imputation and haplotype phasing program based on ideas from Howie et al. 2009
intervene bio Intervene is a tool for intersection and visualization of multiple genomic region sets.
io_lib bio Io_lib is a library of file reading and writing code to provide a general purpose trace file (and Experiment File) reading interface. The programmer simply calls the (eg) read_reading to create a "Read" C structure with the data loaded into memory. It has been compiled and tested on a variety of unix systems, MacOS X and MS Windows.
irfinder bio IRFinder is a tool for detecting intron retention from RNA-Seq experiments.
jcuda bio Java bindings for NVIDIA CUDA and related libraries.
juicer bio Juicer is a one-click pipeline for processing terabase scale Hi-C datasets.
kallisto bio Kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.
kraken bio Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies.
locuszoom bio LocusZoom Standalone is the command-line (standalone) version of LocusZoom, an application built in Python and R for creating regional plots from genome-wide association studies.
longranger bio Long Ranger is a set of analysis pipelines that processes Chromium sequencing output to align reads and call and phase SNPs, indels, and structural variants.
macs2 bio With the improvement of sequencing techniques, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) is becoming a popular way to study genome-wide protein-DNA interactions. To address the lack of powerful ChIP-Seq analysis methods, the authors present Model-based Analysis of ChIP-Seq (MACS) for identifying transcription factor binding sites. MACS captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation.
manta bio Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs. Manta discovers, assembles and scores large-scale SVs, medium-sized indels and large insertions within a single efficient workflow.
marge bio MARGE is a robust methodology that leverages a comprehensive library of genome-wide H3K27ac ChIP-seq profiles to predict key regulated genes and cis-regulatory regions in human or mouse.
meme bio The MEME Suite allows you to: * discover motifs using MEME, DREME (DNA only) or GLAM2 on groups of related DNA or protein sequences, * search sequence databases with motifs using MAST, FIMO, MCAST or GLAM2SCAN, * compare a motif to all motifs in a database of motifs, * associate motifs with Gene Ontology terms via their putative target genes, and * analyse motif enrichment using SpaMo or CentriMo.
mothur bio Mothur is a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community.
mrtrix3 bio MRtrix3 provides a set of tools to perform various types of diffusion MRI analyses, from various forms of tractography through to next-generation group-level analyses. It is designed with consistency, performance, and stability in mind, and is freely available under an open-source license. It is developed and maintained by a team of experts in the field, fostering an active community of users from diverse backgrounds.
mrtrix3tissue bio MRtrix3Tissue is a fork of the MRtrix3 project. It aims to add capabilities for 3-Tissue CSD modelling and analysis to a complete version of the MRtrix3 software.
mummer bio MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. AMOS makes use of it.
muscle bio MUSCLE is one of the best-performing multiple alignment programs according to published benchmark tests, with accuracy and speed that are consistently better than CLUSTALW. MUSCLE can align hundreds of sequences in seconds. Most users learn everything they need to know about MUSCLE in a few minutes—only a handful of command-line options are needed to perform common alignment tasks.
mutect bio MuTect is a method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes.
mutsigcv bio MutSig stands for "Mutation Significance". MutSig analyzes lists of mutations discovered in DNA sequencing, to identify genes that were mutated more often than expected by chance given background mutation processes.
ncbi-vdb bio The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
neuron bio Empirically-based simulations of neurons and networks of neurons.
ngs bio NGS is a new, domain-specific API for accessing reads, alignments and pileups produced from Next Generation Sequencing.
ngsf bio ngsF is a program to estimate per-individual inbreeding coefficients under a probabilistic framework that takes the uncertainty of genotype's assignation into account. It avoids calling genotypes by using genotype likelihoods or posterior probabilities.
ngsplot bio ngs.plot allows easy visualization of next-generation sequencing (NGS) samples at functional genomic regions.
nseg bio Nseg is used to identify low complexity sequences.
openms bio OpenMS is an open-source software C++ library for LC-MS data management and analyses. It offers an infrastructure for rapid development of mass spectrometry related software.
openslide bio OpenSlide is a C library that provides a simple interface to read whole-slide images.
openslide-python bio Python bindings for the OpenSlide library.
p4vasp bio p4vasp is a visualization suite for the Vienna Ab-initio Simulation Package (VASP).
paintor bio PAINTOR is a statistical fine-mapping method that integrates functional genomic data with association strength from potentially multiple populations (or traits) to prioritize variants for follow-up analysis.
patric bio PATRIC is an integration of different types of data and software tools that support research on bacterial pathogens.
peakseq bio PeakSeq is a program for identifying and ranking peak regions in ChIP-Seq experiments. It takes as input, mapped reads from a ChIP-Seq experiment, mapped reads from a control experiment and outputs a file with peak regions ranked with increasing Q-values.
peer bio PEER is a collection of Bayesian approaches to infer hidden determinants and their effects from gene expression profiles using factor analysis methods.
picard bio A set of tools (in Java) for working with next generation sequencing data in the BAM (http://samtools.github.io/hts-specs) format.
plink bio PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner. The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype or CNV calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.
prokka bio Prokka is a software tool for the rapid annotation of prokaryotic genomes.
proteowiz bio ProteoWizard provides a set of open-source, cross-platform software libraries and tools (e.g. msconvert, Skyline, IDPicker, SeeMS) that facilitate proteomics data analysis. The libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access, and performs standard chemistry and LCMS dataset computations.
qiime bio QIIME is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data.
qiime2 bio QIIME 2 is a next-generation microbiome bioinformatics platform that is extensible, free, open source, and community developed.
qtltools bio QTLtools is a tool set for molecular QTL discovery and analysis. It allows users to go from the raw sequence data to a collection of molecular Quantitative Trait Loci (QTLs) in a few easy-to-perform steps.
qualimap bio Qualimap 2 is a platform-independent application written in Java and R that provides both a Graphical User Interface (GUI) and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives like feature counts.
rasqual bio RASQUAL (Robust Allele Specific QUAntification and quality controL) maps QTLs for sequence-based cellular traits by combining population and allele-specific signals.
raxml bio RAxML search algorithm for maximum likelihood based inference of phylogenetic trees.
rdp-classifier bio The RDP Classifier is a naive Bayesian classifier that can rapidly and accurately provide taxonomic assignments from domain to genus, with confidence estimates for each assignment.
relion bio RELION (for REgularised LIkelihood OptimisatioN, pronounce rely-on) is a stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM).
rsem bio RNA-Seq by Expectation-Maximization
salmon bio Salmon is a wicked-fast program to produce highly accurate, transcript-level quantification estimates from RNA-seq data.
seqoutbias bio Molecular biology enzymes have nucleic acid preferences for their substrates; the preference of an enzyme is typically dictated by the sequence at or near the active site of the enzyme. This bias may result in spurious read count patterns when used to interpret high-resolution molecular genomics data. The seqOutBias program aims to correct this issue by scaling the aligned read counts by the ratio of genome-wide observed read counts to the expected sequence based counts for each k-mer.
shapeit bio SHAPEIT is a fast and accurate method for estimation of haplotypes (aka phasing) from genotype or sequencing data.
shapeit4 bio SHAPEIT4 is a fast and accurate method for estimation of haplotypes (aka phasing) for SNP array and high coverage sequencing data. Version 4 is a refactored and improved version of the SHAPEIT algorithm.
sicerpy bio SICER.py is a Python wrapper for the SICER peak caller software.
slim bio SLiM is an evolutionary simulation package that provides facilities for very easily and quickly constructing genetically explicit individual-based evolutionary models.
sortmerna bio SortMeRNA is a biological sequence analysis tool for filtering, mapping and OTU-picking NGS reads.
sratoolkit bio The SRA Toolkit, and the source-code SRA System Development Kit (SDK), will allow you to programmatically access data housed within SRA and convert it from the SRA format
stringtie bio StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts.
tabix bio Generic indexer for TAB-delimited genome position files
taggraph bio TagGraph is a computational tool that provides an unrestricted string-based search method that is as much as 350-fold faster than existing approaches, and a probabilistic validation model that was optimized for post-translational modification assignments.
tophat bio TopHat is a fast splice junction mapper for RNA-Seq reads.
torus bio TORUS - QTL Discovery utilizing Genomic Annotations is a free software package that implements a computational procedure for discovering molecular QTLs incorporating genomic annotations.
trimgalore bio Trim Galore is a wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data.
trimmomatic bio Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data.
ucsc-tools bio A set of genome utilities developed at the University of California Santa Cruz.
varscan bio VarScan - Variant calling and somatic mutation/CNV detection for next-generation sequencing data
vcftools bio The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.
velvet bio Sequence assembler for very short reads
vep bio VEP determines the effect of your variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions.
vg bio Variation graphs provide a succinct encoding of the sequences of many genomes.
viennarna bio The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.
vsearch bio VSEARCH supports de novo and reference based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads.
wasp bio WASP is a suite of tools for unbiased allele-specific read mapping and discovery of molecular QTLs.
wigtobigwig bio The bigWig format is useful for dense, continuous data that will be displayed in the Genome Browser as a graph. BigWig files are created from wiggle (wig) type files using the program wigToBigWig.
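
Each row of the listing above follows the layout of its header line: a module name, a category, and a free-text description, separated by whitespace. As a minimal sketch, assuming the listing has been saved as plain text in exactly that layout, the rows could be read into records in Python as follows. The names ModuleEntry, parse_listing_row, and parse_listing are hypothetical and are not part of any Rivanna tooling.

```python
from typing import Dict, NamedTuple, Optional


class ModuleEntry(NamedTuple):
    name: str
    category: str
    description: str


HEADER = ["Module", "Category", "Description"]


def parse_listing_row(line: str) -> Optional[ModuleEntry]:
    """Split one row into name, category, and description.

    Only the first two whitespace-separated fields are split off, so the
    description keeps its internal spaces. The header row and rows with
    fewer than three fields return None.
    """
    parts = line.strip().split(maxsplit=2)
    if len(parts) < 3 or parts == HEADER:
        return None
    return ModuleEntry(*parts)


def parse_listing(text: str) -> Dict[str, ModuleEntry]:
    """Parse a whole listing into a dictionary keyed by module name."""
    entries: Dict[str, ModuleEntry] = {}
    for line in text.splitlines():
        entry = parse_listing_row(line)
        if entry is not None:
            entries[entry.name] = entry
    return entries


# Two rows copied from the listing, preceded by its header line.
sample = """Module Category Description
tabix bio Generic indexer for TAB-delimited genome position files
velvet bio Sequence assembler for very short reads
"""

listing = parse_listing(sample)
print(sorted(listing))
print(listing["velvet"].description)
```

Splitting with maxsplit=2 relies on module names and categories containing no spaces, which holds for every row in this listing.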