Overview

Many commonly used bioinformatics software packages are available on the HPC clusters, either as individual modules or as Python packages bundled in the bioconda module.

Please see our HowTo for more information about using this software on the HPC system.

Software Availability

If a package you need is not available, you have several options. If it is sufficiently widely used, Research Computing staff will install it as a new module. If we determine that it is too specialized, you can install it yourself; please use permanent storage such as your home directory for the installation. If you have difficulty, we can help you install the package.

Please see below for a full listing of available bioinformatics software. If you do not find a package there, please check the bioconda module before requesting that we install it.

Reference Genomes

Research Computing makes some standard reference genomes available. For a listing and information about how to copy them, please see our HowTo.

Full List of Bioinformatics Software Modules

Below is a list of software installed as separate modules. Other packages that are based on Python are available in the bioconda environment. Please see the bioconda page for a listing of those packages.

Module	Category	Description
afni	bio	AFNI (Analysis of Functional NeuroImages) is a leading software suite of C, Python, and R programs and shell scripts primarily developed for the analysis and display of anatomical and functional MRI (fMRI) data. It is freely available (both as source code and as precompiled binaries) for research purposes. The software is made to run on virtually any Unix system with X11 and Motif displays. Binary packages are provided for macOS and Linux systems, including Fedora and Ubuntu (including Ubuntu under the Windows Subsystem for Linux).
alphafold	bio	Open source code for AlphaFold
alphapulldown	bio	AlphaPulldown is a Python package that streamlines protein-protein interaction screens and high-throughput modelling of higher-order oligomers using AlphaFold-Multimer
angsd	bio	Program for analysing NGS data.
anvio	bio	Anvi'o is an open-source, community-driven analysis and visualization platform for microbial 'omics. It brings together many aspects of today's cutting-edge strategies including genomics, metagenomics, metatranscriptomics, pangenomics, metapangenomics, phylogenomics, and microbial population genetics in an integrated and easy-to-use fashion through extensive interactive visualization capabilities.
augustus	bio	AUGUSTUS is a program to find genes and their structures in one or more genomes.
bamtools	bio	BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files.
bart	bio	BART (Binding Analysis for Regulation of Transcription) is a bioinformatics tool for predicting functional transcription factors (TFs) that bind at genomic cis-regulatory regions to regulate gene expression in the human or mouse genomes, given a query gene set or a ChIP-seq dataset as input.
bart-mri	bio	The Berkeley Advanced Reconstruction Toolbox (BART) is a free and open-source image-reconstruction framework for computational magnetic resonance imaging developed by the research groups of Martin Uecker (Goettingen University), Jon Tamir (UT Austin), and Michael Lustig (UC Berkeley). It consists of a programming library and a toolbox of command-line programs. The library provides common operations on multi-dimensional arrays, Fourier and wavelet transforms, as well as generic implementations of iterative optimization algorithms. The command-line tools provide direct access to basic operations on multi-dimensional arrays as well as efficient implementations of many calibration and reconstruction algorithms for parallel imaging and compressed sensing.
bbmap	bio	BBMap includes a short read aligner, and other bioinformatic tools.
bcftools	bio	BCFtools is part of SAMtools, a suite of programs for interacting with high-throughput sequencing data. BCFtools reads and writes BCF2/VCF/gVCF files and calls, filters, and summarises SNP and short indel sequence variants.
bcl2fastq2	bio	bcl2fastq Conversion Software both demultiplexes data and converts BCL files generated by Illumina sequencing systems to standard FASTQ file formats for downstream analysis.
beagle	bio	Beagle is a software package for phasing genotypes and for imputing ungenotyped markers.
bedops	bio	BEDOPS is an open-source command-line toolkit that performs highly efficient and scalable Boolean and other set operations, statistical calculations, archiving, conversion and other management of genomic data of arbitrary scale. Tasks can be easily split by chromosome for distributing whole-genome analyses across a computational cluster.
bedtools	bio	The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. The utilities are largely based on four widely-used file formats: BED, GFF/GTF, VCF, and SAM/BAM.
bicseq2-norm	bio	BICseq2 is an algorithm developed for normalizing high-throughput sequencing (HTS) data and detecting copy number variations (CNVs) in the genome. BICseq2 can detect CNVs with or without a control genome. BICseq2-norm normalizes potential biases in the sequencing data.
bicseq2-seg	bio	BICseq2 is an algorithm developed for normalizing high-throughput sequencing (HTS) data and detecting copy number variations (CNVs) in the genome. BICseq2 can detect CNVs with or without a control genome. BICseq2-seg detects CNVs based on the normalized data produced by BICseq2-norm.
bioawk	bio	Bioawk is an extension to Brian Kernighan's awk that adds support for several common biological data formats, including optionally gzipped BED, GFF, SAM, VCF, FASTA/Q and TAB-delimited formats with column names. It also adds a few built-in functions and a command-line option to use TAB as the input/output delimiter. When the new functionality is not used, bioawk is intended to behave exactly the same as the original BWK awk.
bioconda	bio	Bioconda is a channel for the conda package manager specializing in bioinformatics software.
bioperl	bio	Bioperl is the product of a community effort to produce Perl code which is useful in biology. Examples include Sequence objects, Alignment objects and database searching objects.
biopython	bio	Biopython is a set of freely available tools for biological computation written in Python by an international team of developers. It is a distributed collaborative effort to develop Python libraries and applications which address the needs of current and future work in bioinformatics.
bismark	bio	A tool to map bisulfite converted sequence reads and determine cytosine methylation states
blasr	bio	BLASR (Basic Local Alignment with Successive Refinement) aligns long single-molecule sequencing reads, such as those from PacBio, to a reference genome.
blast	bio	Basic Local Alignment Search Tool, or BLAST, is an algorithm for comparing primary biological sequence information, such as the amino-acid sequences of different proteins or the nucleotides of DNA sequences.
blat	bio	BLAT on DNA is designed to quickly find sequences of 95% and greater similarity of length 25 bases or more.
bowtie2	bio	Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
bracken	bio	Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
bsmap	bio	BSMAP is a short-read mapping program for bisulfite sequencing in DNA methylation studies. Bisulfite treatment coupled with next-generation sequencing can estimate the methylation ratio of every single cytosine location in the genome by mapping high-throughput bisulfite reads to the reference sequences.
busco	bio	BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs
bwa	bio	Burrows-Wheeler Aligner (BWA) is an efficient program that aligns relatively short nucleotide sequences against a long reference sequence such as the human genome.
canu	bio	Canu is a fork of the Celera Assembler designed for high-noise single-molecule sequencing
caviar	bio	CAVIAR is a statistical framework that quantifies the probability of each variant being causal while allowing an arbitrary number of causal variants.
cd-hit	bio	CD-HIT is a very widely used program for clustering and comparing protein or nucleotide sequences.
cellassign	bio	cellassign automatically assigns single-cell RNA-seq data to known cell types across thousands of cells, accounting for patient- and batch-specific effects. Information about known marker genes for each cell type is provided a priori as input to the model in the form of a (binary) marker-gene-by-cell-type matrix. cellassign then probabilistically assigns each cell to a cell type, removing subjective biases from typical unsupervised clustering workflows.
cellpose	bio	A generalist algorithm for cellular segmentation.
cellprofiler	bio	CellProfiler is free, open-source software designed to enable biologists without training in computer vision or programming to quantitatively measure phenotypes from thousands of images automatically.
cellranger	bio	A set of analysis pipelines that perform sample demultiplexing, barcode processing, and single-cell 3' gene counting.
cellranger-arc	bio	Cell Ranger ARC is a set of analysis pipelines that process Chromium Single Cell Multiome ATAC + Gene Expression sequencing data to generate a variety of analyses pertaining to gene expression, chromatin accessibility and their linkage.
cellranger-atac	bio	Cell Ranger ATAC is a set of analysis pipelines that process Chromium Single Cell ATAC data.
cellranger-dna	bio	Cell Ranger DNA is a set of analysis pipelines that process Chromium single cell DNA sequencing output to align reads, identify copy number variation (CNV), and compare heterogeneity among cells.
chopper	bio	Rust implementation of NanoFilt+NanoLyse, both originally written in Python. This tool, intended for long read sequencing such as PacBio or ONT, filters and trims a fastq file. Filtering is done on average read quality and minimal or maximal read length, and applying a headcrop (start of read) and tailcrop (end of read) while printing the reads passing the filter.
circos	bio	Circos is a software package for visualizing data and information. It visualizes data in a circular layout - this makes Circos ideal for exploring relationships between objects or positions.
clara-parabricks	bio	NVIDIA Parabricks is a GPU-accelerated computational genomics toolkit that delivers fast and accurate analysis for sequencing centers, clinical teams, genomics researchers, and next-generation sequencing instrument developers.
clearcut	bio	Clearcut is the reference implementation for the Relaxed Neighbor Joining (RNJ) algorithm by J. Evans, L. Sheneman, and J. Foster from the Initiative for Bioinformatics and Evolutionary Studies (IBEST) at the University of Idaho.
cnnpeaks	bio	CNN-peaks is ChIP-Seq peak-calling software based on a convolutional neural network (CNN).
cp-analyst	bio	CellProfiler Analyst (CPA) allows interactive exploration and analysis of data, particularly from high-throughput, image-based experiments. Included is a supervised machine learning system which can be trained to recognize complicated and subtle phenotypes, for automatic scoring of millions of cells. CellProfiler is an image processing package to generate morphometric measurements.
ctffind	bio	Program for finding CTFs of electron micrographs.
cufflinks	bio	Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols.
cumulus_feature_barcoding	bio	A fast C++ tool to extract a feature-count matrix from sequence reads in FASTQ files. It uses ISA-L (isa-l) for decompression and Heng Li's kseq library for read parsing. It is used by Cumulus to generate feature-count matrices for cell hashing, nucleus hashing, CITE-Seq and Perturb-seq protocols, using either 10x Genomics V2 or V3 chemistry.
cutadapt	bio	Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
cytoscape	bio	Cytoscape is an open-source software platform for visualizing complex networks and integrating them with any type of attribute data. Many apps are available for various problem domains, including bioinformatics, social network analysis, and the semantic web.
danpos	bio	A toolkit for Dynamic Analysis of Nucleosome and Protein Occupancy by Sequencing, version 2.
dbg2olc	bio	A genome assembler that reduces the computational time of human genome assembly from 400,000 CPU hours to 2,000 CPU hours, utilizing long erroneous 3GS sequencing reads and short accurate NGS sequencing reads.
decontaminer	bio	decontaMiner is a tool for detecting contaminating organisms in unmapped human sequences.
deeplabcut	bio	DeepLabCut is a toolbox for markerless pose estimation of animals performing various tasks.
deeptools	bio	deepTools addresses the challenge of handling the large amounts of data that are now routinely generated from DNA sequencing centers. deepTools contains useful modules to process the mapped reads data for multiple quality checks, creating normalized coverage files in standard bedGraph and bigWig file formats, that allow comparison between different files (for example, treatment and control). Finally, using such normalized and standardized files, deepTools can create many publication-ready visualizations to identify enrichments and for functional annotations of the genome.
diamond	bio	DIAMOND is a sequence aligner for protein and translated DNA searches and functions as a drop-in replacement for the NCBI BLAST software tools. It is suitable for protein-protein search as well as DNA-protein search on short reads and longer sequences including contigs and assemblies, providing a speedup over BLAST of up to 20,000x.
eigensoft	bio	The EIGENSOFT package combines functionality from our population genetics methods (Patterson et al. 2006) and our EIGENSTRAT stratification correction method (Price et al. 2006). The EIGENSTRAT method uses principal components analysis to explicitly model ancestry differences between cases and controls along continuous axes of variation; the resulting correction is specific to a candidate marker’s variation in frequency across ancestral populations, minimizing spurious associations while maximizing power to detect true associations. The EIGENSOFT package has a built-in plotting script and supports multiple file formats and quantitative phenotypes.
emboss	bio	EMBOSS is 'The European Molecular Biology Open Software Suite'. EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community.
evm	bio	The EVidenceModeler (aka EVM) software combines ab initio gene predictions and protein and transcript alignments into weighted consensus gene structures. EVM provides a flexible and intuitive framework for combining diverse evidence types into a single automated gene structure annotation system.
exonerate	bio	Exonerate is a generic tool for pairwise sequence comparison. It allows you to align sequences using many alignment models, using either exhaustive dynamic programming or a variety of heuristics.
fasta	bio	The FASTA programs find regions of local or global (new) similarity between protein or DNA sequences, either by searching Protein or DNA databases, or by identifying local duplications within a sequence.
fastenloc	bio	fastENLOC: fast enrichment estimation aided colocalization analysis enables integrative genetic association analysis of molecular QTL data and GWAS data.
fastqc	bio	FastQC is a quality control application for high throughput sequence data. It reads in sequence data in a variety of formats and can either provide an interactive application to review the results of several different QC checks, or create an HTML based report which can be integrated into a pipeline.
fastx-toolkit	bio	The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files.
finestructure	bio	fineSTRUCTURE is a fast and powerful algorithm for identifying population structure using dense sequencing data.
fmriprep	bio	fMRIPrep is a NiPreps (NeuroImaging PREProcessing toolS) application (www.nipreps.org) for the preprocessing of task-based and resting-state functional MRI (fMRI).
freebayes	bio	FreeBayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
freesurfer	bio	FreeSurfer is a set of tools for analysis and visualization of structural and functional brain imaging data. FreeSurfer contains a fully automatic structural imaging stream for processing cross sectional and longitudinal data.
fsa	bio	FSA (Fast Statistical Alignment) is a probabilistic multiple sequence alignment algorithm which uses a distance-based approach to aligning homologous protein, RNA or DNA sequences.
fsl	bio	FSL is a comprehensive library of analysis tools for FMRI, MRI and DTI brain imaging data.
gatk	bio	The Genome Analysis Toolkit or GATK is a software package developed at the Broad Institute to analyse next-generation resequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
gd	bio	GD.pm - Interface to Gd Graphics Library
gemma	bio	Genome-wide Efficient Mixed Model Association
genometools	bio	The GenomeTools genome analysis system is a free collection of bioinformatics tools (in the realm of genome informatics) combined into a single binary named gt. It is based on a C library named “libgenometools” which consists of several modules.
genrich	bio	Genrich is a peak-caller for genomic enrichment assays (e.g. ChIP-seq, ATAC-seq). It analyzes alignment files generated following the assay and produces a file detailing peaks of significant enrichment.
gffcompare	bio	The program gffcompare can be used to compare, merge, annotate, and estimate accuracy of one or more GFF files (the 'query' files), when compared with a reference annotation (also provided as GFF).
gmap-gsnap	bio	GMAP: a Genomic Mapping and Alignment Program for mRNA and EST sequences. GSNAP: Genomic Short-read Nucleotide Alignment Program.
gpunufft	bio	GPU Regridding of arbitrary 3-D/2-D MRI data
gsea	bio	Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
hic-pro	bio	HiC-Pro is an optimized and flexible pipeline for Hi-C data processing.
hisat2	bio	HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) against the general human population (as well as against a single reference genome).
hmmer	bio	HMMER is used for searching sequence databases for homologs of protein sequences, and for making protein sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs). Compared to BLAST, FASTA, and other sequence alignment and database search tools based on older scoring methodology, HMMER aims to be significantly more accurate and more able to detect remote homologs because of the strength of its underlying mathematical models. In the past, this strength came at significant computational expense, but in the new HMMER3 project, HMMER is now essentially as fast as BLAST.
homer	bio	HOMER (Hypergeometric Optimization of Motif EnRichment) is a suite of tools for motif discovery and ChIP-Seq analysis. It is a collection of command-line programs for Unix-style operating systems, written mostly in Perl and C++. HOMER was primarily written as a de novo motif discovery algorithm that is well suited for finding 8-12 bp motifs in large-scale genomics data.
htslib	bio	A C library for reading/writing high-throughput sequencing data. This package includes the utilities bgzip and tabix
igvtools	bio	This package contains command line utilities for preprocessing, computing feature count density (coverage), sorting, and indexing data files. See also http://www.broadinstitute.org/software/igv/igvtools_commandline.
impute2	bio	IMPUTE version 2 (also known as IMPUTE2) is a genotype imputation and haplotype phasing program based on ideas from Howie et al. 2009
io_lib	bio	Io_lib is a library of file reading and writing code that provides a general-purpose trace file (and Experiment File) reading interface. The programmer simply calls a function such as read_reading to create a "Read" C structure with the data loaded into memory. It has been compiled and tested on a variety of Unix systems, MacOS X and MS Windows.
iqtree	bio	Efficient phylogenomic software by maximum likelihood
isoseqenv	bio	IsoSeq3 provides scalable de novo isoform discovery.
jellyfish	bio	Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA.
juicebox	bio	Juicer is a one-click pipeline for processing terabase scale Hi-C datasets.
kallisto	bio	Kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment.
kent-tools	bio	A set of genome utilities developed at the University of California Santa Cruz.
kraken2	bio	Kraken is a system for assigning taxonomic labels to short DNA sequences, usually obtained through metagenomic studies. Previous attempts by other bioinformatics software to accomplish this task have often used sequence alignment or machine learning techniques that were quite slow, leading to the development of less sensitive but much faster abundance estimation programs. Kraken aims to achieve high sensitivity and high speed by utilizing exact alignments of k-mers and a novel classification algorithm.
libgtextutils	bio	libgtextutils is a dependency of fastx-toolkit and is provided by the same upstream project.
locuszoom	bio	LocusZoom Standalone is the command-line (standalone) version of LocusZoom, an application built in Python and R for creating regional plots from genome-wide association studies.
longranger	bio	Long Ranger is a set of analysis pipelines that processes Chromium sequencing output to align reads and call and phase SNPs, indels, and structural variants.
macs2	bio	With the improvement of sequencing techniques, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) has become a popular way to study genome-wide protein-DNA interactions. To address the lack of powerful ChIP-Seq analysis methods, Model-based Analysis of ChIP-Seq (MACS) was developed for identifying transcription factor binding sites. MACS captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining information on both sequencing tag position and orientation.
maestro	bio	MAESTRO (Model-based AnalysEs of Single-cell Transcriptome and RegulOme) is a comprehensive single-cell RNA-seq and ATAC-seq analysis suite built using Snakemake. MAESTRO combines several dozen tools and packages into an integrative pipeline that enables scRNA-seq and scATAC-seq analysis from raw sequencing data (FASTQ files) all the way through alignment, quality control, cell filtering, normalization, unsupervised clustering, differential expression and peak calling, cell-type annotation and transcription regulation analysis.
mafft	bio	MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼30,000 sequences), etc.
manta	bio	Manta calls structural variants (SVs) and indels from mapped paired-end sequencing reads. It is optimized for analysis of germline variation in small sets of individuals and somatic variation in tumor/normal sample pairs. Manta discovers, assembles and scores large-scale SVs, medium-sized indels and large insertions within a single efficient workflow.
marge	bio	MARGE is a robust methodology that leverages a comprehensive library of genome-wide H3K27ac ChIP-seq profiles to predict key regulated genes and cis-regulatory regions in human or mouse.
maxquant	bio	MaxQuant is a quantitative proteomics software package designed for analyzing large mass-spectrometric data sets. It is specifically aimed at high-resolution MS data. Several labeling techniques as well as label-free quantification are supported.
meme	bio	The MEME Suite allows you to: discover motifs using MEME, DREME (DNA only) or GLAM2 on groups of related DNA or protein sequences; search sequence databases with motifs using MAST, FIMO, MCAST or GLAM2SCAN; compare a motif to all motifs in a database of motifs; associate motifs with Gene Ontology terms via their putative target genes; and analyse motif enrichment using SpaMo or CentriMo.
metamorpheus	bio	MetaMorpheus is a bottom-up proteomics database search software with integrated post-translational modification (PTM) discovery capability. This program combines features of Morpheus and G-PTM-D in a single tool.
minimap2	bio	Minimap2 is a fast sequence mapping and alignment program that can find overlaps between long noisy reads, or map long reads or their assemblies to a reference genome, optionally with detailed alignment (i.e. CIGAR). It works efficiently with query sequences from a few kilobases to ~100 megabases in length at an error rate of ~15%. Minimap2 outputs in the PAF or the SAM format. On limited test data sets, minimap2 is over 20 times faster than most other long-read aligners. It is intended to replace BWA-MEM for long-read and contig alignment.
mirdeep2	bio	miRDeep2 discovers active known or novel miRNAs from deep sequencing data (Solexa/Illumina, 454, ...).
mothur	bio	Mothur is a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community.
mrtrix3	bio	MRtrix3 provides a set of tools to perform various types of diffusion MRI analyses, from various forms of tractography through to next-generation group-level analyses. It is designed with consistency, performance, and stability in mind, and is freely available under an open-source license. It is developed and maintained by a team of experts in the field, fostering an active community of users from diverse backgrounds.
mrtrix3tissue	bio	MRtrix3Tissue is a fork of the MRtrix3 project. It aims to add capabilities for 3-Tissue CSD modelling and analysis to a complete version of the MRtrix3 software.
multiqc	bio	MultiQC searches a given directory for analysis logs and compiles an HTML report. It is a general-use tool, well suited to summarising the output from numerous bioinformatics tools.
mummer	bio	MUMmer is a system for rapidly aligning entire genomes, whether in complete or draft form. AMOS makes use of it.
muscle	bio	MUSCLE is one of the best-performing multiple alignment programs according to published benchmark tests, with accuracy and speed that are consistently better than CLUSTALW. MUSCLE can align hundreds of sequences in seconds. Most users learn everything they need to know about MUSCLE in a few minutes; only a handful of command-line options are needed to perform common alignment tasks.
mutect	bio	MuTect is a method developed at the Broad Institute for the reliable and accurate identification of somatic point mutations in next generation sequencing data of cancer genomes.
mutsigcv	bio	MutSig stands for "Mutation Significance". MutSig analyzes lists of mutations discovered in DNA sequencing, to identify genes that were mutated more often than expected by chance given background mutation processes.
nanopolish	bio	Software package for signal-level analysis of Oxford Nanopore sequencing data.
ncbi-vdb	bio	The SRA Toolkit and SDK from NCBI is a collection of tools and libraries for using data in the INSDC Sequence Read Archives.
neuron	bio	Empirically-based simulations of neurons and networks of neurons.
ngs	bio	NGS is a new, domain-specific API for accessing reads, alignments and pileups produced from Next Generation Sequencing.
ngsf	bio	ngsF is a program to estimate per-individual inbreeding coefficients under a probabilistic framework that takes the uncertainty of genotype assignment into account. It avoids calling genotypes by using genotype likelihoods or posterior probabilities.
nibabies	bio	NiBabies is an open-source software pipeline designed to process anatomical and functional magnetic resonance imaging data. A member of the NeuroImaging PREProcessing toolS (NiPreps) family, NiBabies is designed and optimized for human infants between 0-2 years old.
nseg	bio	Nseg is used to identify low-complexity sequences.
openms	bio	OpenMS is an open-source C++ library for LC-MS data management and analysis. It offers an infrastructure for rapid development of mass spectrometry related software.
paintor	bio	PAINTOR is a statistical fine-mapping method that integrates functional genomic data with association strength from potentially multiple populations (or traits) to prioritize variants for follow-up analysis.
pasapipeline	bio	PASA, acronym for Program to Assemble Spliced Alignments, is a eukaryotic genome annotation tool that exploits spliced alignments of expressed transcript sequences to automatically model gene structures, and to maintain gene structure annotation consistent with the most recently available experimental sequence data. PASA also identifies and classifies all splicing variations supported by the transcript alignments.
pbwt	bio	The pbwt package provides a core implementation and development environment for PBWT (Positional Burrows-Wheeler Transform) methods for storing and computing on genome variation data sets.
peakseq	bio	PeakSeq is a program for identifying and ranking peak regions in ChIP-Seq experiments. It takes as input mapped reads from a ChIP-Seq experiment and mapped reads from a control experiment, and outputs a file with peak regions ranked by increasing Q-value.
peer	bio	PEER is a collection of Bayesian approaches to infer hidden determinants and their effects from gene expression profiles using factor analysis methods.
picard	bio	A set of tools (in Java) for working with next generation sequencing data in the BAM format.
plink	bio	PLINK is a free, open-source whole genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.
proteowiz	bio	ProteoWizard provides a set of open-source, cross-platform software libraries and tools (e.g. msconvert, Skyline, IDPicker, SeeMS) that facilitate proteomics data analysis. The libraries enable rapid tool creation by providing a robust, pluggable development framework that simplifies and unifies data file access, and performs standard chemistry and LCMS dataset computations.
psipred	bio	The PSIPRED Workbench provides a range of protein structure prediction methods.
psmc	bio	PSMC infers population size history from a diploid sequence using the Pairwise Sequentially Markovian Coalescent (PSMC) model.
qtltools	bio	QTLtools is a tool set for molecular QTL discovery and analysis. It allows users to go from raw sequence data to a collection of molecular quantitative trait loci (QTLs) in a few easy-to-perform steps.
qualimap	bio	Qualimap 2 is a platform-independent application written in Java and R that provides both a graphical user interface (GUI) and a command-line interface to facilitate the quality control of alignment sequencing data and its derivatives, such as feature counts.
rasqual	bio	RASQUAL (Robust Allele Specific QUAntification and quality controL) maps QTLs for sequence-based cellular traits by combining population and allele-specific signals.
raxml	bio	RAxML is a search algorithm for maximum-likelihood-based inference of phylogenetic trees.
rdp-classifier	bio	The RDP Classifier is a naive Bayesian classifier that rapidly and accurately provides taxonomic assignments from domain to genus, with confidence estimates for each assignment.
regtools	bio	RegTools is a set of tools that integrate DNA-seq and RNA-seq data to help interpret mutations in a regulatory and splicing context.
relion	bio	RELION (for REgularised LIkelihood OptimisatioN, pronounced rely-on) is a stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM).
relion-env	bio	RELION (for REgularised LIkelihood OptimisatioN, pronounced rely-on) is a stand-alone computer program that employs an empirical Bayesian approach to refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM).
repeatmasker	bio	RepeatMasker is a program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.
rip-md	bio	RIP-MD applies Residue Interaction Networks (RINs) to the analysis of molecular dynamics simulations of proteins.
rmats-turbo	bio	rMATS turbo is the C/Cython version of rMATS (refer to http://rnaseq-mats.sourceforge.net). It differs from rMATS mainly in speed and space usage: rMATS turbo is 100 times faster, and its output file is 1000 times smaller. These advantages make analysis and storage of large-scale datasets easy and convenient.
rmblast	bio	RMBlast is a RepeatMasker compatible version of the standard NCBI BLAST suite. The primary difference between this distribution and the NCBI distribution is the addition of a new program 'rmblastn' for use with RepeatMasker and RepeatModeler.
rosetta	bio	The Rosetta software suite includes algorithms for computational modeling and analysis of protein structures. It has enabled notable scientific advances in computational biology, including de novo protein design, enzyme design, ligand docking, and structure prediction of biological macromolecules and macromolecular complexes.
rsem	bio	RSEM (RNA-Seq by Expectation-Maximization) estimates gene and isoform expression levels from RNA-Seq data.
saint	bio	Significance Analysis of INTeractome (SAINT) consists of a series of software tools for assigning confidence scores to protein-protein interactions based on quantitative proteomics data in AP-MS experiments.
saintexpress	bio	Significance Analysis of INTeractome (SAINT) consists of a series of software tools for assigning confidence scores to protein-protein interactions based on quantitative proteomics data in AP-MS experiments.
salmon	bio	Salmon is a wicked-fast program that produces highly accurate, transcript-level quantification estimates from RNA-seq data.
sambamba	bio	Sambamba is a tool for processing BAM files.
samtools	bio	SAMtools provides various utilities for manipulating alignments in the SAM format, including sorting, merging, indexing and generating alignments in a per-position format.
seacr	bio	SEACR is intended to call peaks and enriched regions from sparse CUT&RUN or chromatin profiling data in which background is dominated by zeroes (i.e. regions with no read coverage).
seqoutbias	bio	Molecular biology enzymes have nucleic acid preferences for their substrates; the preference of an enzyme is typically dictated by the sequence at or near the active site of the enzyme. This bias may result in spurious read count patterns when used to interpret high-resolution molecular genomics data. The seqOutBias program aims to correct this issue by scaling the aligned read counts by the ratio of genome-wide observed read counts to the expected sequence-based counts for each k-mer.
sga	bio	SGA is a de novo genome assembler based on the concept of string graphs. The major goal of SGA is to be very memory efficient, which is achieved by using a compressed representation of DNA sequence reads.
shapeit4	bio	SHAPEIT4 is a fast and accurate method for estimating haplotypes (aka phasing) for SNP array and high-coverage sequencing data. Version 4 is a refactored and improved version of the SHAPEIT algorithm.
slim	bio	SLiM is an evolutionary simulation package that provides facilities for very easily and quickly constructing genetically explicit individual-based evolutionary models.
smrtlink	bio	PacBio’s open-source SMRT Analysis software suite is designed for use with Single Molecule, Real-Time (SMRT) Sequencing data. You can analyze, visualize, and manage your data through an intuitive GUI or command-line interface. You can also integrate SMRT Analysis into your existing data workflow through the extensive set of APIs provided.
sortmerna	bio	SortMeRNA is a biological sequence analysis tool for filtering, mapping and OTU-picking NGS reads.
spaceranger	bio	A set of analysis pipelines that perform sample demultiplexing, barcode processing, and single-cell 3' gene counting.
spades	bio	SPAdes - St. Petersburg genome assembler - is an assembly toolkit containing various assembly pipelines.
sparc	bio	Sparc: a sparsity-based consensus algorithm for long erroneous sequencing reads
sparseassembler	bio	A sparse graph approach to de novo genome assembly
sratoolkit	bio	The SRA Toolkit and the source-code SRA System Development Kit (SDK) allow you to programmatically access data housed within SRA and convert it from the SRA format.
stacks	bio	Stacks is a software pipeline for building loci from short-read sequences, such as those generated on the Illumina platform. Stacks was developed to work with restriction enzyme-based data, such as RAD-seq, for the purpose of building genetic maps and conducting population genomics and phylogeography.
star	bio	STAR aligns RNA-seq reads to a reference genome using uncompressed suffix arrays.
stringtie	bio	StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts.
thermorawfileparser	bio	Wrapper around the .NET (C#) ThermoFisher ThermoRawFileReader library for running on Linux with Mono (works on Windows too).
tophat	bio	TopHat is a fast splice junction mapper for RNA-Seq reads.
torus	bio	TORUS - QTL Discovery utilizing Genomic Annotations is a free software package that implements a computational procedure for discovering molecular QTLs incorporating genomic annotations.
trf	bio	Tandem Repeats Finder: a program to analyze DNA sequences.
trimgalore	bio	Trim Galore is a wrapper around Cutadapt and FastQC to consistently apply adapter and quality trimming to FastQ files, with extra functionality for RRBS data.
trimmomatic	bio	Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data.
trinity	bio	Trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-Seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-Seq reads.
varscan	bio	VarScan - Variant calling and somatic mutation/CNV detection for next-generation sequencing data
vcell	bio	VCell (Virtual Cell) is a comprehensive platform for modeling cell biological systems that is built on a central database and disseminated as a web application.
vcftools	bio	The aim of VCFtools is to provide easily accessible methods for working with complex genetic variation data in the form of VCF files.
vg	bio	Variation graphs provide a succinct encoding of the sequences of many genomes.
viennarna	bio	The Vienna RNA Package consists of a C code library and several stand-alone programs for the prediction and comparison of RNA secondary structures.
vsearch	bio	VSEARCH supports de novo and reference-based chimera detection, clustering, full-length and prefix dereplication, rereplication, reverse complementation, masking, all-vs-all pairwise global alignment, exact and global alignment searching, shuffling, subsampling and sorting. It also supports FASTQ file analysis, filtering, conversion and merging of paired-end reads.
wasp	bio	WASP is a suite of tools for unbiased allele-specific read mapping and discovery of molecular QTLs.
wigtobigwig	bio	The bigWig format is useful for dense, continuous data that will be displayed in the Genome Browser as a graph. BigWig files are created from wiggle (wig) type files using the program wigToBigWig.

Updated June 23, 2019

Bioinformatics and UVA HPC